diff --git a/bip-0093.mediawiki b/bip-0093.mediawiki
index 22a7ba32e9..1258332c82 100644
--- a/bip-0093.mediawiki
+++ b/bip-0093.mediawiki
@@ -16,11 +16,11 @@
===Abstract===
-This document describes a standard for backing up and restoring the master seed of a
-[https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki BIP-0032] hierarchical deterministic wallet, using Shamir's secret sharing.
-It includes an encoding format, a BCH error-correcting checksum, and algorithms for share generation and secret recovery.
-Secret data can be split into up to 31 shares.
-A minimum threshold of shares, which can be between 1 and 9, is needed to recover the secret, whereas without sufficient shares, no information about the secret is recoverable.
+This document proposes a checksummed base32 format, "codex32", and a standard for backing up and restoring the master seed of a
+[https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki BIP-0032] hierarchical deterministic wallet using it.
+It includes an encoding format, a BCH error-correcting checksum, and optional algorithms for share generation and secret recovery based on Shamir's secret sharing.
+Secret data can be encoded directly, or split into up to 31 shares.
+A minimum threshold of shares, which can be between 2 and 9, is needed to recover the secret, whereas without sufficient shares, no information about the secret is recoverable.
===Copyright===
@@ -59,6 +59,10 @@ However, BIP-0039 has no error-correcting ability, cannot sensibly be extended t
==Specification==
+We first describe the general checksummed base32 format called ''codex32'' and then define a BIP-0032 master seed encoding using it.
+
+'''Why use base32 at all?''' The lack of mixed case makes it more efficient to read out loud, write, type or to put into QR codes.
+
===codex32===
A codex32 string is similar to a bech32 string defined in [https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki BIP-0173].
@@ -75,10 +79,12 @@ It reuses the base-32 character set from BIP-0173, and consists of:
** A checksum which consists of 13 bech32 characters as described below.
As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase.
+Note that per BIP-0173, the lowercase form is used when determining a character's value for checksum purposes.
+In particular, given an all uppercase codex32 string, we still use lowercase ms as the human-readable part during checksum construction.
For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings.
If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.
-===Checksum===
+====Checksum====
The last thirteen characters of the data part form a checksum and contain no information.
Valid strings MUST pass the criteria for validity specified by the Python 3 code snippet below.
@@ -119,8 +125,12 @@ def ms32_create_checksum(data):
polymod = ms32_polymod(values + [0] * 13) ^ MS32_CONST
return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]
+This implements a [https://en.wikipedia.org/wiki/BCH_code BCH code] that
+guarantees detection of '''any error affecting at most 8 characters'''
+and has less than a 3 in 10<sup>20</sup> chance of failing to detect more
+random errors.
-===Error Correction===
+====Error Correction====
A codex32 string without a valid checksum MUST NOT be used.
The checksum is designed to be an error correcting code that can correct up to 4 character substitutions, up to 8 unreadable characters (called erasures), or up to 13 consecutive erasures.
@@ -137,9 +147,8 @@ We do not specify how an implementation should implement error correction. Howev
===Unshared Secret===
When the share index of a valid codex32 string (converted to lowercase) is the letter "s", we call the string a codex32 secret.
-The payload in a codex32 secret is a direct encoding of a BIP-0032 HD master seed.
-The master seed is decoded by converting the payload to bytes:
+The secret is decoded by converting the payload to bytes:
* Translate the characters to 5 bits values using the bech32 character table from BIP-0173, most significant bit first.
* Re-arrange those bits into groups of 8 bits. Any incomplete group at the end MUST be 4 bits or less, and is discarded.
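The two decoding steps above can be sketched in Python as follows. This is a minimal illustration rather than part of the specification: the helper name payload_to_bytes is hypothetical, and the function only converts a payload; it does not validate the checksum or the header.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def payload_to_bytes(payload):
    """Convert a bech32 payload string to bytes (hypothetical helper).

    Characters are read as 5-bit values, most significant bit first,
    and regrouped into bytes. A trailing incomplete group of at most
    4 bits is discarded; a longer one is rejected.
    """
    acc = 0    # bit accumulator
    bits = 0   # number of pending bits held in acc
    out = bytearray()
    for ch in payload.lower():
        acc = (acc << 5) | CHARSET.index(ch)
        bits += 5
        if bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
            acc &= (1 << bits) - 1  # keep only the leftover bits
    if bits > 4:
        raise ValueError("incomplete group at the end is longer than 4 bits")
    return bytes(out)
```

Applied to the 26-character payload of test vector 1 (130 bits), this yields the 16-byte master seed and drops the final two bits.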
@@ -148,23 +157,69 @@ Note that unlike the decoding process in BIP-0173, we do NOT require that the in
For an unshared secret, the threshold parameter (the first character of the data part) is ignored (beyond the fact it must be a digit for the codex32 string to be valid).
We recommend using the digit "0" for the threshold parameter in this case.
-The 4 character identifier also has no effect beyond aiding users in distinguishing between multiple different master seeds in cases where they have more than one.
+The 4 character identifier also has no effect beyond aiding users in distinguishing between multiple different secrets in cases where they have more than one.
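To make the layout of the fields concrete, the following sketch splits a codex32 string into its parts. The names split_codex32 and Codex32Parts are hypothetical, and the sketch assumes the string has already been validated: it performs no checksum or character-set checks, and it picks the checksum length purely from the data-part length.

```python
from collections import namedtuple

# Hypothetical container for the fields of a codex32 string.
Codex32Parts = namedtuple(
    "Codex32Parts",
    "hrp threshold identifier share_index payload checksum",
)

def split_codex32(codex):
    """Split an already-validated codex32 string into its fields."""
    s = codex.lower()
    hrp, sep, data = s.rpartition("1")
    if not sep:
        raise ValueError("missing separator '1'")
    # Assumption: data parts of up to 93 characters carry the regular
    # 13-character checksum, longer ones the 15-character long checksum.
    checksum_len = 13 if len(data) <= 93 else 15
    body, checksum = data[:-checksum_len], data[-checksum_len:]
    return Codex32Parts(
        hrp=hrp,
        threshold=body[0],
        identifier=body[1:5],
        share_index=body[5],
        payload=body[6:],
        checksum=checksum,
    )
```

For a codex32 secret the share_index field is "s"; for any other value the string is a share.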
-===Recovering Master Seed===
+The ms32_encode function constructs a codex32 string from the data-part characters (excluding the checksum), given as a list of 5-bit values.
-When the share index of a valid codex32 string (converted to lowercase) is not the letter "s", we call the string an codex32 share.
+To validate a codex32 string and determine the data-part (excluding the checksum) as a list of 5-bit values, the ms32_decode function can be used.
+
+
+CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
+
+def ms32_encode(data):
+    combined = data + ms32_create_checksum(data)
+    return "ms" + "1" + ''.join([CHARSET[d] for d in combined])
+
+def ms32_decode(codex):
+    if ((any(ord(x) < 33 or ord(x) > 126 for x in codex)) or
+            (codex.lower() != codex and codex.upper() != codex)):
+        return None
+    codex = codex.lower()
+    pos = codex.rfind("1")
+    if pos < 2 or not (48 <= len(codex) <= 127):
+        return None
+    if not all(x in CHARSET for x in codex[pos+1:]):
+        return None
+    if codex[:pos] != "ms" or codex[pos+1].isalpha() or codex[pos+1] == "0" and codex[pos+6] != "s":
+        return None
+    data = [CHARSET.index(x) for x in codex[pos+1:]]
+    if not ms32_verify_checksum(data):
+        return None
+    return data[:-13 if len(data) < 94 else -15]  # See Long codex32
+
+
+===Master seed format===
+
+When the human-readable part of a valid codex32 secret (converted to lowercase) is the string "ms", we call it a codex32-encoded master seed or secret seed. The payload in this case is a direct encoding of a BIP-0032 HD master seed.
+
+A secret seed is a codex32 encoding of:
+
+* The human-readable part "ms" for master seed.
+* The data-part values:
+** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
+** An identifier consisting of 4 bech32 characters.
+*** We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed and share set the user may need to disambiguate.
+** The share index "s".
+** A conversion of the 16-to-64-byte BIP-0032 HD master seed to bech32:
+*** Start with the bits of the master seed, most significant bit per byte first.
+*** Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
+*** Translate those bits to characters using the bech32 character table from BIP-0173.
+** A valid checksum in accordance with the Checksum section.
+
+===Recovering Secret===
+
+When the share index of a valid codex32 string (converted to lowercase) is not the letter "s", we call the string a codex32 share.
The first character of the data part indicates the threshold of the share, and it is required to be a non-"0" digit.
-In order to recover a master seed, one needs a set of valid codex32 shares such that:
+In order to recover a secret, one needs a set of valid shares such that:
* All shares have the same threshold value, the same identifier, and the same length.
* All of the share index values are distinct.
-* The number of codex32 shares is exactly equal to the (common) threshold value.
-
-If all the above conditions are satisfied, the ms32_recover function will return a codex32 secret when its argument is the list of codex32 shares with each share represented as a list of integers representing the characters converted using the bech32 character table from BIP-0173.
+* The number of shares is exactly equal to the (common) threshold value.
+If all the above conditions are satisfied, the ms32_recover function will return a codex32 secret when its argument is the list of codex32 shares with each share represented as a list of integers representing the characters converted using the bech32 character table from BIP-0173.
-bech32_inv = [
+BECH32_INV = [
0, 1, 20, 24, 10, 8, 12, 29, 5, 11, 4, 9, 6, 28, 26, 31,
22, 18, 17, 23, 2, 25, 16, 19, 3, 21, 14, 30, 13, 7, 27, 15,
]
@@ -186,7 +241,7 @@ def bech32_lagrange(l, x):
for j in l:
m = bech32_mul(m, (x if i == j else i) ^ j)
c.append(m)
-    return [bech32_mul(n, bech32_inv[i]) for i in c]
+    return [bech32_mul(n, BECH32_INV[i]) for i in c]
def ms32_interpolate(l, x):
w = bech32_lagrange([s[5] for s in l], x)
@@ -198,66 +253,66 @@ def ms32_interpolate(l, x):
res.append(n)
return res
-def ms32_recover(l):
-    return ms32_interpolate(l, 16)
+def ms32_recover(shares):
+    return ms32_interpolate(shares, 16)
===Generating Shares===
-If we already have ''t'' valid codex32 strings such that:
+If we already have ''k'' valid codex32 strings such that:
-* All strings have the same threshold value ''t'', the same identifier, and the same length
+* All strings have the same threshold value ''k'', the same identifier, and the same length
* All of the share index values are distinct
-Then we can derive additional shares with the ms32_interpolate function by passing it a list of exactly ''t'' of these codex32 strings, together with a fresh share index distinct from all of the existing share indexes.
+Then we can derive additional shares with the ms32_interpolate function by passing it a list of exactly ''k'' of these codex32 strings, together with a fresh share index distinct from all of the existing share indexes.
The newly derived share will have the provided share index.
-Once a user has generated ''n'' codex32 shares, they may discard the codex32 secret (if it exists).
-The ''n'' shares form a ''t'' of ''n'' Shamir's secret sharing scheme of a codex32 secret.
+Once a user has generated ''n'' shares, they may discard the codex32 secret (if it exists).
+The ''n'' shares form a ''k'' of ''n'' Shamir's secret sharing scheme of a codex32 secret.
-There are two ways to create an initial set of ''t'' valid codex32 strings, depending on whether the user already has an existing master seed to split.
+There are two ways to create an initial set of ''k'' valid codex32 strings, depending on whether the user already has an existing secret to split.
-====For a fresh master seed====
+====For a fresh secret====
-In the case that the user wishes to generate a fresh master seed, the user generates random initial shares, as follows:
+In the case that the user wishes to generate a fresh secret, the user generates random initial shares, as follows:
-# Choose a bitsize, between 128 and 512, which must be a multiple of 8.
-# Choose a threshold value ''t'' between 2 and 9, inclusive
+# Choose a bitsize, between 128 and 512, which must be a multiple of 8
+# Choose a threshold value ''k'' between 2 and 9, inclusive
# Choose a 4 bech32 character identifier
-#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed the user may need to disambiguate.
-# ''t'' many times, generate a random share by:
+#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every secret the user may need to disambiguate
+# ''k'' many times, generate a random share by:
## Take the next available letter from the bech32 alphabet, in alphabetical order, as a, c, d, ..., to be the share index
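-## Set the first nine characters to be the prefix ms1, the threshold value ''t'', the 4-character identifier, and then the share index
+## Set the first nine characters to be the prefix ms1, the threshold value ''k'', the 4-character identifier, and then the share index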
## Choose the next ceil(''bitlength / 5'') characters uniformly at random
## Generate a valid checksum in accordance with the Checksum section, and append this to the resulting shares
-The result will be ''t'' distinct shares, all with the same initial 8 characters, and a distinct share index as the 9th character.
+The result will be ''k'' distinct shares, all with the same initial 8 characters, and a distinct share index as the 9th character.
-With this set of ''t'' codex32 shares, new shares can be derived as discussed above. This process generates a fresh master seed, whose value can be retrieved by running the recovery process on any ''t'' of these shares.
+With this set of ''k'' shares, new shares can be derived as discussed above. This process generates a fresh secret, whose value can be retrieved by running the recovery process on any ''k'' of these shares.
-====For an existing master seed====
+====For an existing secret====
-Before generating shares for an existing master seed, it first must be converted into a codex32 secret, as described above.
+Before generating shares for an existing secret, it first must be codex32-encoded.
The conversion process consists of:
-# Choose a threshold value ''t'' between 2 and 9, inclusive
+# Choose a threshold value ''k'' between 2 and 9, inclusive
# Choose a 4 bech32 character identifier
-#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed the user may need to disambiguate.
+#* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every set of shares the user may need to disambiguate
# Set the share index to s
-# Set the payload to a bech32 encoding of the master seed, padded with arbitrary bits
-# Generating a valid checksum in accordance with the Checksum section
+# Set the payload to a bech32 encoding of the secret data, padded with arbitrary bits
+# Generate a valid checksum in accordance with the Checksum section
-Along with the codex32 secret, the user must generate ''t''-1 other codex32 shares, each with the same threshold value, the same identifier, and a distinct share index.
-These shares should be generated as described in the "fresh master seed" section.
+Along with the codex32 secret, the user must generate ''k''-1 other codex32 shares, each with the same threshold value, the same identifier, and a distinct share index.
+These shares should be generated as described in the "fresh secret" section.
-The codex32 secret and the ''t''-1 codex32 shares form a set of ''t'' valid codex32 strings from which additional shares can be derived as described above.
+The codex32 secret and the ''k''-1 codex32 shares form a set of ''k'' valid initial codex32 strings from which additional shares can be derived as described above.
-===Long codex32 Strings===
+===Long codex32===
The 13 character checksum design only supports up to 80 data characters.
Excluding the threshold, identifier and index characters, this limits the payload to 74 characters or 46 bytes.
While this is enough to support the 32-byte advised size of BIP-0032 master seeds, BIP-0032 allows seeds to be up to 64 bytes in size.
-We define a long codex32 string format to support these longer seeds by defining an alternative checksum.
+We define a long codex32 format to support these longer seeds by defining an alternative checksum.
MS32_LONG_CONST = 0x43381e570bf4798ab26
@@ -286,15 +341,19 @@ def ms32_create_long_checksum(data):
polymod = ms32_long_polymod(values + [0] * 15) ^ MS32_LONG_CONST
return [(polymod >> 5 * (14 - i)) & 31 for i in range(15)]
+This implements a [https://en.wikipedia.org/wiki/BCH_code BCH code] that
+guarantees detection of '''any error affecting at most 8 characters'''
+and has less than a 3 in 10<sup>23</sup> chance of failing to detect more
+random errors.
A long codex32 string follows the same specification as a regular codex32 string with the following changes.
* The payload is a sequence of between 75 and 103 bech32 characters.
* The checksum consists of 15 bech32 characters as defined above.
-A codex32 string with a data part of 94 or 95 characters is never legal as a regular codex32 string is limited to 93 data characters and a long codex32 string is at least 96 characters.
+A codex32 string with a data part of 94 or 95 characters is never legal as a regular codex32 string is limited to 93 data characters and a long codex32 string is at least 96 data characters.
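The length rules can be summarised in a short sketch. The function names are hypothetical. The lower bound of 45 data characters is an assumption taken from the 48-character minimum total length enforced by ms32_decode; the upper bound of 124 follows from a 6-character header, a payload of at most 103 characters, and the 15-character checksum.

```python
def checksum_length(data_len):
    """Return the checksum length implied by a data-part length.

    Returns 13 for regular strings, 15 for long strings, and None for
    lengths that are not legal in either format (including 94 and 95).
    """
    if 45 <= data_len <= 93:
        return 13
    if 96 <= data_len <= 124:
        return 15
    return None

def payload_capacity_bytes(payload_chars):
    """Whole bytes that fit into a payload of the given character count."""
    return payload_chars * 5 // 8
```

For example, the longest regular payload of 74 characters holds 46 bytes, while the longest long payload of 103 characters holds the full 64 bytes that BIP-0032 allows.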
-Generation of long shares and recovery of the master seed from long shares proceeds in exactly the same way as for regular shares with the ms32_interpolate function.
+Generation of long shares and recovery of the long secret from long shares proceeds in exactly the same way as for regular shares with the ms32_interpolate function.
The long checksum is designed to be an error correcting code that can correct up to 4 character substitutions, up to 8 unreadable characters (called erasures), or up to 15 consecutive erasures.
As with regular checksums we do not specify how an implementation should implement error correction, and all our recommendations for error correction of regular codex32 strings also apply to long codex32 strings.
@@ -307,7 +366,7 @@ This means that derived shares will always have valid checksum, and a sufficient
The header system is also compatible with Lagrange interpolation, meaning all derived shares will have the same identifier and will have the appropriate share index.
This fact allows the header data to be covered by the checksum.
-The checksum size and identifier size have been chosen so that the encoding of 128-bit seeds and shares fit within 48 characters.
+The checksum size and identifier size have been chosen so that the encoding of 128-bit master seeds and shares fits within 48 characters.
This is a standard size for many common seed storage formats, which has been popularized by the 12 four-letter word format of the BIP-0039 mnemonic.
The 13 character checksum is adequate to correct 4 errors in up to 93 characters (80 characters of data and 13 characters of the checksum).
@@ -323,7 +382,7 @@ While we could use the 15 character checksum for both cases, we prefer to keep t
We only guarantee to correct 4 characters no matter how long the string is.
Longer strings mean more chances for transcription errors, so shorter strings are better.
-The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 400-bit secret.
+The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 368-bit secret.
At this length, the prefix MS1 is not covered by the checksum.
This is acceptable because the checksum scheme itself requires you to know that the MS1 prefix is being used in the first place.
If the prefix is damaged and a user is guessing that the data might be using this scheme, then the user can enter the available data explicitly using the suspected MS1 prefix.
@@ -389,7 +448,7 @@ The payload contains 26 bech32 characters, which corresponds to 130 bits. We tru
codex32 secret (bech32): ms10testsxxxxxxxxxxxxxxxxxxxxxxxxxx4nzvca9cmczlw
-Master secret (hex): 318c6318c6318c6318c6318c6318c631
+Master seed (hex): 318c6318c6318c6318c6318c6318c631
* human-readable part: ms
* separator: 1
@@ -402,7 +461,7 @@ Master secret (hex): 318c6318c6318c6318c6318c6318c631
===Test vector 2===
-This example shows generating a new master seed using "random" codex32 shares, as well as deriving an additional codex32 share, using ''k''=2 and an identifier of NAME.
+This example shows generating a new master seed using "random" shares, as well as deriving an additional share, using ''k''=2 and an identifier of NAME.
Although codex32 strings are canonically all lowercase, it's also valid to use all uppercase.
Share with index A: MS12NAMEA320ZYXWVUTSRQPNMLKJHGFEDCAXRPP870HKKQRM
@@ -410,21 +469,18 @@ Share with index A: MS12NAMEA320ZYXWVUTSRQPNMLKJHGFEDCAXRPP87
Share with index C: MS12NAMECACDEFGHJKLMNPQRSTUVWXYZ023FTR2GDZMPY6PN
* Derived share with index D: MS12NAMEDLL4F8JLH4E5VDVULDLFXU2JHDNLSM97XVENRXEG
-* Secret share with index S: MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW
-* Master secret (hex): d1808e096b35b209ca12132b264662a5
+* Recovered secret seed with index S: MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW
+* Master seed (hex): d1808e096b35b209ca12132b264662a5
* master node xprv: xprv9s21ZrQH143K2NkobdHxXeyFDqE44nJYvzLFtsriatJNWMNKznGoGgW5UMTL4fyWtajnMYb5gEc2CgaKhmsKeskoi9eTimpRv2N11THhPTU
-Note that per BIP-0173, the lowercase form is used when determining a character's value for checksum purposes.
-In particular, given an all uppercase codex32 string, we still use lowercase ms as the human-readable part during checksum construction.
-
===Test vector 3===
-This example shows splitting an existing 128-bit master seed into "random" codex32 shares, using ''k''=3 and an identifier of cash.
+This example shows splitting an existing 128-bit master seed into "random" shares, using ''k''=3 and an identifier of cash.
We appended two zero bits in order to obtain 26 bech32 characters (130 bits of data) from the 128-bit master seed.
-Master secret (hex): ffeeddccbbaa99887766554433221100
+Master seed (hex): ffeeddccbbaa99887766554433221100
-Secret share with index s: ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln
+codex32-encoded master seed with index s: ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln
Share with index a: ms13casha320zyxwvutsrqpnmlkjhgfedca2a8d0zehn8a0t
@@ -437,7 +493,7 @@ Share with index c: ms13cashcacdefghjklmnpqrstuvwxyz023949xq3
Any three of the five shares among acdef can be used to recover the secret.
-Note that the choice to append two zero bits was arbitrary, and any of the following four secret shares would have been valid choices.
+Note that the choice to append two zero bits was arbitrary, and any of the following four codex32 secrets would have been valid choices.
However, each choice would have resulted in a different set of derived shares.
* ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln
@@ -450,7 +506,7 @@ However, each choice would have resulted in a different set of derived shares.
This example shows converting a 256-bit secret into a codex32 secret, without splitting the secret into any shares.
We appended four zero bits in order to obtain 52 bech32 characters (260 bits of data) from the 256-bit secret.
-256-bit secret (hex): ffeeddccbbaa99887766554433221100ffeeddccbbaa99887766554433221100
+Master seed (hex): ffeeddccbbaa99887766554433221100ffeeddccbbaa99887766554433221100
* codex32 secret: ms10leetsllhdmn9m42vcsamx24zrxgs3qrl7ahwvhw4fnzrhve25gvezzyqqtum9pgv99ycma
* master node xprv: xprv9s21ZrQH143K3s41UCWxXTsU4TRrhkpD1t21QJETan3hjo8DP5LFdFcB5eaFtV8x6Y9aZotQyP8KByUjgLTbXCUjfu2iosTbMv98g8EQoqr
@@ -476,13 +532,20 @@ Note that the choice to append four zero bits was arbitrary, and any of the foll
===Test vector 5===
-This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.
+This example shows generating a new 512-bit master seed using "random" bech32 characters and appending a checksum.
The payload contains 103 bech32 characters, which corresponds to 515 bits. The last three bits are discarded when converting to a 512-bit master seed.
-This is an example of a '''Long codex32 String'''.
+This is an example of a '''Long codex32''' string.
+
+k value (bech32): 0
+
+identifier (bech32): 0C8V
+
+payload (bech32): M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F
-* Secret share with index S: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK
-* Master secret (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9
+* checksum: HPV80UNDVARHRAK
+* codex32 secret: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK
+* Master seed (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9
* master node xprv: xprv9s21ZrQH143K4UYT4rP3TZVKKbmRVmfRqTx9mG2xCy2JYipZbkLV8rwvBXsUbEv9KQiUD7oED1Wyi9evZzUn2rqK9skRgPkNaAzyw3YrpJN
===Invalid test vectors===