diff --git a/bip-0093.mediawiki b/bip-0093.mediawiki index 1258332c82..7c5fab26a1 100644 --- a/bip-0093.mediawiki +++ b/bip-0093.mediawiki @@ -66,21 +66,20 @@ efficient to read out loud, write, type or to put into QR codes. format ca ===codex32=== A codex32 string is similar to a bech32 string defined in [https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki BIP-0173]. -It reuses the base-32 character set from BIP-0173, and consists of: - -* A human-readable part, which is the string "ms" (or "MS"). -* A separator, which is always "1". -* A data part which is in turn subdivided into: -** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0". +It reuses the base-32 character set from BIP-0173, is at most 94 characters long, and consists of: +* The '''human-readable part''', as specified in BIP-0173. +* The '''separator''', as specified in BIP-0173, which is always "1". +* The '''data part''', which is at least 19 bech32 characters long and is in turn subdivided into: +** The '''threshold parameter''', which MUST be a single digit between "2" and "9", or the digit "0". *** If the threshold parameter is "0" then the share index, defined below, MUST have a value of "s" (or "S"). -** An identifier consisting of 4 bech32 characters. -** A share index, which is any bech32 character. Note that a share index value of "s" (or "S") is special and denotes the unshared secret (see section "Unshared Secret"). -** A payload which is a sequence of up to 74 bech32 characters. (However, see '''Long codex32 Strings''' below for an exception to this limit.) -** A checksum which consists of 13 bech32 characters as described below. - -As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase. -Note that per BIP-0173, the lowercase form is used when determining a character's value for checksum purposes. 
-In particular, given an all uppercase codex32 string, we still use lowercase ms as the human-readable part during checksum construction. +** The '''identifier''', which consists of 4 bech32 characters. +** The '''share index''', which is any bech32 character. +***Note that a share index value of "s" (or "S") is special and denotes the unshared secret (see section "Unshared Secret"). +** The '''payload''', which is a sequence of 0 to 73 bech32 characters. (However, see '''Long codex32''' below for an exception to this limit.) +** The '''checksum''', which consists of 13 bech32 characters as described below. + +As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase. +The lowercase form of the human-readable part is used when determining a character's value for checksum purposes. For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings. If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly. @@ -88,14 +87,16 @@ If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, a The last thirteen characters of the data part form a checksum and contain no information. Valid strings MUST pass the criteria for validity specified by the Python 3 code snippet below. -The function ms32_verify_checksum must return true when its argument is the data part as a list of integers representing the characters converted using the bech32 character table from BIP-0173. +The function codex32_verify_checksum must return true when its arguments are: +* hrp: the human-readable part as a string +* data: the data part as a list of integers representing the characters converted using the bech32 character table from BIP-0173 -To construct a valid checksum given the data-part characters (excluding the checksum), the ms32_create_checksum function can be used. 
+To construct a valid checksum given the human-readable part and data-part characters (excluding the checksum), the codex32_create_checksum function can be used. -MS32_CONST = 0x10ce0795c2fd1e62a +CODEX32_CONST = 0x10ce0795c2fd1e62a -def ms32_polymod(values): +def codex32_polymod(values): GEN = [ 0x19dc500ce73fde210, 0x1bfae00def77fe529, @@ -103,7 +104,7 @@ def ms32_polymod(values): 0x1739640bdeee3fdad, 0x07729a039cfc75f5a, ] - residue = 0x23181b3 + residue = 1 for v in values: b = (residue >> 60) residue = (residue & 0x0fffffffffffffff) << 5 ^ v @@ -111,24 +112,31 @@ def ms32_polymod(values): residue ^= GEN[i] if ((b >> i) & 1) else 0 return residue -def ms32_verify_checksum(data): - if len(data) >= 96: # See Long codex32 Strings - return ms32_verify_long_checksum(data) - if len(data) <= 93: - return ms32_polymod(data) == MS32_CONST +def bech32_hrp_expand(hrp): + return [ord(x) >> 5 for x in hrp] + [0] + [ord(x) & 31 for x in hrp] + +def codex32_verify_checksum(hrp, data): + if len(hrp) + len(data) >= 96: # See Long codex32 Strings + return codex32_verify_long_checksum(bech32_hrp_expand(hrp) + data) + if len(hrp) + len(data) <= 93: + return codex32_polymod(bech32_hrp_expand(hrp) + data) == CODEX32_CONST return False -def ms32_create_checksum(data): - if len(data) > 80: # See Long codex32 Strings - return ms32_create_long_checksum(data) - values = data - polymod = ms32_polymod(values + [0] * 13) ^ MS32_CONST +def codex32_create_checksum(hrp, data): + values = bech32_hrp_expand(hrp) + data + if len(hrp) + len(data) > 80: # See Long codex32 Strings + return codex32_create_long_checksum(values) + polymod = codex32_polymod(values + [0] * 13) ^ CODEX32_CONST return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)] This implements a [https://en.wikipedia.org/wiki/BCH_code BCH code] that guarantees detection of '''any error affecting at most 8 characters''' and has less than a 3 in 10<sup>20</sup> chance of failing to detect more -random errors. +random errors.
The human-readable part is processed as per BIP-0173<ref>'''Why are the high bits of the human-readable part processed first?''' +This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the +human-readable part only change the low 5 bits (like changing an alphabetical character into another), errors are restricted to the ''[low hrp] [data]'' +part, which is at most 93 (or 1023 in long codex32) characters, and thus all error detection properties (see appendix) remain applicable.</ref>. + ====Error Correction==== @@ -136,6 +144,7 @@ A codex32 string without a valid checksum MUST NOT be used. The checksum is designed to be an error correcting code that can correct up to 4 character substitutions, up to 8 unreadable characters (called erasures), or up to 13 consecutive erasures. Implementations SHOULD provide the user with a corrected valid codex32 string if possible. However, implementations SHOULD NOT automatically proceed with a corrected codex32 string without user confirmation of the corrected string, either by prompting the user, or returning a corrected string in an error message and allowing the user to repeat their action. +The HRP defines the checksum and SHOULD NOT be error-corrected, ''unless'' there is a separate specification describing how to do this. We do not specify how an implementation should implement error correction. However, we recommend that: * Implementations make suggestions to substitute non-bech32 characters with bech32 characters in some situations, such as replacing "B" with "8", "O" with "0", "I" with "l", etc. @@ -157,35 +166,38 @@ Note that unlike the decoding process in BIP-0173, we do NOT require that the in For an unshared secret, the threshold parameter (the first character of the data part) is ignored (beyond the fact it must be a digit for the codex32 string to be valid). We recommend using the digit "0" for the threshold parameter in this case.
-The 4 character identifier also has no effect beyond aiding users in distinguishing between multiple different secrets in cases where they have more than one. +The 4 character identifier also has no effect beyond aiding users in distinguishing between multiple different secrets or share sets with the same prefix in cases where they have more than one. -The function ms32_encode constructs a codex32 string when its argument is the converted data-part characters (excluding the checksum). +The function codex32_encode constructs a codex32 string when its arguments are: +* hrp: the human-readable part as a string +* data: the data part (excluding the checksum) as a list of 5-bit values -To validate a codex32 string and determine the data-part (excluding the checksum) as a list of 5-bit values, the ms32_decode function can be used. +To validate a codex32 string, and determine the human-readable part and the data part (excluding the checksum) as a list of 5-bit values, the codex32_decode function can be used. 
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l" -def ms32_encode(data): - combined = data + ms32_create_checksum(data) - return "ms" + "1" + ''.join([CHARSET[d] for d in combined]) +def codex32_encode(hrp, data): + combined = data + codex32_create_checksum(hrp, data) + return hrp + '1' + ''.join([CHARSET[d] for d in combined]) -def ms32_decode(codex): +def codex32_decode(codex): if ((any(ord(x) < 33 or ord(x) > 126 for x in codex)) or (codex.lower() != codex and codex.upper() != codex)): - return None + return None, None codex = codex.lower() - pos = codex.rfind("1") - if pos < 2 or not (48 <= len(codex) <= 127): - return None + pos = codex.rfind('1') + if pos < 1 or pos + 20 > len(codex) or len(codex) > 1024: + return None, None if not all(x in CHARSET for x in codex[pos+1:]): - return None - if codex[:pos] != "ms" or codex[pos+1].isalpha() or codex[pos+1] == "0" and codex[pos+6] != "s": - return None + return None, None + if codex[pos+1].isalpha() or codex[pos+1] == "0" and codex[pos+6] != "s": + return None, None + hrp = codex[:pos] data = [CHARSET.index(x) for x in codex[pos+1:]] - if not ms32_verify_checksum(data): - return None - return data[:-13 if len(data) < 94 else -15] # See Long codex32 Strings + if not codex32_verify_checksum(hrp, data): + return None, None + return hrp, data[:-13 if len(codex) < 95 else -15] # See Long codex32 Strings ===Master seed format=== @@ -213,11 +225,12 @@ The first character of the data part indicates the threshold of the share, and i In order to recover a secret, one needs a set of valid shares such that: -* All shares have the same threshold value, the same identifier, and the same length. +* All shares have the same human-readable part, the same threshold value, the same identifier, and the same length. * All of the share index values are distinct. * The number of shares is exactly equal to the (common) threshold value. 
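The three conditions listed above can be checked before any interpolation is attempted. The following is a minimal sketch, not part of the specification: the helper names split_fields and shares_are_combinable are hypothetical, the shares are taken as strings rather than lists of integers, and the checksum of each share is assumed to have been verified separately.

```python
def split_fields(codex):
    """Split a codex32 string into (hrp, threshold, identifier, index, length).

    Hypothetical helper: assumes the string already passed validation.
    """
    codex = codex.lower()
    pos = codex.rfind("1")          # the separator is the last "1"
    data = codex[pos + 1:]
    return codex[:pos], data[0], data[1:5], data[5], len(codex)


def shares_are_combinable(shares):
    """Return True if the set of shares satisfies the recovery conditions."""
    fields = [split_fields(s) for s in shares]
    # Same human-readable part, threshold, identifier and length everywhere.
    common = {(hrp, k, ident, length) for hrp, k, ident, _, length in fields}
    if len(common) != 1:
        return False
    # All share index values are distinct.
    indices = [idx for _, _, _, idx, _ in fields]
    if len(set(indices)) != len(indices):
        return False
    # The number of shares equals the common threshold value.
    threshold = fields[0][1]
    return threshold.isdigit() and len(shares) == int(threshold)


# Structural placeholders only: these strings do NOT carry valid checksums.
share_a = "ms12namea" + "q" * 39
share_c = "ms12namec" + "q" * 39
print(shares_are_combinable([share_a, share_c]))
print(shares_are_combinable([share_a]))
```

With a threshold of 2, the first call reports a combinable set and the second does not, because only one share is supplied.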
-If all the above conditions are satisfied, the ms32_recover function will return a codex32 secret when its argument is the list of codex32 shares with each share represented as a list of integers representing the characters converted using the bech32 character table from BIP-0173. +If all the above conditions are satisfied, the codex32_recover function will return a codex32 secret when its argument is the list of shares with each share represented as a list of integers representing the data characters converted using the bech32 character table from BIP-0173. + BECH32_INV = [ 0, 1, 20, 24, 10, 8, 12, 29, 5, 11, 4, 9, 6, 28, 26, 31, @@ -243,7 +256,7 @@ def bech32_lagrange(l, x): c.append(m) return [bech32_mul(n, BECH32_INV[i]) for i in c] -def ms32_interpolate(l, x): +def codex32_interpolate(l, x): w = bech32_lagrange([s[5] for s in l], x) res = [] for i in range(len(l[0])): @@ -253,18 +266,18 @@ def ms32_interpolate(l, x): res.append(n) return res -def ms32_recover(shares): - return ms32_interpolate(shares, 16) +def codex32_recover(shares): + return codex32_interpolate(shares, 16) ===Generating Shares=== If we already have ''k'' valid codex32 strings such that: -* All strings have the same threshold value ''k'', the same identifier, and the same length +* All strings have the same human-readable part, the same threshold value ''k'', the same identifier, and the same length * All of the share index values are distinct -Then we can derive additional shares with the ms32_interpolate function by passing it a list of exactly ''k'' of these codex32 strings, together with a fresh share index distinct from all of the existing share indexes. +Then we can derive additional shares with the codex32_interpolate function by passing it a list of exactly ''k'' of these codex32 strings, together with a fresh share index distinct from all of the existing share indexes. The newly derived share will have the provided share index. 
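Choosing the fresh share index can be sketched as follows. This is an illustration under stated assumptions rather than a normative procedure: the helper name next_share_index is hypothetical, it walks the bech32 letters in alphabetical order, and it skips "s" because that index denotes the unshared secret. The returned integer is the position of the character in the bech32 character table, which is the form in which a share index is handed to codex32_interpolate.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def next_share_index(used):
    """Return (character, 5-bit value) of the first unused letter index.

    `used` is an iterable of share index characters already taken.
    Hypothetical helper: digits are not considered, and "s" is reserved.
    """
    used = {c.lower() for c in used}
    letters = sorted(ch for ch in CHARSET if ch.isalpha() and ch != "s")
    for c in letters:
        if c not in used:
            return c, CHARSET.index(c)
    raise ValueError("no unused letter share index left")


# With shares "a" and "c" already present, the next letter is "d".
char, value = next_share_index(["a", "c"])
print(char, value)
```

The value 16 passed by codex32_recover corresponds to the character "s" in the same table.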
Once a user has generated ''n'' shares, they may discard the codex32 secret (if it exists). @@ -277,16 +290,17 @@ There are two ways to create an initial set of ''k'' valid codex32 strings, depe In the case that the user wishes to generate a fresh secret, the user generates random initial shares, as follows: # Choose a bitsize, between 128 and 512, which must be a multiple of 8 +# Choose a human-readable part according to application (Use "ms" for BIP-0032 master seeds) # Choose a threshold value ''k'' between 2 and 9, inclusive # Choose a 4 bech32 character identifier #* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every secret the user may need to disambiguate # ''k'' many times, generate a random share by: ## Take the next available letter from the bech32 alphabet, in alphabetical order, as a, c, d, ..., to be the share index -## Set the first nine characters to be the prefix ms1, the threshold value ''t'', the 4-character identifier, and then the share index +## Set the first characters to be the human-readable part, the separator 1, the threshold value ''k'', the 4-character identifier, and then the share index ## Choose the next ceil(''bitlength / 5'') characters uniformly at random ## Generate a valid checksum in accordance with the Checksum section, and append this to the resulting shares -The result will be ''k'' distinct shares, all with the same initial 8 characters, and a distinct share index as the 9th character. +The result will be ''k'' distinct shares, all with the same initial characters, and a distinct share index as the 6th data character. With this set of ''k'' shares, new shares can be derived as discussed above. This process generates a fresh secret, whose value can be retrieved by running the recovery process on any ''k'' of these shares. @@ -295,6 +309,7 @@ With this set of ''k'' shares, new shares can be derived as discussed above. 
Thi Before generating shares for an existing secret, it first must be codex32-encoded. The conversion process consists of: +# Choose a human-readable part according to application (Use "ms" for BIP-0032 master seeds) # Choose a threshold value ''k'' between 2 and 9, inclusive # Choose a 4 bech32 character identifier #* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every set of shares the user may need to disambiguate @@ -302,7 +317,7 @@ The conversion process consists of: # Set the payload to a bech32 encoding of the secret data, padded with arbitrary bits # Generate a valid checksum in accordance with the Checksum section -Along with the codex32 secret, the user must generate ''k''-1 other codex32 shares, each with the same threshold value, the same identifier, and a distinct share index. +Along with the codex32 secret, the user must generate ''k''-1 other codex32 shares, each with the same human-readable part, the same threshold value, the same identifier, and a distinct share index. These shares should be generated as described in the "fresh secret" section. The codex32 secret and the ''k''-1 codex32 shares form a set of ''k'' valid initial codex32 strings from which additional shares can be derived as described above. @@ -310,14 +325,14 @@ The codex32 secret and the ''k''-1 codex32 shares form a set of ''k'' valid init ===Long codex32=== The 13 character checksum design only supports up to 80 data characters. -Excluding the threshold, identifier and index characters, this limits the payload to 74 characters or 46 bytes. +Excluding the human-readable part, threshold, identifier and index characters, this limits the payload to 74 characters or 46 bytes. While this is enough to support the 32-byte advised size of BIP-0032 master seeds, BIP-0032 allows seeds to be up to 64 bytes in size. We define a long codex32 format to support these longer seeds by defining an alternative checksum. 
-MS32_LONG_CONST = 0x43381e570bf4798ab26 +CODEX32_LONG_CONST = 0x43381e570bf4798ab26 -def ms32_long_polymod(values): +def codex32_long_polymod(values): GEN = [ 0x3d59d273535ea62d897, 0x7a9becb6361c6c51507, @@ -325,7 +340,7 @@ def ms32_long_polymod(values): 0x0c577eaeccf1990d13c, 0x1887f74f8dc71b10651, ] - residue = 0x23181b3 + residue = 1 for v in values: b = (residue >> 70) residue = (residue & 0x3fffffffffffffffff) << 5 ^ v @@ -333,12 +348,12 @@ def ms32_long_polymod(values): residue ^= GEN[i] if ((b >> i) & 1) else 0 return residue -def ms32_verify_long_checksum(data): - return ms32_long_polymod(data) == MS32_LONG_CONST +def codex32_verify_long_checksum(data): + return codex32_long_polymod(data) == CODEX32_LONG_CONST -def ms32_create_long_checksum(data): +def codex32_create_long_checksum(data): values = data - polymod = ms32_long_polymod(values + [0] * 15) ^ MS32_LONG_CONST + polymod = codex32_long_polymod(values + [0] * 15) ^ CODEX32_LONG_CONST return [(polymod >> 5 * (14 - i)) & 31 for i in range(15)] This implements a [https://en.wikipedia.org/wiki/BCH_code BCH code] that @@ -353,7 +368,7 @@ A long codex32 string follows the same specification as a regular codex32 string A codex32 string with a data part of 94 or 95 characters is never legal as a regular codex32 string is limited to 93 data characters and a long codex32 string is at least 96 data characters. -Generation of long shares and recovery of the long secret from long shares proceeds in exactly the same way as for regular shares with the ms32_interpolate function. +Generation of long shares and recovery of the long secret from long shares proceeds in exactly the same way as for regular shares with the codex32_interpolate function. The long checksum is designed to be an error correcting code that can correct up to 4 character substitutions, up to 8 unreadable characters (called erasures), or up to 15 consecutive erasures. 
As with regular checksums we do not specify how an implementation should implement error correction, and all our recommendations for error correction of regular codex32 strings also apply to long codex32 strings. @@ -382,10 +397,10 @@ While we could use the 15 character checksum for both cases, we prefer to keep t We only guarantee to correct 4 characters no matter how long the string is. Longer strings mean more chances for transcription errors, so shorter strings are better. -The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 368-bit secret. -At this length, the prefix MS1 is not covered by the checksum. -This is acceptable because the checksum scheme itself requires you to know that the MS1 prefix is being used in the first place. -If the prefix is damaged and a user is guessing that the data might be using this scheme, then the user can enter the available data explicitly using the suspected MS1 prefix. +The longest data part using the regular 13 character checksum is 91 characters and corresponds to 360-bit secret data. +At this length, the upper bits of the human-readable part are not covered by the checksum. +This is acceptable because the checksum scheme itself requires you to know that a codex32 human-readable part is being used in the first place. +If the prefix is damaged and a user is guessing that the data might be using this scheme, then the user can enter the available data explicitly using the suspected codex32 prefix. ===Not BIP-0039 Entropy=== @@ -439,6 +454,15 @@ Instead, users who wish to switch to codex32 should generate a fresh seed and sw Our [https://github.com/BlockstreamResearch/codex32 reference implementation repository] contains implementations in Rust and PostScript. The inline code in this BIP text can be used as a Python reference. 
+==Registered Human-readable Prefixes== + +SatoshiLabs maintains a full list of registered human-readable parts for other uses of codex32: + +[https://github.com/satoshilabs/slips/blob/master/slip-0173.md#uses-of-codex32 SLIP-0173 : Registered human-readable parts for BIP-0093] + +The sequence of lower 5 bits of each character's US-ASCII value in a registered codex32 human-readable part SHOULD be unique. +This makes codex32 HRP error correction possible for applications choosing to implement it. + ==Test Vectors== ===Test vector 1=== @@ -548,6 +572,41 @@ payload (bech32): M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6 * Master seed (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9 * master node xprv: xprv9s21ZrQH143K4UYT4rP3TZVKKbmRVmfRqTx9mG2xCy2JYipZbkLV8rwvBXsUbEv9KQiUD7oED1Wyi9evZzUn2rqK9skRgPkNaAzyw3YrpJN +===Test vector 6=== + +This example shows converting an existing 256-bit Core Lightning HSM secret into a codex32 secret using a human-readable part of cl and an identifier of luea and then relabeling the secret. Four zero bits are appended in order to obtain 52 bech32 payload characters (260 bits of data) from the 256-bit HSM secret. + +Core Lightning HSM secret (hex): 83634b3b43a3734b73396989980000000000000000000000000000000000000000 + +* payload: d35kw6r5de5kueedxyesqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq +* checksum: anvrktzhlhusz +* codex32-encoded HSM secret: cl10lueasd35kw6r5de5kueedxyesqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqanvrktzhlhusz + +Note the identifier choice is arbitrary, any identifier would have been valid; however a different identifier produces a different checksum. For example: + +* identifier: cln2 +* checksum: n9lcvcu7cez4s +* codex32-encoded HSM secret: cl10cln2sd35kw6r5de5kueedxyesqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqn9lcvcu7cez4s + +===Test vector 7=== + +This example shows the codex32 format, when used with a different human-readable part. 
+The payload contains 52 bech32 characters, which corresponds to 260 bits. We truncate the last four bits in order to obtain a 256-bit HSM secret. + +codex32-encoded HSM secret (bech32): cl10peevst6cqh0wu7p5ssjyf4z4ez42ks9jlt3zneju9uuypr2hddak6tlqsjhsks4laxts8q + +* human-readable part: cl +* separator: 1 +* k value: 0 (no secret splitting) +* identifier: peev +* share index: s (the secret) +* payload: t6cqh0wu7p5ssjyf4z4ez42ks9jlt3zneju9uuypr2hddak6tlqs +* checksum: jhsks4laxts8q +* HSM secret (hex): 82f5805deee7834842444d455c8aaab40b2fae229e65c2f38408d576b7b6d2fe08 + + + + ===Invalid test vectors=== These examples have incorrect checksums. @@ -614,7 +673,7 @@ This example has a threshold that is not a digit. * ms1fauxxxxxxxxxxxxxxxxxxxxxxxxxxxxxda3kr3s0s2swg -These examples do not begin with the required "ms" or "MS" prefix and/or are missing the "1" separator. +These examples do not begin with the "ms" or "MS" prefix required for their checksum to validate and/or are missing the "1" separator. * 0fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxuqxkk05lyf3x2 * 10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxuqxkk05lyf3x2
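The structural failure of the two strings above can be illustrated with a small pre-check. This is only a sketch that mirrors the pos < 1 test in codex32_decode; the helper name has_hrp_and_separator is hypothetical, and passing it says nothing about the checksum or the other validity rules.

```python
def has_hrp_and_separator(codex):
    """Structural pre-check: a non-empty prefix followed by a "1" separator."""
    pos = codex.lower().rfind("1")
    # pos == -1: no separator at all; pos == 0: separator but empty prefix.
    return pos >= 1


for s in [
    "0fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxuqxkk05lyf3x2",
    "10fauxsxxxxxxxxxxxxxxxxxxxxxxxxxxuqxkk05lyf3x2",
]:
    print(s, has_hrp_and_separator(s))
```

Both strings fail the pre-check: the first contains no "1" at all, and in the second the only "1" is the first character, leaving an empty human-readable part.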