Skip to content

Conversation

@BenWestgate
Copy link
Contributor

@BenWestgate BenWestgate commented Dec 10, 2025

Various terminology fixes, typo fixes and phrase fixups from #2040 and #2023.

@apoelstra said:

In the interest of moving forward I would kinda like Ben to make a new PR with the non-HRP changes, which it seems like everyone agrees with and would reduce the size of the diff of this one.

#2040 (comment)

This is that PR.

Copy link
Contributor Author

@BenWestgate BenWestgate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short rationale for changes that may not be immediately obvious.

Comment on lines +21 to +22
It includes an encoding format, a BCH error-correcting checksum, and optional Shamir's secret sharing algorithms for share generation and secret recovery.
Secret data can be encoded directly, or split into up to 31 shares.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved SSS down a sentence and called it optional so users don't assume it's required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this made me double-take but we were already trying to do that with our misleading "threshold from 1 to 9" description. Your text is better.

[https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki BIP-0032] hierarchical deterministic wallet using it.
It includes an encoding format, a BCH error-correcting checksum, and optional Shamir's secret sharing algorithms for share generation and secret recovery.
Secret data can be encoded directly, or split into up to 31 shares.
A minimum threshold of shares, which can be between 2 and 9, is needed to recover the secret, whereas without sufficient shares, no information about the secret is recoverable.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closes #2023

Comment on lines +82 to +83
Note that per BIP-0173, the lowercase form is used when determining a character's value for checksum purposes.
In particular, given an all uppercase codex32 string, we still use lowercase <code>ms</code> as the human-readable part during checksum construction.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved these sentences from test vectors to definition

combined = data + ms32_create_checksum(data)
return "ms" + "1" + ''.join([CHARSET[d] for d in combined])

def ms32_decode(codex):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a function that would save a few minutes for implementers, summarizes the validity rules (except for incomplete group) in code form.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did it come from? It looks correct to me but I didn't look too carefully at it.

Copy link
Contributor Author

@BenWestgate BenWestgate Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me. It's from my local copy of my python-codex32 package. It passes the test vectors. I'll send CI when I publish it.
We could reject incomplete groups here rather than wait for convertbits or your sanity_check.
Lib code will have slight diff to raise descriptive error classes like your rust does instead of return None though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view it's okay for the reference code to be more accepting than a production implementation.

But I'd also be happy tightening it up. Let's defer to a later PR though since there's enough happening here as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we never know if a base32 native secret wants to use this specification in which case incomplete groups would not apply to it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks correct to me but I didn't look too carefully at it.

6763349

Fixed a bug and made the validity condition cleaner.

* The number of codex32 shares is exactly equal to the (common) threshold value.
If all the above conditions are satisfied, the <code>ms32_recover</code> function will return a codex32 secret when its argument is the list of codex32 shares with each share represented as a list of integers representing the characters converted using the bech32 character table from BIP-0173.
* The number of shares is exactly equal to the (common) threshold value.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throughout the text I reduced the use of "codex32" as an adjective to reduce jargon and avoid UI designers assuming they can't just say "share" or "secret".

===Generating Shares===

If we already have ''t'' valid codex32 strings such that:
If we already have ''k'' valid codex32 strings such that:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer k and that's what the SSS wikipedia article uses but t is also correct. The codex32 book and test vectors used k so that's why I got rid of t for consistency.

The codex32 secret and the ''k''-1 codex32 shares form a set of ''k'' valid initial codex32 strings from which additional shares can be derived as described above.

===Long codex32 Strings===
===Long codex32===
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the format "Long codex32" to match "codex32" it felt weird to make "strings" part of the proper noun. also updated test vector 5 to match.

Comment on lines 348 to 359
A secret seed is a codex32 encoding of:

* The human-readable part "ms" for master seed.
* The data-part values:
** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
** An identifier consisting of 4 bech32 characters.
*** We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed and share set the user may need to disambiguate.
** The share index "s".
** A conversion of the 16-to-64-byte BIP-0032 HD master seed to bech32:
*** Start with the bits of the master seed, most significant bit per byte first.
*** Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
*** Translate those bits to characters using the bech32 character table from BIP-0173.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates L281-285 in "For an Existing Secret" we might want to cite this section and delete lines up there or vice versa.

Also looks like I forgot the checksum here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I vote we delete the above section and link down here. In "for an existing secret" I think we want to add a line about length. Specifically, if your seed data is 46 bytes or fewer, use codex32; otherwise use long codex32 (ref).

Copy link
Contributor Author

@BenWestgate BenWestgate Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also move this section after the Unshared Secret section.

Everyone needs the master seed standard to import/export seeds but not all users want the SSS stuff. So "Existing secret" is lower priority. If we define master seed encoding first L281-285 can be replaced with a link.

There's a lot of overlap between "Fresh" and "Existing" and it risks reader auto-pilot.

For example my reviewer on BIP85 app completely missed that initial share indices were supposed to be chosen alphabetically. (There is not supposed to be a "z" initial share for example.)

A refactor to de-duplicate is possible but I haven't thought about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the existing duplicated text. We can refactor in a followup if you want. Especially if the goal is to rearrange text in order of priority.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the new "Master seed format" to be below "Unshared Secret". The "direct encoding of ... master seed" sentence could hop down from "Unshared" leaving it HRP pure (no diff when #2040 is rebased on this.)

Looks better to me. I left "existing secret" as in aa06616.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In "for an existing secret" I think we want to add a line about length. Specifically, if your seed data is 46 bytes or fewer, use codex32; otherwise use long codex32 (ref).

It says:

Generate a valid checksum in accordance with the Checksum section

And the Checksum section says:

def ms32_create_checksum(data):
    if len(data) > 80:                       # See Long codex32 Strings
        return ms32_create_long_checksum(data)

* Derived share with index <code>D</code>: <code>MS12NAMEDLL4F8JLH4E5VDVULDLFXU2JHDNLSM97XVENRXEG</code>
* Secret share with index <code>S</code>: <code>MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW</code>
* Master secret (hex): <code>d1808e096b35b209ca12132b264662a5</code>
* Recovered secret seed with index <code>S</code>: <code>MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW</code>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the book does away with "secret seed" we may drop it here and stick to "secret" or "codex32 secret". See: BlockstreamResearch/codex32#72
This is the best terminology for now, plus "secret seed" books are in print. I'm fine with using this term in BIP93.

For distinction, I'm deliberately not calling interpolated secrets "codex32-encoded master seeds" because they aren't encoded from seeds, they're decoded into seeds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me.

Comment on lines 479 to 485
This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.
The payload contains 103 bech32 characters, which corresponds to 515 bits. The last three bits are discarded when converting to a 512-bit master seed.

This is an example of a '''Long codex32 String'''.
This is an example of a '''Long codex32''' string.

k value (bech32): <code>0</code>

identifier (bech32): <code>0C8V</code>

payload (bech32): <code>M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F</code>

* Secret share with index <code>S</code>: <code>MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK</code>
* Master secret (hex): <code>dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9</code>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example text did not fit what the data showed, assembled from random characters and appending a checksum. So I did this piecewise assembly to match the text exactly as per the discussion in #2040 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me. I don't want to hang up this PR on terminology but I find it a little bit confusing that we use the terms "master seed" and "secret seed" to refer to the same data with different encodings. I wonder if we should change "Master seed (hex)" to "Secret seed (hex)" and change "Secret seed" to "Secret seed (codex32)".

Copy link
Contributor Author

@BenWestgate BenWestgate Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Master seeds aren't bijective with their codex32-encodings due to threshold, identifier and arbitrary padding so I wouldn't use the same word.

As with up here I'm okay with just calling it "secret" or "codex32 secret", this case is not a "codex32-encoded" existing seed:

This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.

And codex32 characters should be bech32 characters?

I see you've put (codex32) in parenthesis is that the preference when giving data as opposed to (bech32) as master vector 1 used:

codex32 secret (bech32):

"codex32 secret (bech32)" or "Secret (bech32)" looks better than "Secret seed (codex32)"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with "codex32 secret (bech32)". It maaybe implies that the checksum is the bech32 checksum but I think it's clear from context what's meant. And it captures both that the character set is bech32 vs hex, and that the "codex32 secret" has more data than the bare seed.

Copy link
Contributor Author

@BenWestgate BenWestgate Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nowhere do we label a result string (bech32). (hex) seems to be used everywhere because it looks more like bech32 than the reverse.

Checksum is 150% too long to be Bech32.

So I have Vector 5 like this:

k value (bech32): 0

identifier (bech32): 0C8V

payload (bech32): M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F

  • checksum: HPV80UNDVARHRAK
  • codex32 secret: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK
  • Master seed (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9
  • master node xprv: xprv9s21ZrQH143K4UYT4rP3TZVKKbmRVmfRqTx9mG2xCy2JYipZbkLV8rwvBXsUbEv9KQiUD7oED1Wyi9evZzUn2rqK9skRgPkNaAzyw3YrpJN

Do we need to change Vector 2 as well or are you okay there:

  • Derived share with index D: MS12NAMEDLL4F8JLH4E5VDVULDLFXU2JHDNLSM97XVENRXEG
  • Recovered secret seed with index S: MS12NAMES6XQGUZTTXKEQNJSJZV4JV3NZ5K3KWGSPHUH6EVW
  • Master seed (hex): d1808e096b35b209ca12132b264662a5

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vector 2 seems fine to me.

@apoelstra
Copy link
Contributor

apoelstra commented Dec 10, 2025

Done reviewing aa06616. Looks great, thanks! My comments are all just nits (except the "random errors" thing which I think is important).

@jonatack jonatack added Proposed BIP modification Pending acceptance This BIP modification requires sign-off by the champion of the BIP being modified labels Dec 10, 2025
…cret

reduced the heading level of checksum and error correction to make the table of contents easier to parse.

Moved Master seed Encoding to be below Unshared Secret.
If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly.

===Checksum===
====Checksum====
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToC was getting hard on my eyes with so many H3 so I indented Checksum and Error Correction so there's only 4 H3 in a row afterwards: Unshared Secret, Master seed format, Recover Secret, Generate Shares.

Comment on lines +194 to +208
When the human-readable part of a valid codex32 secret (converted to lowercase) is the string "ms", we call it a codex32-encoded master seed or secret seed. The payload in this case is a direct encoding of a BIP-0032 HD master seed.

A secret seed is a codex32 encoding of:

* The human-readable part "ms" for master seed.
* The data-part values:
** A threshold parameter, which MUST be a single digit between "2" and "9", or the digit "0".
** An identifier consisting of 4 bech32 characters.
*** We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed and share set the user may need to disambiguate.
** The share index "s".
** A conversion of the 16-to-64-byte BIP-0032 HD master seed to bech32:
*** Start with the bits of the master seed, most significant bit per byte first.
*** Re-arrange those bits into groups of 5, and pad with arbitrary bits at the end if needed.
*** Translate those bits to characters using the bech32 character table from BIP-0173.
** A valid checksum in accordance with the Checksum section.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checksum was added. Rest is unchanged from it's old location.

===Test vector 5===

This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.
This example shows generating a new 512-bit master seed using "random" bech32 characters and appending a checksum.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

caught this. There's no such thing as "codex32" characters because we use the bech32 character set.

combined = data + ms32_create_checksum(data)
return "ms" + "1" + ''.join([CHARSET[d] for d in combined])

def ms32_decode(codex):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks correct to me but I didn't look too carefully at it.

6763349

Fixed a bug and made the validity condition cleaner.

Comment on lines 179 to 184
if pos < 2 or not (48 <= len(codex) <= 127):
return None
if not all(x in CHARSET for x in codex[pos+1:]):
return None
hrp = codex[:pos]
if not (hrp == "ms" and codex[pos+1].isdigit()) or codex[pos+1] == "0" and codex[pos+6] != "s":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easier to read without parentheses.

@apoelstra
Copy link
Contributor

ACK 6763349

@jonatack jonatack removed the Pending acceptance This BIP modification requires sign-off by the champion of the BIP being modified label Dec 15, 2025
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good improvements, with some follow-ups (IIUC, not sure) for a new PR.


def ms32_encode(data):
combined = data + ms32_create_checksum(data)
return "ms" + "1" + ''.join([CHARSET[d] for d in combined])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible follow-up: "ms" is ~defined further below, line 193 and 197, after its use throughout the code -- maybe define it before the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. In the spec definition we say it's always "ms" but not why until the master seed format standard.

If #2040 is cleaned up to avoid making contentious statements about length, BIP93 will be human-readable part agonistic like BIP-0173. And this would renamed def codex32_encode(hrp, data) similar for verify_checksum and create_checksum.

Half the purpose of this PR was to rebase #2040 on it for a simpler diff to review.

@jonatack jonatack merged commit 98935ff into bitcoin:master Dec 15, 2025
4 checks passed
@murchandamus
Copy link
Contributor

Thanks @BenWestgate, @apoelstra, and @jonatack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants