@scgbckbone

@scgbckbone scgbckbone commented Dec 5, 2025

I just noticed a problem when I import string share(s) into my app and serialize them for storage, keeping the seed (aka data) and the metadata (hrp, idx, threshold, id) separately. When I later load the stored data back into a Share object (via from_seed) and try to recover the secret share s, the shares loaded from seed give a different result than the same shares did before serialization.

My guess is that this has something to do with padding. Any idea how to fix this issue?

I added a test case proving the point:

---- tests::my_vector stdout ----
thread 'tests::my_vector' panicked at src/lib.rs:501:9:
assertion `left == right` failed
  left: "10cbc41852b76438e5781f2cefb49799"
 right: "10cbc41852b76438e5781f2cefb4979f"

@apoelstra
Owner

apoelstra commented Dec 5, 2025

Yeah, it's to do with padding, but the reason seems to be that this library is badly broken/confused.

In codex32 the threshold/id/index 3k00la are all part of the codex32-encoded data along with the "actual" data. The first six characters are 30 bits, meaning that the share data, when encoded, should be right-shifted by 6 bits. Instead, we treat the data as its own bytestring which we convert to/from codex32 without shifting, effectively injecting 2 "padding" bits that aren't recognized as padding, aren't zeroed out, and eventually wind up at the end of the string.

At least, that's my best interpretation of what's going on. The data structures in this library are a real mess and combine strings, Fe vectors and byte vectors in arbitrary/lazy ways.
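
To make the "2 padding bits" concrete, here is a minimal sketch of the bit arithmetic when a byte payload is regrouped into 5-bit symbols. The helper names `payload_symbols` and `pad_len` are hypothetical and do not come from this library or from python-codex32.

```python
def payload_symbols(n_bytes: int) -> int:
    """Number of 5-bit symbols needed to hold n_bytes of data (rounded up)."""
    return -(-n_bytes * 8 // 5)


def pad_len(n_bytes: int) -> int:
    """Spare bits left in the last symbol once all data bits are placed."""
    return payload_symbols(n_bytes) * 5 - n_bytes * 8


for n_bytes in (16, 32, 64):
    print(
        n_bytes * 8, "bits ->",
        payload_symbols(n_bytes), "symbols,",
        pad_len(n_bytes), "padding bits",
    )
```

A 16-byte seed therefore occupies 26 symbols (130 bits), and whatever ends up in the 2 spare bits is lost as soon as the payload is stored as bytes alone.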

I apologize for the state of this library -- for a long time I have intended to replace all the error-correction logic with rust-bech32 0.12, which will have codex32 support. (It will not have interpolation logic, but that's easy/small and I will continue to implement it here.)

However, I let rust-bech32 0.12 get scope-creeped into doing error correction, which is still not done. There is a tracking issue here rust-bitcoin/rust-bech32#189. Maybe I should just cut a release so that I can fix this library.

@apoelstra
Owner

Oh, I'm not actually blocked on rust-bech32 0.12. Indeed, the docs for 0.11 have codex32 support as an example.

@BenWestgate

The .from_seed() factory needs a padding parameter and .to_seed() or codex32_decode(bech) needs to return the encoding's padding.

Without these it won't round-trip and, worse, it recovers a different seed.
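
A rough sketch of what such a round trip could look like, assuming the padding lives in the low bits of the last 5-bit symbol. The function names `bytes_to_u5` and `u5_to_bytes` are made up for illustration; they are not the API of this library or of python-codex32.

```python
def bytes_to_u5(data: bytes, pad_val: int = 0) -> list[int]:
    """Regroup bytes into 5-bit symbols; pad_val fills the spare low bits."""
    pad_len = -len(data) * 8 % 5
    if not 0 <= pad_val < (1 << pad_len):
        raise ValueError("pad_val does not fit in the padding bits")
    acc = (int.from_bytes(data, "big") << pad_len) | pad_val
    n = (len(data) * 8 + pad_len) // 5
    return [(acc >> 5 * (n - 1 - i)) & 31 for i in range(n)]


def u5_to_bytes(symbols: list[int]) -> tuple[bytes, int]:
    """Inverse of bytes_to_u5: return the data bytes AND the padding value."""
    pad_len = len(symbols) * 5 % 8
    if pad_len > 4:
        raise ValueError("symbol count leaves more than 4 spare bits")
    acc = 0
    for s in symbols:
        acc = (acc << 5) | s
    n_bytes = len(symbols) * 5 // 8
    return (acc >> pad_len).to_bytes(n_bytes, "big"), acc & ((1 << pad_len) - 1)


seed = bytes(range(16))
for pad_val in range(4):
    symbols = bytes_to_u5(seed, pad_val)
    # Bytes alone are identical for every pad_val; only the pair round-trips.
    assert u5_to_bytes(symbols) == (seed, pad_val)
```

Returning the pair keeps the symbol sequence recoverable; returning only the bytes collapses all four encodings of a 128-bit payload into one.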

Using default 0 padding on shares has some other problems:

  • Can't disambiguate Bech32 vs Codex32 checksum formats during error correction.
  • If an attacker holding fewer than k derived shares knows that the initial k shares used zero padding, the last character of the seed is leaked early. The attacker learns 5 bits per share, but the random data is only (5 - pad_len) * k bits. So for 128-bit seeds, two k=3 shares known to use zero padding reveal the last 3 bits of the secret.
  • Missed chance to detect bit errors, e.g.: 4-bit burst errors on 256-bit seed data. (95% chance of detecting a symbol error)

For the BIP85 codex32 application (and that compact QR idea) I set the padding bits by CRC of the data to avoid this.

That way they depend on 128-bits of unknown data and can't be assumed and are deterministic.
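
As an illustration only, padding derived from a CRC of the data could be sketched as below. The generator polynomials are placeholders chosen for the example, not the ones python-codex32 uses, and the names `crc_bits` and `default_pad_val` are hypothetical.

```python
def crc_bits(data: bytes, width: int, poly: int) -> int:
    """Bit-serial CRC, MSB first, zero initial value.

    `poly` is the generator without its leading x**width term.
    """
    if width == 0:
        return 0
    reg = 0
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            feedback = ((reg & top) != 0) ^ bit
            reg = (reg << 1) & mask
            if feedback:
                reg ^= poly
    return reg


# Placeholder generators per padding length (assumption, not a standard):
# x+1, x^2+x+1, x^3+x+1, x^4+x+1.
PLACEHOLDER_POLYS = {1: 0b1, 2: 0b11, 3: 0b011, 4: 0b0011}


def default_pad_val(seed: bytes) -> int:
    """Deterministic padding value that depends on every bit of the seed."""
    pad_len = -len(seed) * 8 % 5
    if pad_len == 0:
        return 0
    return crc_bits(seed, pad_len, PLACEHOLDER_POLYS[pad_len])


seed = bytes(range(16))
print(default_pad_val(seed))
```

Because the value is a function of the whole payload, it is reproducible from the stored bytes, while someone who does not know the payload cannot assume it is zero.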

A fast fix would be for .from_seed to construct only index "s" strings, but that loses useful functionality compared with giving it a secure deterministic default.

@apoelstra
Owner

If it's only possible to construct S strings then users might as well just use rust-bech32 :). The point of this library is that it can also do interpolation.

@scgbckbone
Author

> For the BIP85 codex32 application (and that compact QR idea) I set the padding bits by CRC of the data to avoid this. That way they depend on 128-bits of unknown data and can't be assumed and are deterministic.

smart, and does the trick! (I tried with your python-codex32)

> giving it a secure deterministic default.

I agree this is much better than passing padding around

@BenWestgate

BenWestgate commented Dec 7, 2025

> smart, and does the trick! (I tried with your python-codex32)

Thanks, I just rewrote it to save 100 lines and put polymod in an Encoding class that handles any checksum: Bech32, Bech32m, long Codex32, Codex32 and CRC. I still have to re-add this CRC padding feature, and then I will publish.

> giving it a secure deterministic default.

To standardize CRC padding we should:

  • Decide whether the CRC covers the expanded hrp and all data, or only the payload bits
  • Exhaustively test candidate polynomials and CONST (xor_out) values to maximize error detection of codex32 strings with 9 errors.
    • Probably 1-bit errors as CHARSET is chosen so these are most likely and all CRCs are pushed well past their max length for hamming distance 3 (guaranteed detection of 2 errors and correction of 1).
    • Tie break CRC-2 on 128-bit "ms" secrets, CRC-4 on 256-bit "ms" secrets and CRC-3 on 512-bit "ms" secrets.

> I agree this is much better than passing padding around

My library passes padding around; it tests the alternate encodings by encoding every pad_val, while this library only checks that they decode to the same bytes.

def test_from_seed_and_alternates():
    """Test Vector 4: encode secret share from seed"""
    seed = bytes.fromhex(VECTOR_4["secret_hex"])
    for pad_val in range(0b1111 + 1):
        s = Codex32String.from_seed(seed, header="ms10leet", pad_val=pad_val)
        assert str(s) == VECTOR_4["secret_s_alternates"][pad_val]
        # confirm all 16 encodings decode to same master data
        assert s.data == seed

Given that we leak a secret character if initial from_seed shares are always zero-padded, doing so is a library bug.

@apoelstra
Owner

@BenWestgate if you have a rewrite feel free to open a PR -- if it's not too hard to review I'm happy to take it in. (Though "I saved 100 lines" makes me worry that it's a big diff.)

But I think the correct direction to rewrite in is one where we add a rust-bech32 dependency, use that for all the checksumming and encoding stuff, and then here we add (a) utility methods, and (b) constructors and accessors for the id/threshold/share index.

@BenWestgate

> @BenWestgate if you have a rewrite feel free to open a PR -- if it's not too hard to review I'm happy to take it in. (Though "I saved 100 lines" makes me worry that it's a big diff.)

It's in Python, about 400 lines excluding tests. Most of it is from BIP-93 or BIP-173, or is ported from this implementation. It also passes the Bech32 tests.

> But I think the correct direction to rewrite in is one where we add a rust-bech32 dependency, use that for all the checksumming and encoding stuff, and then here we add (a) utility methods, and (b) constructors and accessors for the id/threshold/share index.

This is exactly what I did in Python: I wrote a general Encoding class and then a Codex32String class with utility methods, constructors and properties.

from enum import Enum

# CODEX32_GEN, CODEX32_LONG_GEN and BECH32_GEN are the generator
# coefficient lists; they are defined elsewhere in the module.


class Encoding(Enum):
    """Enumeration type to list the various supported encodings."""

    CODEX32 = (CODEX32_GEN, 13, 0x10CE0795C2FD1E62A)
    CODEX32_LONG = (CODEX32_LONG_GEN, 15, 0x43381E570BF4798AB26)
    BECH32 = (BECH32_GEN, 6, 1)
    BECH32M = (BECH32_GEN, 6, 0x2BC830A3)

    def __init__(self, gen, cs_len, const):
        self.gen = gen
        self.cs_len = cs_len
        self.const = const

    def polymod(self, values: list[int], residue=1):
        """Internal function that computes the Bech32/Codex32 checksums."""
        shift = 5 * (self.cs_len - 1)
        mask = (1 << shift) - 1
        for value in values:
            # `shift` is a local variable, not an attribute of the enum member.
            top = residue >> shift
            residue = ((residue & mask) << 5) ^ value
            for i, g in enumerate(self.gen):
                residue ^= g if ((top >> i) & 1) else 0
        return residue


def _verify_checksum(data):
    """Verify a checksum given HRP and converted data characters."""
    for spec in Encoding:
        if spec.polymod(data) == spec.const:
            return spec
    return None


def _create_checksum(values, spec: Encoding):
    """Compute the checksum values given HRP and data."""
    polymod = spec.polymod(values + [0] * spec.cs_len) ^ spec.const
    return [(polymod >> 5 * (spec.cs_len - 1 - i)) & 31 for i in range(spec.cs_len)]


def u5_to_bech32(data: list[int]):
    """Map list of 5-bit integers (0-31) -> bech32 data-part string."""
    if not all(x in range(32) for x in data):
        raise InvalidDataValue
    return "".join(CHARSET[d] for d in data)


def bech32_to_u5(bech: str):
    """Map bech32 data-part string -> list of 5-bit integers (0-31)."""
    # Normalize case first, then validate the whole data part that was passed in.
    bech = bech.lower()
    if not all(x in CHARSET for x in bech):
        raise InvalidChar
    return [CHARSET.find(x) for x in bech]


def bech32_hrp_expand(hrp):
    """Expand the HRP into values for checksum computation."""
    return [ord(x) >> 5 for x in hrp] + [0] + [ord(x) & 31 for x in hrp]


def bech32_encode(hrp, data, spec):
    """Compute a Bech32 string given HRP and data values."""
    combined = data + _create_checksum(bech32_hrp_expand(hrp) + data, spec)
    return hrp + "1" + u5_to_bech32(combined)


def bech32_decode(bech: str):
    """Validate a Bech32 string, and determine HRP and data."""
    if (any(ord(x) < 33 or ord(x) > 126 for x in bech)):
        raise InvalidChar
    if bech.lower() != bech and bech.upper() != bech:
        raise InvalidCase
    bech = bech.lower()
    pos = bech.rfind("1")
    if pos < 1 or pos > 83 or pos + 7 > len(bech):  # or len(bech) > 90:
        raise InvalidLength
    hrp = bech[:pos]
    data = bech32_to_u5(bech[pos + 1 :])
    spec = _verify_checksum(bech32_hrp_expand(hrp) + data)
    if spec is None:
        raise InvalidChecksum
    return (hrp, data[: -spec.cs_len], spec)


def codex32_decode(codex: str):
    """Validate a Codex32/Long Codex32 string, and determine HRP and data."""
    hrp, data, spec = bech32_decode(codex)
    if spec not in (Encoding.CODEX32, Encoding.CODEX32_LONG):
        raise NotCodex32Checksum
    if 19 > len(data) or len(data) > 1023:
        raise InvalidLength
    # bech32_decode has already rejected mixed case, so lower-casing is safe here.
    codex = codex.lower()
    if not codex[len(hrp) + 1].isdigit():
        raise InvalidThreshold
    if codex[len(hrp) + 1] == "0" and codex[len(hrp) + 6] != "s":
        raise InvalidShareIndex
    return hrp, data, spec
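
As a sanity check on the generic polymod idea, the following self-contained sketch applies the same shift/mask formulation with the published Bech32 generator and tells Bech32 from Bech32m by the final residue. The generator and constants are the ones from BIP-173 and BIP-350; the function names `polymod`, `hrp_expand` and `detect` are illustrative and independent of the code above.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
BECH32_GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
CONSTS = {"bech32": 1, "bech32m": 0x2BC830A3}


def polymod(values, gen, cs_len, residue=1):
    """Checksum residue for a code with cs_len 5-bit checksum symbols."""
    shift = 5 * (cs_len - 1)
    mask = (1 << shift) - 1
    for value in values:
        top = residue >> shift
        residue = ((residue & mask) << 5) ^ value
        for i, g in enumerate(gen):
            if (top >> i) & 1:
                residue ^= g
    return residue


def hrp_expand(hrp):
    """Expand the HRP into 5-bit values for checksum computation."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def detect(bech):
    """Return which checksum constant a string satisfies, or None."""
    bech = bech.lower()
    pos = bech.rfind("1")
    hrp = bech[:pos]
    data = [CHARSET.index(c) for c in bech[pos + 1:]]
    residue = polymod(hrp_expand(hrp) + data, BECH32_GEN, 6)
    for name, const in CONSTS.items():
        if residue == const:
            return name
    return None


print(detect("A12UEL5L"), detect("A1LQFN3A"))
```

The two inputs are valid test vectors from BIP-173 and BIP-350 respectively. With cs_len = 6 the shift is 25 and the mask is 0x1ffffff, which matches the reference Bech32 polymod.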

Then a Codex32String class has the same API as this Rust impl.

Should I PR this python implementation?

By replacing the 5's with a spec.word_size property in _create_checksum and polymod it can also compute CRCs on binary data for deterministic default padding in from_seed. That's my last step before I publish it to PyPI again.
https://pypi.org/project/codex32/

@BenWestgate

> If only S strings then users might as well just use rust-bech32 :). The point of this library is that it can also do interpolation.

What about extracting bytes data from non-"s" strings? Raise InvalidShareIndex error or output useless bytes without their padding?

The PR author hit an unexpected (but inevitable) result: the original secret can't be recovered when derived shares are reconstructed from bytes extracted from the original derived shares.

BenWestgate/python-codex32#2

Should our libraries:

  • raise InvalidShareIndex if share_idx != 's' when accessing the data property.
  • return useless unpadded bytes
  • return bytes and padding
  • something else?

@scgbckbone
Author

maybe I misunderstood the scope of this application - @apoelstra can you check this last comment of mine BenWestgate/python-codex32#2 (comment)

@scgbckbone
Author

> something else?

@BenWestgate I'm definitely in this category. If round-trips (even for derived shares) can be achieved, that should be a desired property of Codex32.

@BenWestgate

BenWestgate commented Dec 8, 2025

> something else?

> @BenWestgate I'm definitely in this category. If round-trips (even for derived shares) can be achieved, it should be desired property for Codex32

The problem you're encountering is that you can't construct 130 bits of data from 16 bytes for all 32 share indices, because any math you do to pad, even constants like all 0s or all 1s, only works for the k initial (aka encoded) strings. The derived (aka interpolated) strings break that padding, because the padding was not a 5-bit value and so is not preserved by GF(32) interpolation the way the u5 checksum or header is.

This means you need to know the last payload character for all the initial strings in order to add the correct padding to bytes extracted from derived strings.
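
The effect can be reproduced on a single symbol position. The sketch below is an illustration, not code from either library: it uses raw 5-bit integers as share indices, assumes the Bech32 field modulus x^5 + x^3 + 1, and searches for three zero-padded last symbols whose interpolated value at a fourth index has non-zero padding bits. All function names are hypothetical.

```python
from itertools import product

GF32_MODULUS = 0b101001  # x^5 + x^3 + 1 (assumed Bech32/codex32 field modulus)


def gf32_mul(a: int, b: int) -> int:
    """Multiply two GF(32) elements held as 5-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b100000:  # degree reached 5, reduce by the modulus
            a ^= GF32_MODULUS
    return result


def gf32_inv(a: int) -> int:
    """Inverse via a**30, since a**31 == 1 for every non-zero element."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(32)")
    result = 1
    for _ in range(30):
        result = gf32_mul(result, a)
    return result


def interpolate_at(shares: dict[int, int], x: int) -> int:
    """Lagrange-interpolate one symbol position at index x (addition is XOR)."""
    result = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = gf32_mul(num, x ^ xj)
                den = gf32_mul(den, xi ^ xj)
        result ^= gf32_mul(yi, gf32_mul(num, gf32_inv(den)))
    return result


def find_broken_padding(pad_len: int = 2):
    """Find k=3 zero-padded symbols whose derived symbol is not zero-padded."""
    step = 1 << pad_len
    for ys in product(range(0, 32, step), repeat=3):
        shares = dict(zip((1, 2, 3), ys))
        for x in range(32):
            if x in shares:
                continue
            y = interpolate_at(shares, x)
            if y & (step - 1):
                return shares, x, y
    return None


print(find_broken_padding())
```

Multiplying by a Lagrange coefficient mixes all five bits of a symbol, so "low two bits are zero" is not a property that interpolation keeps. That is why bytes extracted from a derived share cannot be re-padded with a constant.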

So a solution may need to define a Codex32ShareSet class. Then a valid share set may be constructed from_seeds with:

  • a common unique header string
  • a dictionary, len(share_indexed_data_dict) = k, where keys are unique share indices and the values are each string's byte data

This allows a Codex32ShareSet.interpolate_at method to construct new Codex32String objects based on a default initial string padding regime.

That's the theory anyway. The math is trickier, as the design must not leave fewer than len(share_indexed_data[share_idx]) * 8 bits unknown when k - 1 shares are known.

If you have any ideas for class Codex32ShareSet and its @classmethod factory def from_seeds(header, share_indexed_data_dict): return cls(...), please share.
