tests/test_roundtrip_interpolated.py: 41 additions, 0 deletions
@@ -0,0 +1,41 @@
from src.codex32.codex32 import Codex32String


# secret share from seed
s = Codex32String.from_seed(bytes.fromhex("68f14219957131d21b615271058437e8"), "ms13k00ls")
assert s.s == "ms13k00lsdrc5yxv4wycayxmp2fcstppharks8z0r84pf3uj"

# derive 'a' via proposed BIP-85
a = Codex32String.from_seed(bytes.fromhex("641be1cb12c97ede1c6bad8edf067760"), "ms13k00la")
assert a.s == "ms13k00lavsd7rjcje9ldu8rt4k8d7pnhvppyrt5gpff9wwl"

# derive 'c' via proposed BIP-85
c = Codex32String.from_seed(bytes.fromhex("61b3c4052f7a31dc2b425c843a13c9b4"), "ms13k00lc")
assert c.s == "ms13k00lcvxeugpf00gcac26ztjzr5y7fkjl7fx7nx7ykhkr"

# derive next share via interpolation
d = Codex32String.interpolate_at([s, a, c], "d")
assert d.s == "ms13k00ldp4v5nw8lph96x47mjxzgwjexe44p32swkq99e0w"

# now round-trip d share ('d' is derived via interpolation, NOT via 'from_seed')
dd = Codex32String.from_seed(d.data, "ms13k00ld")
Owner

@BenWestgate, Dec 8, 2025


You can't do this. You can only call .from_seed without passing pad_val for the k initial strings; derived strings MUST be passed their padding to round-trip.

You needed to be able to do this:

dd = Codex32String.from_seed(d.data, "ms13k00ld", d.pad_val)

This version's Codex32String lacks a pad_val property; I'm working on an update that adds one.

No matter what padding style we use, the padding is less than a full 5-bit value, so it is not an element of GF(32). It therefore won't interpolate into derived shares while maintaining any linear relationship that would allow round-tripping from bytes (GF(256)) to GF(32)-interpolated strings without passing the padding.
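To make the padding point concrete, here is a minimal, self-contained sketch of the bit layout. It assumes the usual bech32-style conversion, where a 128-bit payload becomes 26 five-bit characters (130 bits) and the 2 spare low bits of the last character hold the padding. `bytes_to_u5`, `u5_to_bytes` and their `pad_val` argument are illustrative names, not this library's API.

```python
def bytes_to_u5(data: bytes, pad_val: int = 0) -> list[int]:
    """Split data into 5-bit groups; the final group's spare low bits hold pad_val."""
    nbits = len(data) * 8
    pad_bits = (-nbits) % 5  # 2 spare bits for a 128-bit payload
    if not 0 <= pad_val < (1 << pad_bits):
        raise ValueError("pad_val does not fit in the padding bits")
    acc = (int.from_bytes(data, "big") << pad_bits) | pad_val
    total = nbits + pad_bits
    return [(acc >> (total - 5 * (i + 1))) & 31 for i in range(total // 5)]


def u5_to_bytes(chars: list[int], nbytes: int) -> bytes:
    """Reassemble 5-bit groups into bytes, discarding the trailing padding bits."""
    acc = 0
    for c in chars:
        acc = (acc << 5) | c
    pad_bits = len(chars) * 5 - nbytes * 8
    return (acc >> pad_bits).to_bytes(nbytes, "big")


seed = bytes.fromhex("68f14219957131d21b615271058437e8")
for pad in range(4):
    chars = bytes_to_u5(seed, pad)
    # Same 16 bytes every time; only the last 5-bit character changes.
    print(pad, chars[-1], u5_to_bytes(chars, 16) == seed)
```

All four padding values decode to the same 16 bytes and differ only in the last character, which is why the `.data` of an interpolated share cannot be turned back into that exact share unless the padding is kept as well.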

After construction, the only string whose data you should care about is "s", so the fact that other share indices can return data is more of a curiosity. Maybe .data should raise InvalidShareIndex or return None if share_idx != "s" to prevent this misuse.
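A sketch of the guard suggested above, using a hypothetical `ShareSketch` class and `InvalidShareIndex` exception (neither is the library's actual code). It takes the "raise" option; returning None would be the other option mentioned.

```python
class InvalidShareIndex(ValueError):
    """Hypothetical error: .data was requested from a share other than "s"."""


class ShareSketch:
    """Minimal stand-in for a codex32 share; NOT the library's Codex32String."""

    def __init__(self, share_idx: str, payload: bytes):
        self.share_idx = share_idx
        self._payload = payload

    @property
    def data(self) -> bytes:
        # Only the secret share "s" has meaningful byte data. For any other
        # index the bytes would silently drop the low padding bits, so refuse.
        if self.share_idx != "s":
            raise InvalidShareIndex(f"share {self.share_idx!r} has no canonical byte data")
        return self._payload
```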

What is your exact use case where you really need to store ALL the shares as bytes and recover back to codex32?

Author


I'm able to do this, which fixes this test case:
dd = Codex32String.from_seed(d.data, "ms13k00ld", pad_val=1)

but I have no idea how I arrived at pad_val=1 other than by grinding it against the string I already know (which won't be the case in real life).

I don't know how to enforce that at the library level. Any ideas?

Not really... besides grinding the correct pad_val via round-trips right after constructing a derived share (very meh).

> What is your exact use case where you really need to store ALL the shares as bytes and recover back to codex32?

So my general idea is that I can use individual shares as normal secrets: load them on an HWW, sign with them, etc. For example, a user uses one HWW to do the Shamir split, while having N devices ready to export generated/derived shares as QR codes for instance. The derived shares are loaded onto the devices, and the devices are geo-distributed. These then serve as decoy, fully functional signers. When the secret S is needed, the user just collects k devices and does some QR scanning to recover S on an empty HWW.

For this, I thought I could use these from_seed/to_seed round-trips. Secure-element storage is limited, so for me a byte encoding is preferable to u5.

But now it seems this was never the intended purpose of the non-secret shares, which seem to be just recovery tools, i.e. data with one and only one purpose: to recover share S (which is kind of a pity, tbh). Am I reading this correctly?

Author


I also think that if round-trips with derived shares can be achieved somehow, even if passing the padding is necessary, that is desirable.

Owner

@BenWestgate, Dec 8, 2025


> but I have no idea how I arrived at pad_val=1 other than by grinding it against the string I already know (which won't be the case in real life).

You had to grind it because you discarded the pad_val. Without the padding you don't know the full last 5-bit character, so you might recover a different last data character. interpolate_at operates on 5-bit values, not bytes.
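Why a discarded padding bit is not harmless: interpolation is linear over GF(32), so getting d's 2 padding bits wrong adds an error e in {1, 2, 3} to d's last character, and the recovered "s" character shifts by λ·e, where λ is d's Lagrange weight. The sketch below assumes the bech32 field polynomial x^5 + x^3 + 1 and the bech32 character ordering for share indices; the helper names are illustrative, not the library's.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character order
GF32_MOD = 0b101001  # x^5 + x^3 + 1, assumed codex32/bech32 field polynomial


def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply of two GF(32) elements, reduced mod GF32_MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 32:
            a ^= GF32_MOD
    return r


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^30, because a^31 == 1 in GF(32)."""
    r = 1
    for _ in range(30):
        r = gf_mul(r, a)
    return r


def lagrange_coeff(idxs: str, i: int, target: str) -> int:
    """Weight of share idxs[i] when evaluating the polynomial at share index target."""
    xs = [CHARSET.index(c) for c in idxs]
    x = CHARSET.index(target)
    num = den = 1
    for j, xj in enumerate(xs):
        if j != i:
            num = gf_mul(num, x ^ xj)  # subtraction in GF(2^5) is XOR
            den = gf_mul(den, xs[i] ^ xj)
    return gf_mul(num, gf_inv(den))


# Recovering "s" from shares a, d, c: the weight of d in the result.
lam = lagrange_coeff("adc", 1, "s")
for e in (1, 2, 3):  # every possible wrong guess of d's 2 padding bits
    delta = gf_mul(lam, e)  # resulting change to the last character of "s"
    # The high 3 bits of that character are secret data; the low 2 are padding.
    print(e, delta, "data bits changed" if delta >> 2 else "only padding changed")
```

Under these assumptions λ = 28, and each wrong guess (changes of 28, 17 and 13) flips data bits of the recovered secret, not just its padding.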

> Any ideas?

> Not really... besides grinding the correct pad_val ... (very meh)

It may be possible if you give up being able to construct "non-encoded" shares from bytes data and instead accept constructing a Codex32ShareSet object with a from_bytes (or from_seeds) factory, then use an interpolate_at(share_idx) method of that share set object.

> What is your exact use case...?

> generated/derived shares as QR codes for instance.

Make sure to skim this Compact CodexQR discussion before speccing a QR design; it's the analog of Compact SeedQR. I found a fun way to fit 128-bit codex32 share data into 21x21 QR codes by dropping some of the identifier.

Whatever solution we find for Codex32ShareSet.from_bytes(header, dict) would be very helpful there, as well as here.

> These then serve as decoy, fully functional signers.

This seems useful!

> For this, I thought I could use these from_seed/to_seed round-trips.

You may be able to round-trip the share set via from_seeds/to_seeds, or via the .data of individual shares, but we need to define the correct Codex32ShareSet.from_seeds class method to make this possible.

The source of truth in a Codex32ShareSet should be the common header and the byte payloads of "s", "a", "c" for k = 3, or maybe "a", "c", "d". CRC padding, which does not interpolate, is slightly more useful on a share you actually have and can verify it on than on an unknown share you would first have to interpolate just to check whether it validates.
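A rough sketch of the share-set idea: the common header plus the byte payloads of the k initial shares are the source of truth, and every other share is derived at the 5-bit level. `ShareSetSketch`, its `from_seeds`/`to_seeds`/`interpolate_at` methods, the zero-padding rule for initial shares and the field polynomial x^5 + x^3 + 1 are all assumptions for illustration, not an existing API. Because of the assumed zero padding, the derived strings need not match the ones in the test file.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GF32_MOD = 0b101001  # x^5 + x^3 + 1, assumed codex32/bech32 field polynomial


def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply of two GF(32) elements, reduced mod GF32_MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 32:
            a ^= GF32_MOD
    return r


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^30, because a^31 == 1 in GF(32)."""
    r = 1
    for _ in range(30):
        r = gf_mul(r, a)
    return r


class ShareSetSketch:
    """Hypothetical stand-in for a Codex32ShareSet (not an existing class)."""

    def __init__(self, header: str, payloads: dict[str, bytes]):
        self.header = header  # assumed layout, e.g. "ms13k00l": "ms1", k = 3, id "k00l"
        self.payloads = dict(payloads)
        if len(self.payloads) != int(header[3]):
            raise ValueError("need exactly k initial payloads")
        self._u5 = {i: self._to_u5(p) for i, p in self.payloads.items()}

    @classmethod
    def from_seeds(cls, header: str, payloads: dict[str, bytes]) -> "ShareSetSketch":
        return cls(header, payloads)

    def to_seeds(self) -> dict[str, bytes]:
        return dict(self.payloads)

    @staticmethod
    def _to_u5(data: bytes) -> list[int]:
        """Bytes -> 5-bit characters with zero padding bits (assumed rule)."""
        nbits = len(data) * 8
        pad = (-nbits) % 5
        acc = int.from_bytes(data, "big") << pad
        total = nbits + pad
        return [(acc >> (total - 5 * (n + 1))) & 31 for n in range(total // 5)]

    def interpolate_at(self, share_idx: str) -> str:
        """Payload characters (padding included) for any share index."""
        x = CHARSET.index(share_idx)
        out = None
        for i, chars in self._u5.items():
            num = den = 1
            for j in self._u5:
                if j != i:
                    num = gf_mul(num, x ^ CHARSET.index(j))  # XOR is subtraction
                    den = gf_mul(den, CHARSET.index(i) ^ CHARSET.index(j))
            lam = gf_mul(num, gf_inv(den))
            term = [gf_mul(lam, c) for c in chars]
            out = term if out is None else [o ^ t for o, t in zip(out, term)]
        return "".join(CHARSET[c] for c in out)
```

Because derived shares are always recomputed from the stored initial payloads, their padding bits never need to be stored. The trade-off is the one described above: a derived share's own bytes are not accepted as input to the set.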

> Secure-element storage is limited, so for me a byte encoding is preferable to u5.

A 21x21 QR code holds only 137.2 bits using base45 alphanumeric encoding, or 138.2 bits if also using kanji, byte and numeric modes. So it would be excellent for us to define a compact encoding of share data: the bare minimum needed to always recover the correct secret, with whatever is left used to prevent user errors.

> But now it seems this was never the intended purpose of the non-secret shares, which seem to be just recovery tools, i.e. data with one and only one purpose: to recover share S (which is kind of a pity, tbh). Am I reading this correctly?

Yes, this is not their intended purpose, but they do contain randomness, and I think your idea is a cool and efficient use of that otherwise wasted random data needed for SSS. It is worth pursuing IF it can be done securely (revealing no more information about "s" than, at most, its padding bits given k-1 shares).

> I also think that if round-trips with derived shares can be achieved somehow, even if passing the padding is necessary, that is desirable.

I agree. Recovering seeds from bytes alone is non-trivial, but a solution should exist; let's find it. You'll find that this bytes-vs-130-bits question tripped up Andrew in the QR discussion; it's always surprising how padding behaves as the finite field changes.

Owner


@scgbckbone, do you still want me to come up with a way to recover the same seed from any share's bytes (not just the initial strings') without passing padding? I think it's technically possible, but I haven't thought about how to do it yet.

Author


That would be great! I wanted to try it myself but don't have time for it at the moment. IMO it is very useful.

Owner


I'm writing a test for it. The first thing I notice is that to go from bytes to a share, you need the share index.

So what you're really asking for is a way for a list of k share-indexed bytes payloads to always recover the correct seed.

Unless you mean that the share index should also be derivable from the bytes payload, so it doesn't need to be stored and passed with the bytes. That requires some grinding when generating the initial share payloads so that all n shares satisfy the relation that computes the share index. It's feasible for up to 10 or so shares and gets rather slow towards 15-20.

Question 1:

  1. Store the share index with the bytes
  2. Derive the share index from the bytes

Question 2:

  • May we assume the bip32 fingerprint of the secret bytes is always available?
    • That allows a brute-force search to try every share index and padding combination until the recovered seed matches the fingerprint, without forcing special derivation rules for padding and share index.
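A sketch of the brute-force search from Question 2, assuming the share indices are known and a 4-byte fingerprint of the secret is available. Each share has 2 unknown padding bits, so k shares give 4^k candidates. `fingerprint` is a hypothetical stand-in (the first 4 bytes of SHA-256), not a real BIP32 fingerprint. The field polynomial x^5 + x^3 + 1, the bech32 index ordering and the zero padding of the demo's initial shares are also assumptions.

```python
import hashlib
from itertools import product
from typing import Optional

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GF32_MOD = 0b101001  # x^5 + x^3 + 1, assumed codex32/bech32 field polynomial


def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply of two GF(32) elements, reduced mod GF32_MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 32:
            a ^= GF32_MOD
    return r


def gf_inv(a: int) -> int:
    """Inverse of a nonzero element: a^30, because a^31 == 1 in GF(32)."""
    r = 1
    for _ in range(30):
        r = gf_mul(r, a)
    return r


def to_u5(data: bytes, pad_val: int) -> list[int]:
    """16-byte payload -> 26 five-bit characters with pad_val in the low 2 bits."""
    acc = (int.from_bytes(data, "big") << 2) | pad_val
    return [(acc >> (125 - 5 * n)) & 31 for n in range(26)]


def u5_to_bytes(chars: list[int]) -> bytes:
    """26 five-bit characters -> 16 bytes, dropping the 2 padding bits."""
    acc = 0
    for c in chars:
        acc = (acc << 5) | c
    return (acc >> 2).to_bytes(16, "big")


def interpolate(shares: dict[str, list[int]], target: str) -> list[int]:
    """Per-character Lagrange interpolation over GF(32) at share index target."""
    x = CHARSET.index(target)
    out = [0] * 26
    for i, chars in shares.items():
        num = den = 1
        for j in shares:
            if j != i:
                num = gf_mul(num, x ^ CHARSET.index(j))  # XOR is subtraction
                den = gf_mul(den, CHARSET.index(i) ^ CHARSET.index(j))
        lam = gf_mul(num, gf_inv(den))
        out = [o ^ gf_mul(lam, c) for o, c in zip(out, chars)]
    return out


def fingerprint(seed: bytes) -> bytes:
    # Hypothetical stand-in: a real BIP32 fingerprint is HASH160 of the master
    # public key, which needs secp256k1 and is out of scope for this sketch.
    return hashlib.sha256(seed).digest()[:4]


def recover_by_search(payloads: dict[str, bytes], fp: bytes) -> Optional[bytes]:
    """Try all 4^k padding combinations for shares with known indices."""
    idxs = list(payloads)
    for pads in product(range(4), repeat=len(idxs)):
        shares = {i: to_u5(payloads[i], p) for i, p in zip(idxs, pads)}
        candidate = u5_to_bytes(interpolate(shares, "s"))
        if fingerprint(candidate) == fp:
            return candidate
    return None


# Demo: s, a, c are initial shares (zero padding assumed); d is derived.
s_b = bytes.fromhex("68f14219957131d21b615271058437e8")
a_b = bytes.fromhex("641be1cb12c97ede1c6bad8edf067760")
c_b = bytes.fromhex("61b3c4052f7a31dc2b425c843a13c9b4")
initial = {"s": to_u5(s_b, 0), "a": to_u5(a_b, 0), "c": to_u5(c_b, 0)}
d_b = u5_to_bytes(interpolate(initial, "d"))  # d's padding bits are discarded
print(recover_by_search({"a": a_b, "c": c_b, "d": d_b}, fingerprint(s_b)) == s_b)
```

For k = 3 this is only 64 candidates. Also searching over unknown share indices multiplies the work by the number of possible index assignments.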

Question 3:
Should this recovery-from-bytes method work for bytes extracted from:

  1. Any share set, including ones with random padding as per BIP93 and the codex book
  2. Only share sets specially constructed to facilitate recovery from reduced information

Question 4:
Would it be easier for you to just prepend the share index bits to the payload bytes and append the padding bits (zero-padded as necessary to a whole number of bytes)?

  • There does not seem to be anything wrong with BIP32 taking this extra non-random data as the seed, but your software could also discard it to follow BIP93 exactly.
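One possible layout for Question 4, sketched under assumptions: a 5-bit share index, then the 128 payload bits, then the 2 padding bits, zero-filled to 17 bytes (136 bits). `pack_share` and `unpack_share` are hypothetical helpers, and this bit order is a choice rather than a spec.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 order for share indices


def pack_share(share_idx: str, data: bytes, pad_val: int) -> bytes:
    """Hypothetical layout: 5-bit share index | payload | padding bits | zero fill."""
    pad_bits = (-len(data) * 8) % 5  # 2 for a 16-byte payload
    if not 0 <= pad_val < (1 << pad_bits):
        raise ValueError("pad_val does not fit in the padding bits")
    nbits = 5 + len(data) * 8 + pad_bits  # 135 bits for a 16-byte payload
    value = (CHARSET.index(share_idx) << (len(data) * 8)) | int.from_bytes(data, "big")
    value = (value << pad_bits) | pad_val
    fill = (-nbits) % 8  # zero bits appended to reach a whole byte
    return (value << fill).to_bytes((nbits + fill) // 8, "big")


def unpack_share(blob: bytes, data_len: int = 16) -> tuple[str, bytes, int]:
    """Inverse of pack_share: returns (share_idx, payload bytes, pad_val)."""
    pad_bits = (-data_len * 8) % 5
    nbits = 5 + data_len * 8 + pad_bits
    value = int.from_bytes(blob, "big") >> (len(blob) * 8 - nbits)  # drop zero fill
    pad_val = value & ((1 << pad_bits) - 1)
    value >>= pad_bits
    data = (value & ((1 << (data_len * 8)) - 1)).to_bytes(data_len, "big")
    return CHARSET[value >> (data_len * 8)], data, pad_val
```

With a 16-byte payload this is 135 bits, so a single zero bit of fill yields 17 bytes per share.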

Question 5:
What about the threshold? Where is this stored?

  1. With the bytes
  2. Implied by the length of the bytes list passed to the recovery method

Question 6:
Is there any identifier? How do you avoid combining bytes from different share sets together in cases you have more than one of these bytes backups?

With the answers to these questions I should be able to proceed with a solution.

I am happy to help because what you want is very similar to my "Compact CodexQR" system, where I have to represent shares in 138-140 bits, leaving only 10-12 bits for thresholds, id, share_idx and padding. Clearly some user experience and functionality will be lost, but the trade-off is worth it for a smaller QR code.

The solution may be a "lossy" compact_qr_decode(codex) that returns 18 bytes representing the essential data of the share: enough to recover the secret seed, plus some trade-off of:

  • basic mismatch detection by storing some identifier bits
  • useful UX like the recovery threshold, and/or
  • some or all 9 thresholds

I'd also try to give higher thresholds more identifier bits, as they have k-1 opportunities for mismatching.

For the UI, it's nice to store id bits in 5-bit chunks so that whole bech32 characters can be displayed when scanning these Compact CodexQRs, and so they're interoperable with the regular shares. But even non-multiples of 5 still give some mismatch protection.
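Illustrating the 5-bit-chunk point with a hypothetical helper, `id_bits_to_chars`: whole 5-bit chunks of the id render as bech32 characters, while any leftover bits are kept aside. They cannot be shown as a full character but can still be compared to catch mismatches.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character order


def id_bits_to_chars(id_value: int, nbits: int) -> tuple[str, int, int]:
    """Hypothetical helper: split an nbits-wide id into displayable characters.

    Returns (bech32 characters for the whole 5-bit chunks,
             value of the leftover low bits, number of leftover bits).
    """
    whole, extra = divmod(nbits, 5)
    chars = "".join(
        CHARSET[(id_value >> (nbits - 5 * (i + 1))) & 31] for i in range(whole)
    )
    leftover = id_value & ((1 << extra) - 1)
    return chars, leftover, extra


# The 20-bit id "k00l" from the test strings, and its top 7 bits only.
k00l = (22 << 15) | (15 << 10) | (15 << 5) | 31
print(id_bits_to_chars(k00l, 20))
print(id_bits_to_chars(k00l >> 13, 7))
```

A 7-bit id still displays one full character ("k") and keeps 2 extra bits that only software can check.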

# they are NOT equal after the round-trip: it seems the padding is lost at the interpolation level
assert dd.s != d.s # FAIL (should equal)

# irrelevant
# e = Codex32String.interpolate_at([s, a, c], "e")
# assert e.s == "ms13k00lezuknydaaygk5u20zs4fm736vj909mdj6xqp8pc2"
#
# f = Codex32String.interpolate_at([s, a, c], "f")
# assert f.s == "ms13k00lf0ehe53zsu6vrxcjjh9v7wzsa83mqfvku3fd8kem"

# recover from shares, use 'd' without round-trip
rec_s = Codex32String.interpolate_at([a, d, c], "s")
# recover from shares, use 'd' after round-trip
rec_ss = Codex32String.interpolate_at([a, dd, c], "s")

print(" s:", s.data.hex())
print(" rec_s:", rec_s.data.hex())
print("rec_ss:", rec_ss.data.hex())
assert s.data == rec_s.data
assert s.data == rec_ss.data # FAIL