BIP93: Generalize codex32 format for any hrp and fix typos #2040
Clarify codex32 format for different hrp values, specify master seed encoding standard, add new test vectors and enhance readability.
bip-0093.mediawiki
Outdated
| errors. The human-readable part is processed by first | ||
| feeding the higher bits of each character's US-ASCII value into the | ||
| checksum calculation followed by a zero and then the lower bits of each<ref>'''Why are the high bits of the human-readable part processed first?''' | ||
| This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the |
The length limitations of codex32 strings work under the assumption that the HRP is not subject to error correction. We more or less cannot do that anyway, as all sorts of bech32 formats have appeared, all with different checksums and characteristics. To run the checksum algorithm you have to know the prefix first, so that you know which checksum algorithm to try.
This isn't really a problem in practice, since there is only a small, finite number of prefixes, and from context only a few are going to be applicable anyway.
This was copied over from BIP-0173. Delete it?
Bech32 decoders already attempt two checksums; a universal bech32 decoder could try decoding the string with the bech32, bech32m and codex32 checksums to discover the format.
Unless covering the HRP exceeds the max length at HD=9, 2 substitutions in the HRP will always be detected by every format.
If the HRP is swapped between formats, the chance of false verification is:
- 1 in 2^65 for a "codex32 checksum" validating when the encoding was Bech32/Bech32m
- ~1 in 2^30 for a "Bech32 checksum" validating when the encoding was codex32.
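Both figures follow from the checksum width: each bech32 character carries 5 bits, so a random string passes an n-character checksum with odds of roughly 1 in 2^(5n). A minimal sketch, assuming the 13-character codex32 checksum and the 6-character Bech32/Bech32m checksum (the function name is only illustrative):

```python
BITS_PER_CHAR = 5  # each bech32 character encodes 5 bits


def false_accept_log2(checksum_chars):
    """Return n such that a random string passes the checksum with odds of about 1 in 2**n."""
    return BITS_PER_CHAR * checksum_chars


print(false_accept_log2(13))  # codex32 checksum width in bits
print(false_accept_log2(6))   # Bech32 / Bech32m checksum width in bits
```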
This text was certainly the design goal of BIP-173, but we are not using their checksum, and we haven't realized this part of their design in codex32, in part because our 13 character checksum unfortunately works only on relatively short strings.
Instead, we process the HRP in this way because that is what BIP-173 does, and we still want the HRP to change the residue to catch random errors, so we might as well do it in the standard way.
Unless covering the HRP exceeds the max length at HD=9, 2 substitutions in the HRP will always be detected by every format.
The problem is that the max length for our particular 13 character checksum's error detection and correction properties is 93 bech32 characters. That's why our payload is limited to 74 characters: add the 13 character checksum and 6 characters for the header and we get 93 bech32 characters, with nothing left over to detect or correct errors in the HRP. Yes, in cases where the payload is 72 characters or less, our error correction / detection properties extend to the low 5 bits of the ASCII characters of a 2 character prefix, but that doesn't apply to 73 or 74 character payloads.
I don't know if we really want to get into these subtleties. I'm not even sure correcting and detecting errors in the HRP is useful to begin. If you are a hardware wallet expecting a master seed and someone gives you a "cl" codex32 string, you don't need a fancy error correction algorithm to detect the "cl" prefix is wrong; if it is expecting a master seed then the "cl" prefix must be wrong.
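The length budget described above can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and the numbers (93, 6, 13) are the ones quoted in this comment.

```python
BCH_MAX_LEN = 93    # max length covered by the 13-character checksum's guarantees
HEADER_LEN = 6      # threshold + 4-character identifier + share index
CHECKSUM_LEN = 13


def spare_symbols(payload_len):
    """5-bit symbols left inside the guarantee after header, payload and checksum."""
    return BCH_MAX_LEN - (HEADER_LEN + payload_len + CHECKSUM_LEN)


for n in (72, 73, 74):
    print(n, spare_symbols(n))
```

With a 72-character payload there is room for the two low-bit symbols of a 2-character prefix; at 74 characters nothing is left.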
I will delete the footmark and say:
"The human-readable part is processed as per BIP-0173."
I updated the rationale accordingly:
At this length, the human-readable part is not covered by the checksum. This is acceptable because the checksum scheme itself requires you to know that a valid human-readable part is being used in the first place. If the prefix is damaged and a user is guessing that the data might be using this scheme, then the user can enter the available data explicitly using the suspected prefix.
I'm not even sure correcting and detecting errors in the HRP is useful to begin
wallets.md import guidance prefills the prefix in applications expecting only one to prevent mistakes.
A future application for extended keys (Is long codex32 HD=9 for 74 bytes?) has a situation where decoders need to accept both "xprv" and "xpub" HRPs in the same descriptor. So here it is absolutely useful to detect and correct errors in the HRP.
However, we should not go into details about that until an application that needs to disambiguate different HRPs actually exists.
What does HD=9 mean?
If we REQUIRE every registered HRP to be unique in the lower 5 bits then:
- we never have to distinguish errors in the high bits, since they come from a lookup table, VALID_HRP.
- the expanded data covered by the checksum will be < 93 characters, counting each HRP character only once.
- with the valid HRP table, correcting errors in the low bits corrects any errors in the high bits.
- we know which checksum is being used from the length of the string, which is far simpler than a per-HRP (impossible) design or one that double weights HRP characters towards max_length.
With the valid HRP table, correcting errors in the low bits corrects any errors in the high bits.
It's not that simple. For a maximum-length codex32 string, errors in the high bits appear as errors in the checksum at the end of the string because, for BCH codes, any polynomial longer than the maximum length (93 in our case) effectively wraps around.
Edit: And errors in the high bits are not going to be uncommon. Let me tell you the number of times I've mistaken a 5 for an S.
Then let's guarantee to detect 8 errors, 13/15 if contiguous in the low bits, BIP-0173 style but not error correct the HRP. Trying different suspected HRPs will have to be the way to correct a damaged prefix, like our rationale suggests doing.
mistaking a 5 for an S in the HRP counts as two errors.
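To see why, compare the expanded forms: a 5 and an S differ in both the high bits and the low 5 bits of their ASCII values, so the checksum sees two changed symbols. A small sketch, assuming the BIP-0173 style expansion quoted elsewhere in this thread (`symbol_errors` is a hypothetical helper, not part of any BIP):

```python
def bech32_hrp_expand(hrp):
    """Expand an HRP BIP-0173 style: high bits, a zero, then the low 5 bits."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def symbol_errors(hrp_a, hrp_b):
    """Count positions where two equal-length HRPs differ after expansion."""
    pairs = zip(bech32_hrp_expand(hrp_a), bech32_hrp_expand(hrp_b))
    return sum(a != b for a, b in pairs)


print(symbol_errors("ms", "m5"))  # 's' misread as '5'
print(symbol_errors("ms", "ns"))  # 'm' misread as 'n': only the low bits change
```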
BIP-0173 error detection doesn't tell you where the errors are. In particular it doesn't tell whether the HRP is correct or not. You need to invoke the error correction to find locations.
bip-0093.mediawiki
Outdated
| As with bech32 strings, a codex32 string MUST be entirely uppercase or entirely lowercase. | ||
| For presentation, lowercase is usually preferable, but uppercase SHOULD be used for handwritten codex32 strings. | ||
| If a codex32 string is encoded in a QR code, it SHOULD use the uppercase form, as this is encoded more compactly. | ||
| The lowercase form is used when determining a character's value for checksum purposes. |
This doesn't make sense. The lowercase form and uppercase form of Bech32 characters have the same value.
Not for the HRP, which needs to be lowercased during decoding; otherwise bech32_hrp_expand(hrp) would return a different result.
This line is repeated from the test vectors, why explain the rules about case in the vectors instead of up here?
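A quick illustration of the point about the HRP: the expansion works on raw ASCII values, so the uppercase and lowercase forms feed different high-bit symbols into the checksum. The function below follows the BIP-0173 expansion; the rest is only a demonstration.

```python
def bech32_hrp_expand(hrp):
    """Expand an HRP BIP-0173 style: high bits, a zero, then the low 5 bits."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


print(bech32_hrp_expand("ms"))          # [3, 3, 0, 13, 19]
print(bech32_hrp_expand("MS"))          # [2, 2, 0, 13, 19]
print(bech32_hrp_expand("MS".lower()))  # same as the lowercase form
```

Only the high-bit symbols differ, which is why the data part can be read in either case while the HRP cannot.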
I guess we should reword this to make it more clear that the relevance is for the HRP.
"When constructing or verifying a checksum, the human-readable part MUST be interpreted in lowercase, as specified in BIP-0173."
I might say "MUST be converted to lowercase" instead.
That seems to imply mutating the string when verifying a checksum.
Something BIP-93 omitted was that encoders should always emit lower-case strings. Did we relax that requirement?
Currently I have the sentence as:
"Encoders MUST emit lowercase; decoders MUST reject mixed-case and MUST lowercase the human-readable part during checksum verification."
And I am adding a section with codex32_encode and codex32_decode definitions, as I think it's easier to see these rules in code than in English.
Uppercase/lowercase
The lowercase form is used when determining a character's value for checksum purposes.
Encoders MUST always output an all lowercase Bech32 string. If an uppercase version of the encoding result is desired, (e.g.- for presentation purposes, or QR code use), then an uppercasing procedure can be performed external to the encoding process.
Decoders MUST NOT accept strings where some characters are uppercase and some are lowercase (such strings are referred to as mixed case strings).
For presentation, lowercase is usually preferable, but inside QR codes uppercase SHOULD be used, as those permit the use of alphanumeric mode, which is 45% more compact than the normal byte mode.
As far as I'm concerned, both all-lowercase and all-uppercase strings are valid, so encoders can produce either format, with lowercase generally preferred. I'm not really sure what BIP-173 thinks it is achieving by treating encoders as somewhat different from a post-processing step. Maybe they are just trying to say that when creating a checksum, of course, a lowercase HRP must be used.
That seems to imply mutating the string when verifying a checksum.
This is exactly what the BIP-173 reference python decoder does:
https://github.com/sipa/bech32/blob/master/ref/python/segwit_addr.py#L78
However, "The lowercase form is used …" is also fine wording.
"Decoders MUST use the lowercase form of the human-readable part during checksum verification."
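One way to read that rule in code: reject mixed case first, then work on the lowercase form without requiring the caller to have mutated anything. This is a sketch only; `normalize_case` is a hypothetical helper name, and the mixed-case test mirrors the one used by the BIP-0173 reference decoder.

```python
def normalize_case(codex):
    """Return the lowercase form, or None if the string mixes upper and lower case."""
    if codex.lower() != codex and codex.upper() != codex:
        return None  # mixed case is rejected
    return codex.lower()


print(normalize_case("MS10TEST"))
print(normalize_case("Ms10test"))
```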
bip-0093.mediawiki
Outdated
| * Secret share with index <code>S</code>: <code>MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK</code> | ||
| * Master secret (hex): <code>dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9</code> | ||
| unchecksummed string (bech32): <code>MS10C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F</code> |
I'd be inclined to remove this unchecksummed string. I'm really nervous about displaying strings without a checksum anywhere. They are very problematic.
If you insist on going into this much detail in this test vector, I'd say use the following bullets:
- Master seed (hex):
- master node xprv
- Payload
- HRP
- Identifier
- Checksum
- Secret seed
That's the order I'd use, but maybe some other permutations are also good.
How about since the text said:
This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.
- human-readable part: MS
- k value: 0
- identifier: 0C8V
- share index: S
- payload: M32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06F
- checksum: HPV80UNDVARHRAK
- secret seed: MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK
- Master seed (hex): dc5423251cb87175ff8110c8531d0952d8d73e1194e95b5f19d6f9df7c01111104c9baecdfea8cccc677fb9ddc8aec5553b86e528bcadfdcc201c17c638c47e9
- master node xprv: xprv9s21ZrQH143K4UYT4rP3TZVKKbmRVmfRqTx9mG2xCy2JYipZbkLV8rwvBXsUbEv9KQiUD7oED1Wyi9evZzUn2rqK9skRgPkNaAzyw3YrpJN
No information is displayed that we did not already display in Vector 1.
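For reference, the fields listed above can be recovered from the secret seed string by position alone. The sketch below does no checksum verification; `split_codex32` is a hypothetical helper, and the rule that data parts of 96 or more characters carry the 15-character long checksum is taken from the snippet under review.

```python
VECTOR = ("MS100C8VSM32ZXFGUHPCHTLUPZRY9X8GF2TVDW0S3JN54KHCE6MUA7LQPZYGSFJD6AN074"
          "RXVCEMLH8WU3TK925ACDEFGHJKLMNPQRSTUVWXY06FHPV80UNDVARHRAK")


def split_codex32(codex):
    """Split a codex32 string into its fields by position (no validation)."""
    hrp, _, data = codex.rpartition("1")  # '1' never occurs in the data part
    checksum_len = 15 if len(data) >= 96 else 13  # assumption: long checksum at 96+
    return {
        "hrp": hrp,
        "k": data[0],
        "identifier": data[1:5],
        "share_index": data[5],
        "payload": data[6:-checksum_len],
        "checksum": data[-checksum_len:],
    }


for name, value in split_codex32(VECTOR).items():
    print(f"{name}: {value}")
```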
That would be better.
Would you rather just change V5 text to match master's vectors?
From:
This example shows generating a new 512-bit master seed using "random" codex32 characters and appending a checksum.
To:
This example shows the long codex32 format, when used without splitting the secret into any shares.
I somewhat prefer the current text.
I reverted it
We start given
k value = 0
identifier = 0C8V
payload =
then compute
- checksum
- secret seed
- Master seed
- master node xprv
We are able to infer share index = "s" and hrp = "MS" from the text.
FWIW I have been using the term "secret seed" when a codex32 secret is arrived at from bech characters (interpolation or randomly selected) and I have been using the term codex32-encoded master seed when it's produced from bytes.
This is slightly more precise but we need not bother readers with the distinction since both are valid.
… case in checksum
bip-0093.mediawiki
Outdated
| # Choose a threshold value ''k'' between 2 and 9, inclusive | ||
| # Choose a 4 bech32 character identifier | ||
| #* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed the user may need to disambiguate. | ||
| #* We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every secret the user may need to disambiguate |
Should we say "secret or set of shares" because this is the reshare case you mentioned that SHOULD have a unique identifier?
Here we make it sound like it's OK to reuse an identifier if the secret is the same which is false.
Yes, set of shares.
I used "set of shares" for an existing secret and "secret" for a fresh secret.
This is technically correct; there is no need to say both "secret and set of shares" for an existing secret. If you follow that process you always get a fresh set of shares, and that is what needs to be uniquely identified, not the secret per se.
Clarify codex32 specification and examples for encoding and decoding processes, including detailed explanations of parameters and checksum handling.
| The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 400-bit secret. | ||
| The longest data part using the regular 13 character checksum is 93 characters and corresponds to a 368-bit secret. | ||
| At this length, the prefix <code>MS1</code> is not covered by the checksum. |
This is implicit from the spec definition:
Strings of length 95 and 96 MUST use HRP "ms" (or "MS")
If it needs to be explained here as well, or needs a sentence on why the maximum length is 94 for other HRPs, let me know and I'll try.
We could reduce the maximum length to 94 characters and remove the special HRP vs length rule, but that breaks existing "46-byte codex32-encoded master seeds", and these are absolutely critical to support given the widespread deployment of both codex32 and 46-byte master seeds.
| At this length, the prefix <code>MS1</code> is not covered by the checksum. | ||
| This is acceptable because the checksum scheme itself requires you to know that the <code>MS1</code> prefix is being used in the first place. | ||
| If the prefix is damaged and a user is guessing that the data might be using this scheme, then the user can enter the available data explicitly using the suspected <code>MS1</code> prefix. | ||
| This is acceptable because the checksum scheme itself requires you to know that a codex32 human-readable part is being used in the first place. |
At this point we should link to the registry somewhere in our document so people know what a "codex32 human-readable part" might be:
Where's the best place to put this hyperlink?
https://github.com/satoshilabs/slips/blob/master/slip-0173.md#uses-of-codex32
I don't really want to be seen as endorsing a particular registry. But I also see how a link could be useful, so I'm torn.
Maybe best to leave the registry out since they may or may not be Bitcoin related.
BIP-0173 which is a prerequisite for implementing this format links to that registry.
We don't have to link to it, as they can find it in BIP-0173, but the concept of registering codex32 HRPs should be mentioned to avoid the chaos and disaster of using anything for everything.
Oh. Well if there is precedent then I guess it is okay.
Fair enough, double counting it is. That's easy to remember and implement.
Packing the 7-bit values together wouldn't have helped much anyway because single ASCII errors would affect multiple data symbols especially when isolated.
I'll be adding your rule that the maximum length of the string is 93 minus the length of the HRP.
And we're keeping the special rule for 96 where the HRP MUST equal "MS"?
…On Wednesday, November 26th, 2025 at 6:41 PM, roconnor ***@***.***> wrote:
@roconnor commented on this pull request.
---------------------------------------------------------------
In [bip-0093.mediawiki](#2040 (comment)):
> polymod = ms32_polymod(values + [0] * 13) ^ MS32_CONST
return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]
</source>
+This implements a [https://en.wikipedia.org/wiki/BCH_code BCH code] that
+guarantees detection of '''any error affecting at most 8 characters'''
+and has less than a 3 in 10<sup>19</sup> chance of failing to detect more
+errors. The human-readable part is processed by first
+feeding the higher bits of each character's US-ASCII value into the
+checksum calculation followed by a zero and then the lower bits of each<ref>'''Why are the high bits of the human-readable part processed first?'''
+This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the
> What do we need to restrict it to in order to allow every US-ASCII character? Keeping in mind it expands to two 5 bit values but only the upper 2 bits can change so we should have better detection ability than 10 bits per character.
The issue is that for BCH codes the data really needs to fit within their length restriction, so we cannot just count entropy. Even if we know some bits are fixed, if the polynomial we extract has degree more than 93, everything falls apart because it can no longer distinguish errors on one side of the polynomial from errors on the other side. The rule that the string length, counting the HRP again, must be less than 93 is the only thing that makes the HRP-expanded polynomial concatenated with the data part fit in degree at most 93.
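The "string length plus the HRP counted again" rule is just the length of the expanded HRP concatenated with the data part: the expansion contributes 2 * len(hrp) + 1 symbols, and the separator accounts for the + 1. A sketch to make that concrete; the helper names are hypothetical, and treating exactly 93 as acceptable is an assumption that the final rule may tighten.

```python
def bech32_hrp_expand(hrp):
    """Expand an HRP BIP-0173 style: high bits, a zero, then the low 5 bits."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def checksummed_length(codex):
    """Number of 5-bit symbols fed to the checksum for a full codex32 string."""
    hrp, _, data = codex.rpartition("1")
    return len(bech32_hrp_expand(hrp)) + len(data)


def fits_short_checksum(codex, limit=93):
    """True if the expanded HRP plus data part stays within `limit` symbols (assumed inclusive)."""
    return checksummed_length(codex) <= limit


example = "ms1" + "q" * 88
print(checksummed_length(example), len(example) + len("ms"))
print(fits_short_checksum(example))
```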
Yes, let's keep the silly exception for now for the sake of getting an agreeable PR. We should hammer out the master seed bit-size restrictions in a separate PR. If you want to say that length 96 ms seeds are deprecated that's okay too. But I still want to argue for the merits of 160 bit master seeds.
| def codex32_verify_checksum(hrp, data): | ||
| if len(data) >= 96: # See Long codex32 Strings | ||
| return ms32_verify_long_checksum(data) | ||
| return codex32_verify_long_checksum(bech32_hrp_expand(hrp) + data) | ||
| if len(data) <= 93: | ||
| return ms32_polymod(data) == MS32_CONST | ||
| return codex32_polymod(bech32_hrp_expand(hrp) + data) == CODEX32_CONST | ||
| return False |
I dislike a situation where valid "long codex32" strings can be shorter overall (and in data part characters) than regular codex32.
"long" codex32 format: 10 hrp characters + 1 + 6 header characters + 54 payload characters + 15 checksum characters = 86
codex32 format: "ms" hrp characters + 1 + 6 header characters + 74 payload characters + 13 checksum characters = 96
So I'm now restricting the HRP length and leaving codex32_verify_checksum() alone. The checksum will remain selected as it currently is:
based on the length of the data.
The maximum HRP length will be restricted (going forward) so that any HRP is always covered by our 4 error correction guarantees if its errors affect only the low (or only the high) bits. 96 character short "ms" strings are deprecated: they decode properly, but the same hrp & data will now encode with the long checksum.
It has to be this way: if we needed the HRP to know which checksum to use, we couldn't protect the HRP. And if we change the verify rules, we break backwards compatibility.
def bech32_decode(bech):
    """Validate a Bech32/Bech32m string, and determine HRP and data."""
We must do the equivalent:
def codex32_decode(codex):
    """Validate a codex32/Long codex32 string, and determine HRP and data."""
The decoder must be ignorant of HRP, because that's the point of it, to determine it.
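The two length totals quoted earlier in this comment can be reproduced as below. This is plain arithmetic, shown only to make the comparison easy to re-run with other HRP lengths; `total_length` is a hypothetical helper.

```python
def total_length(hrp_len, payload_len, checksum_len, header_len=6):
    """Overall string length: HRP + '1' separator + header + payload + checksum."""
    return hrp_len + 1 + header_len + payload_len + checksum_len


print(total_length(10, 54, 15))  # "long" codex32 with a 10-character HRP
print(total_length(2, 74, 13))   # regular codex32 with the "ms" HRP
```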
I can probably write this code out if you want, but my thoughts are we should have codex32_decode, an independent long_codex32_decode and an ms_decode that can call both of them.
That makes sense! Bech32 has an encode/decode for the format and then a separate encode/decode function for segwit addresses.
However they do encode/decode both Bech32/Bech32m checksums at once.
We need codex32_encode and codex32_decode function to handle both checksums. That has to be format level, not application level in order to detect/correct HRP errors.
Below is untested code that is approximately what I'm thinking
def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]

CODEX32_CONST = 0x10ce0795c2fd1e62a

def codex32_polymod(residue, values):
    # False signals that the input is too long for the checksum's guarantees.
    if len(values) > 93:
        return False
    GEN = [
        0x19dc500ce73fde210,
        0x1bfae00def77fe529,
        0x1fbd920fffe7bee52,
        0x1739640bdeee3fdad,
        0x07729a039cfc75f5a,
    ]
    for v in values:
        b = (residue >> 60)
        residue = (residue & 0x0fffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue

CODEX32_LONG_CONST = 0x43381e570bf4798ab26

def codex32_long_polymod(residue, values):
    if len(values) > 1023:
        return False
    GEN = [
        0x3d59d273535ea62d897,
        0x7a9becb6361c6c51507,
        0x543f9b7e6c38d8a2a0e,
        0x0c577eaeccf1990d13c,
        0x1887f74f8dc71b10651,
    ]
    for v in values:
        b = (residue >> 70)
        residue = (residue & 0x3fffffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue

def codex32_verify_checksum(hrp, data):
    return codex32_polymod(1, bech32_hrp_expand(hrp) + data) == CODEX32_CONST

def codex32_verify_long_checksum(hrp, data):
    return codex32_long_polymod(1, bech32_hrp_expand(hrp) + data) == CODEX32_LONG_CONST

def codex32_create_checksum(hrp, data):
    polymod = codex32_polymod(1, bech32_hrp_expand(hrp) + data + [0] * 13)
    if polymod is not False:
        polymod = polymod ^ CODEX32_CONST
        return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]
    return False

def codex32_create_long_checksum(hrp, data):
    polymod = codex32_long_polymod(1, bech32_hrp_expand(hrp) + data + [0] * 15)
    if polymod is not False:
        polymod = polymod ^ CODEX32_LONG_CONST
        return [(polymod >> 5 * (14 - i)) & 31 for i in range(15)]
    return False

def ms32_verify_checksum(data):
    if len(data) >= 96:
        return codex32_verify_long_checksum("ms", data)
    # Fold the "ms" prefix into the residue first, then process the data part.
    return codex32_polymod(codex32_polymod(1, bech32_hrp_expand("ms")), data) == CODEX32_CONST

def ms32_create_checksum(data):
    if len(data) > 80:
        return codex32_create_long_checksum("ms", data)
    polymod = codex32_polymod(codex32_polymod(1, bech32_hrp_expand("ms")), data + [0] * 13)
    polymod = polymod ^ CODEX32_CONST
    return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]
As you can see, I think it is up to the particular application to handle switching between the long codex32 format and the regular codex32 format.
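The nested call in ms32_verify_checksum relies on the polymod being foldable: processing the expanded "ms" prefix first and then the data gives the same residue as processing the concatenation in one pass. The sketch below checks that on arbitrary made-up data, with the length guard omitted for brevity. It also prints the residue left after the prefix alone, which works out to 0x23181b3; as far as I can tell, that is the hard-coded starting residue of ms32_polymod in the published BIP-93, though that correspondence should be treated as my reading rather than something stated in this thread.

```python
GEN = [
    0x19dc500ce73fde210,
    0x1bfae00def77fe529,
    0x1fbd920fffe7bee52,
    0x1739640bdeee3fdad,
    0x07729a039cfc75f5a,
]


def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]


def codex32_polymod(residue, values):
    """Same update step as the sketch above, without the length guard."""
    for v in values:
        b = residue >> 60
        residue = (residue & 0x0fffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue


prefix_residue = codex32_polymod(1, bech32_hrp_expand("ms"))
print(hex(prefix_residue))

data = [0, 16, 3, 7, 2, 16] + [5] * 20  # arbitrary 5-bit symbols, not a real share
split = codex32_polymod(prefix_residue, data)
joint = codex32_polymod(1, bech32_hrp_expand("ms") + data)
print(split == joint)
```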
We can't assume "ms" to know which checksum to verify.
If there's an hrp substitution error, we need to know which checksum is being used to detect/correct it, but if which checksum depends on which hrp application, instead of just codex32 string length, then we're stuck.
That made our length 96 exception a bug, as how can a decoder know this rule applies if it can't detect the integrity of what determines its applicability?
We need a codex32_decode function such that, if a string validates, it either has the correct HRP or has more than 8 errors. Applications therefore can't choose their checksum; the format defines which to use.
Like BIP-0173's HRP detection assumption, our error correction guarantee only applies to the lower (or upper) 5 bits of HRP characters, as swaps that produce an upper bit change are very unlikely. But we can guarantee to correct 2 "double" errors.
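A format-level selection that never consults the HRP could look like the sketch below. The thresholds (13 characters up to length 93, 15 characters from length 96, nothing in between) are copied from the codex32_verify_checksum snippet under review; whether the length should be measured on the data part alone or on the HRP-expanded values is exactly what is still being discussed, so treat the input as "whichever length the spec settles on". `select_checksum_len` is a hypothetical name.

```python
def select_checksum_len(length):
    """Pick the checksum length from a length alone, never from the HRP."""
    if length >= 96:
        return 15   # long codex32 checksum
    if length <= 93:
        return 13   # regular codex32 checksum
    return None     # lengths 94 and 95 are invalid under this rule


for n in (48, 93, 94, 95, 96, 124):
    print(n, select_checksum_len(n))
```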
| def ms32_create_checksum(data): | ||
| def codex32_create_checksum(hrp, data): | ||
| values = bech32_hrp_expand(hrp) + data | ||
| if len(data) > 80: # See Long codex32 Strings |
No.
…On Thu, Nov 27, 2025, 2:20 AM Ben Westgate ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In bip-0093.mediawiki
<#2040 (comment)>:
> +def codex32_verify_checksum(hrp, data):
if len(data) >= 96: # See Long codex32 Strings
- return ms32_verify_long_checksum(data)
+ return codex32_verify_long_checksum(bech32_hrp_expand(hrp) + data)
if len(data) <= 93:
- return ms32_polymod(data) == MS32_CONST
+ return codex32_polymod(bech32_hrp_expand(hrp) + data) == CODEX32_CONST
return False
Needs to become now to:
def codex32_verify_checksum(hrp, data):
combined = bech32_hrp_expand(hrp) + data
if len(combined) >= 96:
return codex32_verify_long_checksum(combined)
if len(combined) <= 93:
return codex32_polymod(combined) == CODEX32_CONST
return False
Missing:
- the thorny zero length "ms" rule.
- the check in codex32_decode() for the upper long codex32 length
limit.
Because of this new max length rule we have the curious situation
where valid "long codex32" strings can actually be shorter overall (and in
data part characters) than regular codex32.
May want to rename that format; any thoughts?
Ex:
"long" codex32 format: 10 hrp characters + 1 + 6 header characters + 54
payload characters + 15 checksum characters = 86
codex32 format: "ms" hrp characters + 1 + 6 header characters + 74 payload
characters + 13 checksum characters = 96
    if len(codex[pos+1:]) < 94:
        checksum_len = 13
        max_length = 96 if hrp == "ms" else 94
    elif 95 < len(codex[pos+1:]) < 1024:
        checksum_len = 15
        max_length = 1026 if hrp == "ms" else 1024
Now these lines are wrong as well.
The special rule for "ms" is ugly in code.
This needs to change to:
def codex32_decode(codex):
    if ((any(ord(x) < 33 or ord(x) > 126 for x in codex)) or
            (codex.lower() != codex and codex.upper() != codex)):
        return None, None
    codex = codex.lower()
    pos = codex.rfind('1')
    hrp = codex[:pos]
    hrp_len = len(hrp) if hrp != "ms" else 0
    if len(codex) + hrp_len < 97:
        checksum_len = 13
    elif 98 < len(codex) + hrp_len < 1027:
        checksum_len = 15
    else:
        return None, None
    if pos < 1 or len(codex[pos+1:]) < 6 + checksum_len or len(codex) > 1024:
        return None, None
    if not all(x in CHARSET for x in codex[pos+1:]):
        return None, None
    if not codex[pos+1].isdigit():
        return None, None
    if codex[pos+1] == "0" and codex[pos+6] != "s":
        return None, None
    data = [CHARSET.index(x) for x in codex[pos+1:]]
    if not codex32_verify_checksum(hrp, data):
        return None, None
    return hrp, data[:-checksum_len]

    polymod = codex32_long_polymod(values + [0] * 15) ^ CODEX32_LONG_CONST
    return [(polymod >> 5 * (14 - i)) & 31 for i in range(15)]
</source>
Probably should mention its maximum length for the correction guarantees in a sentence here, similar to what is in the checksum section. Otherwise 1024 is a magic number in the reference snippets.
This is what I have:
- compatibility encoding for BIP-39, allowing Shamir splitting of mnemonic and extended private key wallets
- HRP:
cc - encodes chaincode + private key of BIP-32 master extended key (64 bytes)
cc10zcvjs5klr60nyt8usd553sge7r5glcy2ztwfv2d2smmcs7m3mq6dduwavccnjzjchlkffjfx8p3cjjx64q9vkxdt8q9qzuu3s8jfgjysa5pc5nezf2qkfhqpfwf
- HRP:
I only have one HRP. I do not differentiate between testnet and mainnet, even though I use extended key data. I'm also using it for mnemonics, where I first generate the master BIP-32 key and then use those values for the codex32 secret share. Do you consider the lack of testnet/mainnet separation an issue?
Your addition of the HRP into the checksum definitely broke my checksum tests for cc hrp secrets (not an issue, I haven't released yet, but I'm planning to in a few weeks).
My HWW implementation is pretty much in accordance with https://github.com/BlockstreamResearch/codex32/blob/master/docs/wallets.md . My implementation is not ECW. I even support generating the secret share S. I only allow generating 128- and 256-bit MS secrets (but also allow importing 512-bit ones). In short:
- TRNG 256 entropy bits
- r = sha256(sha256(entropy))
- x = r[:byte_len]
- x is the new master secret, and the default ID is the 20 MSB of the master XFP (but the user can change it if they wish)
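The generation steps listed above can be sketched roughly as follows. This is only an illustration, not the commenter's actual code: `secrets.token_bytes` stands in for the hardware TRNG, the function name `generate_master_secret` is hypothetical, and the default-ID step is left out because it depends on the BIP-32 master fingerprint.

```python
import hashlib
import secrets


def generate_master_secret(byte_len, entropy=None):
    """Hypothetical sketch: double-SHA256 the entropy and truncate to byte_len."""
    if byte_len not in (16, 32):
        # Only 128- and 256-bit secrets are generated, per the description above.
        raise ValueError("byte_len must be 16 or 32")
    if entropy is None:
        # Assumption: an OS CSPRNG stands in for the device TRNG (256 bits).
        entropy = secrets.token_bytes(32)
    r = hashlib.sha256(hashlib.sha256(entropy).digest()).digest()
    return r[:byte_len]


master_secret = generate_master_secret(16)
print(len(master_secret))  # 16
```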
What are the chances of this patch set being accepted? Is this spec stable enough to start releasing against it?
How I generate non-secret shares:
"bc" and "tb" for Bech32 addresses were an upgrade in human-readable prefix from the base58 encoding. I consider it a regression if you use fewer characters to encode a human-readable prefix than the base58 extended key format did. "xpriv" is an option here.
Does your format need a 65th byte for the public key that is zero when encoding private keys? There are many advantages to the strings needing disambiguation having the same byte length.
Yes, this is a huge regression from the current bip32 extended key format we want to upgrade. Mostly that I can't tell by looking at the descriptor if it's for real funds or not.
HRP was always in the checksum, it just was pre-computed for "ms" so the checksums for other HRP were wrong. I noticed when I tried to validate the CLN HSM secret examples in my python-codex32 package.
@roconnor has a PR in codex32 that does ECC you could test.
I have a codex32 PR to update wallets.md guidance for generation, you may see something useful, especially in the HWW case. In short:
You can and probably should use the entropy bits directly. If they lack entropy, sha256d is an illusion of security.
It will need wider community review than just us. But there are comments by P. Wuille as far back as 2020 stating that a 4-error-correcting bech32 encoding of extended keys is needed. So the chances of acceptance are high once it's correct and shiny. This spec PR will not change anything that affects your encoding of ~78 bytes or whatever an extended key has. We're mostly debating behavior at the limit between short and long checksums. Yours unambiguously use long codex32.
It is unsafe to derive shares as children of the secret they recover. They should be independently random. When part of the secret is compromised and an attacker tries to brute-force the rest, the dependent relation between the secret and share A allows an attacker with k-1 shares or share A to check his guesses against it. This is far faster than checking an address.
I see now...
I do not want to use randomness here, as I want to split an existing secret, and I require the split to be deterministic, so that if the user splits the exact same secret with the same hrp, same threshold, same id, and same number of shares, the application always produces the exact same shares. I could add an option to choose between a random and a deterministic split, but deterministic is a hard requirement. Also, it is 5 hardened derivation steps plus hmac_sha512.
There are plenty of other brute-force options if an attacker has part of the secret; I do not consider this scenario of yours to be something I should optimize for.
I do not encode an extended key (or full extended key). I only encode chaincode + privkey, without any other data, as I just want to be able to restore a naked xpriv from it, without the extra metadata extended keys carry, since I use it for both mnemonics and extended keys. That is why I dismissed the idea of doing testnet/mainnet differentiation: I consider my 64 bytes to be the "secret".
The best you could do here, if you insist, is to run a KDF on the secret data to harden it before deriving child shares from that derived key. But it still reduces security from information-theoretic to computational.
Still significantly faster than address checking. Andrew told me the EC multiplication is the bottleneck for address checking.
My point is your standard should be harder to exploit than all other options or we lose security for nothing. Simply deriving child shares from an argon2id or scrypt derived key is probably enough protection.
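A rough sketch of that suggestion, under stated assumptions: the secret is stretched with scrypt (via the standard library's `hashlib.scrypt`), and deterministic share material is then derived from the stretched key with HMAC-SHA512. The function names, the salt contents, the `b"share:"` label, and the cost parameters are all hypothetical choices made for illustration; none of them come from BIP-93 or from the implementation discussed in this thread.

```python
import hashlib
import hmac


def stretch_secret(secret, salt):
    """Hypothetical: harden the secret with scrypt before any share derivation."""
    return hashlib.scrypt(
        secret,
        salt=salt,
        n=2**14, r=8, p=1,          # illustrative cost parameters (about 16 MiB)
        maxmem=64 * 1024 * 1024,    # headroom above the memory scrypt needs
        dklen=64,
    )


def derive_share_payload(stretched_key, share_index, payload_len):
    """Hypothetical: deterministic per-index bytes derived from the stretched key."""
    if payload_len > 64:
        raise ValueError("payload_len must be at most 64 bytes")
    label = b"share:" + share_index.encode("ascii")
    return hmac.new(stretched_key, label, hashlib.sha512).digest()[:payload_len]


# Same secret and parameters always give the same share material.
key = stretch_secret(b"\x00" * 64, b"cc|k=2|id=test")
share_a = derive_share_payload(key, "a", 64)
share_c = derive_share_payload(key, "c", 64)
```

An attacker guessing the secret must now pay the scrypt cost for every guess before comparing against a known share, which is the property being asked for here. Security remains computational rather than information-theoretic.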
It seems better to encode the recovery words and wordlist with a
I'm this close to throwing in the towel. BIP-93's design was never intended to be generalized to arbitrary HRP, and it shows. If people want to reuse our polynomial for their own schemes, then more power to them. They can make their own BIP.

Summary of Changes:
Describe the codex32 format for arbitrary human-readable parts, not just "ms"; specify a master seed encoding standard; add new test vectors; and enhance readability. This makes the document more like BIP-0173: proposing an encoding, "codex32", then defining a standard for something that uses it.
See discussion on #2023 (comment).
Spec:
hrp
Test Vectors:
ms32_verify_checksum function