@BenWestgate BenWestgate commented Sep 7, 2025

This allows wallets to derive codex32 secrets and codex32 shares from BIP-0032 master keys.

Summary of changes

Rationale

  • Mirrors the existing BIP-85 application for BIP-39.
  • Codex32 offers error correction, hand verification, identifiers, and secret sharing improvements vs BIP-39.
  • Deterministic generation produces auditable backups by avoiding reliance on local RNG, helping users who distrust device entropy.

Specification

  • Adds Application 93' to BIP-0085 using derivation path:
m/83696968'/93'/{header}'/{byte_len}'/{index}'

Tests
Reference tests and new vectors will be included in the reference bipsea implementation:
BenWestgate/bipsea@master...BenWestgate:bipsea:master

Mailing List
Discussion: https://groups.google.com/g/bitcoindev/c/--lHTAtq0Qc

Status
Ready for conceptual and approach review. This change is additive and does not modify existing BIP-85 behavior.

@jonatack jonatack added Proposed BIP modification Pending acceptance This BIP modification requires sign-off by the champion of the BIP being modified labels Sep 8, 2025
@BenWestgate BenWestgate marked this pull request as draft September 8, 2025 00:17
@BenWestgate BenWestgate marked this pull request as ready for review September 8, 2025 00:26
@BenWestgate BenWestgate changed the title Add Codex32 (BIP-0093) as application 93' to BIP-0085 BIP85: Add Codex32 application 93' Sep 9, 2025
@BenWestgate BenWestgate changed the title BIP85: Add Codex32 application 93' BIP85: Add Codex32 as application 93' Sep 9, 2025
@akarve
Contributor

akarve commented Sep 10, 2025

Documenting recent discussions:
@BenWestgate Please see my mailing list comments to your thread with suggestions and simplifications (path, byte extraction, idx, etc.). Regarding 1.4.0 the main thing is we want to warrant full compatibility (all features) up to the prior version and (just saw you reopened 68) a PR to the 1.3.0 client is probably the easiest way to achieve that. Lmk if anything is unclear.

@BenWestgate
Author

BenWestgate commented Sep 12, 2025

Documenting recent discussions: @BenWestgate Please see my mailing list comments to your thread with suggestions and simplifications (path, byte extraction, idx, etc.). Regarding 1.4.0 the main thing is we want to warrant full compatibility (all features) up to the prior version and (just saw you reopened 68) a PR to the 1.3.0 client is probably the easiest way to achieve that. Lmk if anything is unclear.

It seems you'd like to consolidate some of the paths. There are a few ways to do this; if you have a favorite, or one that immediately stands out as obvious, let me know.

I'm thinking the identifier could be the bech32 encoding of the bip85 index, as the purpose of incrementing the index is to get new seeds, and BIP93 says "...the identifier SHOULD be distinct for all master seeds the user may need to disambiguate."

index = 0 -> identifier = qqqq, index = 1 -> identifier = qqqp, and so on. A particular identifier could be selected by converting it to an integer {index}. Once index reaches 32^4, it can fall back to the default BIP-0032 fingerprint or roll over.
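
The index-to-identifier mapping described above can be sketched as follows. This is only an illustration, not part of the PR: the helper names `index_to_identifier` and `identifier_to_index` are hypothetical, and the sketch simply raises an error at 32^4 instead of choosing between the fingerprint fallback and rolling over.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet


def index_to_identifier(index: int) -> str:
    """Encode an index below 32**4 as a 4-character bech32 identifier."""
    if not 0 <= index < 32**4:
        raise ValueError("index does not fit in a 4-character identifier")
    # Four 5-bit groups, most significant first.
    return "".join(CHARSET[(index >> shift) & 31] for shift in (15, 10, 5, 0))


def identifier_to_index(identifier: str) -> int:
    """Decode a 4-character bech32 identifier back to its integer index."""
    if len(identifier) != 4:
        raise ValueError("identifier must be exactly 4 characters")
    value = 0
    for char in identifier.lower():
        value = (value << 5) | CHARSET.index(char)
    return value
```

With this mapping a user could pick an identifier such as "cash" and let the wallet compute the matching index, rather than searching for an index by hand.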

On byte extraction: I agree we should draw byte_length bytes and pad to a multiple of 5 bits with a CRC. The polynomial (1 << crc_len) | 3 is optimal for 1-4 bits.
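
A minimal sketch of that padding step is shown below. It assumes the CRC is the remainder of the payload bits, shifted left by crc_len, modulo the generator (1 << crc_len) | 3, with bits taken most significant first. The thread does not pin down these details, so the construction and the names `gf2_remainder` and `crc_pad_bits` are assumptions.

```python
def gf2_remainder(bits, crc_len):
    """Remainder of the bit string, read as a GF(2) polynomial, modulo the generator."""
    poly = (1 << crc_len) | 3
    reg = 0
    for bit in bits:
        reg = (reg << 1) | bit
        if reg >> crc_len:  # degree reached crc_len, so reduce
            reg ^= poly
    return reg


def crc_pad_bits(data: bytes) -> list:
    """Return the bits of data followed by 0-4 CRC bits, a multiple of 5 in total."""
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    crc_len = -len(bits) % 5
    if crc_len == 0:
        return bits  # already aligned to 5-bit characters
    crc = gf2_remainder(bits + [0] * crc_len, crc_len)
    return bits + [(crc >> (crc_len - 1 - i)) & 1 for i in range(crc_len)]
```

A decoder can recompute `gf2_remainder` over the padded bits and reject the string if the result is nonzero.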

If we output share indices, we can still use the current method of reading one byte at a time.

Member

@jonatack jonatack left a comment

Pinging @scgbckbone (who has been active on BIP85 review) for feedback.

@scgbckbone
Contributor

Seems to me this is well over the BIP-85 application scope. As I understand it, BIP85 generates "a thing" from "a thing". Your application is generating "multiple things" from "a thing".

Why are you generating multiple initial shares via BIP85 ?

What I imagined a BIP85 application should look like after reading BIP93:

  1. way to generate secret share s from BIP-32 root seed (so that you can load other wallets with derived entropy). Something like this: m/83696968'/93'/{b93_index mapped to int -> s in this case}'/{byte_length}'/{index}'
  2. way to generate any non-secret share from BIP32 root seed. This, as per rationale, would allow users to generate 2nd (and only 2nd) share deterministically via BIP85, and not via RNG. All other shares should be derived according to BIP93 via interpolation. m/83696968'/93'/{b93_index mapped to int -> not s in this case}'/{byte_length}'/{index}'

** maybe even threshold should be part of the BIP32 derivation path, BUT I think not as it has no effect to the actual secret generated (it only affects checksum)

Assuming I'm not wrong in my "speculation", why not just use m/83696968'/128169'/{num_bytes}'/{index}' to generate deterministic bytes from BIP-32 root seed for any share?

@jonatack jonatack added the PR Author action required Needs updates, has unaddressed review comments, or is otherwise waiting for PR author label Oct 14, 2025
@BenWestgate
Author

BenWestgate commented Oct 19, 2025

Thank you for the great feedback @scgbckbone. I'll first explain the rationale behind the points you questioned.

...BIP85 generates "a thing" from "a thing".

True, but that thing can be structured.
For example, BIP39 derives an entire mnemonic, not one word at a time.
Here, the "thing" is a complete codex32 backup as there's no recoverable seed without at least {threshold} shares.

Why are you generating multiple initial shares via BIP85 ?

Determinism.
We want to eliminate ambiguity about which initial share indices were derived by BIP85 to make BIP85 child seed recovery easier. Example:

bip85 = Bip85(master_root_xprv)
# 1. generate secret share "s" from root seed
secret1 = bip85.derive_codex32(k=3, share_idx='s')
# 2. generate any `k` non-"s" shares from root seed, interpolate according to BIP93
secret2 = Codex32String.interpolate_at(
    [
        bip85.derive_codex32(k=3, share_idx='a'),
        bip85.derive_codex32(k=3, share_idx='c'),
        bip85.derive_codex32(k=3, share_idx='d'),
    ],
    target="s"
)
# 3. a different set of `k` shares at the same index recovers a different secret
secret3 = Codex32String.recover(
    [
        bip85.derive_codex32(k=3, share_idx='x'),
        bip85.derive_codex32(k=3, share_idx='y'),
        bip85.derive_codex32(k=3, share_idx='z'),
    ],
    target="s"
)
derived_secrets = [secret1, secret2, secret3]
identifiers = set()
master_seeds = set()
for secret in derived_secrets:
    identifiers.add(secret.identifier)
    master_seeds.add(secret.data)

if len(identifiers) < len(master_seeds):
    raise Bip93Quote("Identifier SHOULD be distinct for every master seed the user may need to disambiguate")

For the same BIP85 root key, each {threshold} set of initial {share_idx} BIP85 derived shares recovers a different secret; and the 's' derivation yet another. That's a bad property: these codex32 sets share the same header ms13<identifier> making them hard to disambiguate. Mismatched sets recover wrong seeds.

** maybe even threshold should be part of the BIP32 derivation path, BUT I think not as it has no effect to the actual secret generated (it only affects checksum)

For secret sharing, the {threshold} must be in the derivation path.
Otherwise ms12testa... and ms13testa... share entropy payloads even though they're distinct backup sets, a security vulnerability if both are used.

For unshared secrets, the threshold has no effect, so it'd be ideal to ignore it when not secret sharing. That way, knowing the BIP85 index and root key uniquely identifies the seed, regardless of threshold, consistent with other BIP85 applications.

...why not just use m/83696968'/128169'/{num_bytes}'/{index}' to generate deterministic bytes from BIP-32 root seed for any share ?

Then why not use that for BIP39 or any other application too?
Let users convert deterministic bytes into mnemonics or codex32 strings as they wish.
The point of a BIP85 application is to standardize how that entropy is consumed into a specific deterministic format.

Based on feedback from you and @akarve
Simplified proposal:
Derivation:
m/83696968'/93'/{header}'/{byte_length}'/{index}'

  • where {header} is an int encoding of <hrp>, <hrp>1<k>, or <hrp>1<k><identifier> (TBD).

Simplifications:

  • {share_idx} and {num_shares} can be eliminated
  • {identifier} can be implicit, but if user-defined, {index} should feed into it to keep output identifiers distinct per master seed
  • "Existing master seed" derivation rule is removed; we only generate fresh seeds.
    • Users can discard an initial share and interpolate if they have an existing master seed they wish to share.
  • BIP93 interpolation and relabeling identifiers left to users.
  • Default identifier = BIP32 fingerprint of derived seed.

Example of the simplified form:

bip85 = Bip85(master_root_xprv)
# 1. generate k=0 secret share "s" from root seed
secret1 = bip85.derive_codex32(k=0)
# 2. generate `k` fixed non-"s" shares from root seed, interpolate according to BIP93
shares = bip85.derive_codex32(k=2)
secret2 = Codex32String.interpolate_at(shares, target="s")
shares = bip85.derive_codex32(k=3)
secret3 = Codex32String.interpolate_at(shares, target="s")
derived_secrets = [secret1, secret2, secret3]
identifiers = set()
master_seeds = set()
for secret in derived_secrets:
    identifiers.add(secret.identifier)
    master_seeds.add(secret.data)

assert len(identifiers) == len(master_seeds)
# header is distinct for each master seed the user may need to disambiguate

assert len(master_seeds) == 1
# same master seed for same {index}, regardless of k

Seems over the BIP-85 scope.

The version brings it back within scope:

  • k=0: derive 1 codex32 secret.
  • k=2: derive 2 codex32 shares ('a' and 'c') → recover same secret.
  • k=3: derive 3 codex32 shares ('a', 'c', and 'd') → recover same secret.

Identifier defaults to derived seed’s BIP32 fingerprint.
Incrementing {index} yields new seeds, with new identifiers automatically.

@jonatack jonatack removed the PR Author action required Needs updates, has unaddressed review comments, or is otherwise waiting for PR author label Oct 19, 2025
@scgbckbone
Contributor

Then why not use that for BIP39 or any other application too?
Let users convert deterministic bytes into mnemonics or codex32 strings as they wish.
The point of a BIP85 application is to standardize how that entropy is consumed into a specific deterministic format.

agreed, rest my case here...

For example, BIP39 derives an entire mnemonic, not one word at a time.

this is a bad comparison, as 12/24 words represent an encoding of 16/32 bytes of entropy, while your approach creates multiple shares. It is the same as if I created multiple 12-word seeds from 16 bytes of entropy.

this is imo ok (using code from your snippets without ever running or reviewing it). My understanding is that each line in the snippet below generates just one share?

secret_share = bip85.derive_codex32(t=3, share_idx='s')
share_a = bip85.derive_codex32(t=3, share_idx='a')
share_c = bip85.derive_codex32(t=3, share_idx='c')
share_d = bip85.derive_codex32(t=3, share_idx='d')

what I have an issue with is this, where you are just generating multiple shares (somehow):

# 2. generate `k` fixed non-"s" shares from root seed, interpolate according to BIP93
shares = bip85.derive_codex32(k=2)

The version brings it back within scope:

only k=0 is within the scope (imho)

Don't get me wrong, I'm not intending to block this BIP update. The updated version is much better. I'm only trying to figure out why this is needed & whether there is any advantage in what you're doing vs. what I'm doing. Here is my pseudo-code, to try to prove the point that nothing other than simple "one share generation" is needed here & the rest can be left to BIP-93 interpolation:

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
secret_share = <32 bytes secret loaded in HWW>
id = "cash"
threshold = 3
num_shares = 4
# use BIP85 to deterministically generate share "l" (or any other, up to the specific wallet implementation)
share_l = bip85(secret_share).derive_codex32(t=threshold, id=id, share_idx='l')
shares = [share_l]
for i in range(num_shares - 1):  # -1 as share 'l' was already generated
    shares.append(bip93.interpolate([secret_share, share_l], CHARSET[i]))

The above pseudo-code always generates the same shares.

@BenWestgate
Author

While your approach creates multiple shares. Same as if I would create multiple 12 words seeds from 16 bytes of entropy.

I would not say these are quite the same. Both the initial k shares and 12-word seeds need entropy to create, but a set of k shares represents a single seed. Fewer than k shares cannot recover a seed or define a specific backup set of shares. All other bip85 applications derive a specific secret at each bip85 index so that it can be recovered later. If the app generates "loose" shares, this is not possible: bip85 does not know which initial shares, if any, generated a master seed. And were loose shares desired, users can discard some output.

only k=0 is within scope (imho)

This can't generate shares because {share_idx} must be "S" for k=0. Of course, I like the simplicity of this small scope but it leaves users and wallets tasked with generating the randomness needed to securely secret share that BIP85 derived codex32 secret.

If the purpose of BIP85 is deterministic randomness from bip32 keychains, why suddenly stop short of providing the additional randomness needed to do SSS for an SSS-aware format?

Path: m/83696968'/93'/{hrp}'/{byte_length}'/{index}'
We could drop the bip93 interpolation and directly encode threshold strings at fixed indices, using default bip32 fingerprint identifier. BIP85 app output would be:
k=0: a codex32 secret.
k=2: same codex32 secret and share "A".
k=3: same codex32 secret, a new share "A", and share "C".
k=4: same codex32 secret, new shares "A" and "C", and share "D".
etc...

Now each {bip85 index} derives a single seed, as with bip39, but we also give deterministic entropy for secret sharing, leaving wallets to interpolate any remaining shares.

My understanding is that each line in below snippet, generates just one share?

Yes, it was an example of part 2 of your proposal:

  2. way to generate any non-secret share from BIP32 root seed. This ... would allow users to generate share[s] deterministically via BIP85, and not via RNG.

The disadvantages I saw with it for the same {bip85 index}:

  1. Different arbitrary {share_idx} combinations of bip85 derived shares recover different seeds, which is unexpected bip85 behavior and less interoperable.
    • Solution: fix the indices and only output either:
      a. "S", ["A", "C"], or ["A", "C", "D"] for k=0, k=2, and k=3 respectively, or
      b. "S", ["S", "A"], or ["S", "A", "C"], etc.
    • Either works: (a) is more usable by humans, as it avoids mistakenly writing down the secret; (b) avoids our bip85 app needing interpolation logic, as it outputs direct entropy encodings.
  2. Different thresholds recover different seeds.
    • Fix: Remove k from the derivation path and always encode the seed as the first {byte_length} bytes from the DRNG.
      • Or use HMAC(derived_entropy, identifier) and truncate.
  3. {threshold} doesn't affect share payloads, so they might be reused across backups.
    • Seeds may be "reshared", but shares should be fresh.
    • E.g. if a user makes a 2-of-n and a 3-of-n backup and accidentally uses the same {bip85 index}, both backups have the same share 'A' payload.
    • Fix: After generating the seed, reseed the DRNG with HMAC(derived_entropy, k) and then derive k - 1 independent share payloads with {byte_length} reads from the DRNG.
      • Could also reseed with HMAC(derived_entropy, k + identifier), which would support resharing an existing master seed at the same threshold with a unique identifier.
      • Or use HMAC(derived_entropy, k + identifier + share_idx) directly and truncate for each share payload.
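
The last option in item 3 could look roughly like the sketch below. It assumes HMAC-SHA512, a message built by concatenating the threshold digit, identifier, and share index as ASCII, and truncation to the payload length. None of these choices are fixed by the discussion, and `share_payload` is a hypothetical name.

```python
import hashlib
import hmac


def share_payload(derived_entropy: bytes, k: int, identifier: str,
                  share_idx: str, byte_length: int) -> bytes:
    """Derive one share payload that depends on threshold, identifier, and share index."""
    if not 16 <= byte_length <= 64:
        raise ValueError("byte_length must be between 16 and 64")
    # Assumed message layout: "<k><identifier><share_idx>", e.g. "2testa".
    message = f"{k}{identifier}{share_idx}".encode("ascii")
    digest = hmac.new(derived_entropy, message, hashlib.sha512).digest()
    return digest[:byte_length]  # truncate the 64-byte digest
```

Because k is part of the HMAC message, a 2-of-n and a 3-of-n backup derived at the same {bip85 index} would get different share 'A' payloads.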

I'm only trying to figure out why is this needed...

To support deterministic generation of initial strings for users/wallets intending to do SSS, we should output a threshold quantity of strings to avoid these 3 interoperability and recovery problems.

However, this isn't necessary for a minimum viable PR, as k=0 is useful on its own. I can break this into two PRs: one for codex32 secrets and another for k >= 2, which outputs a codex32 secret and k-1 codex32 shares.

How should I proceed?

@jonatack
Member

jonatack commented Nov 4, 2025

@akarve thoughts here? (thanks!)

@akarve
Contributor

akarve commented Nov 4, 2025

Yeah. Not to slow down the innovation here (and in @3rdIteration's PRs), but my thinking is for me to take on implementing both applications in the current reference implementation as Python protocols. My belief is that if we can come up with a standard duck-typing interface for all BIP-85 applications, this (as yet non-existent) abstraction will stand the test of time. Of course anyone else in this thread is free to propose the shape of the protocol and even implement it without me. My experience is that we will need full unit tests and such for the protocol to be hard enough to stand the test of time. I'm volunteering to complete said protocol + implementation this quarter, but if anyone wants to go first down this path they're welcome.

Protocol designs can go in this thread. The ultimate goal is we have a standard protocol/interface for BIP-85 applications and a standard "graph" that they all go through and boom out come the entropy products. In this way all applications use the same protocol and same core logic. This would benefit current and future applications as well as benefit PR-clearing speed (after the initial investment) once the protocol is in place. Said protocol will also resolve the "one-product/two-product" style debates happening in this PR because the quacking of the duck types will resolve such points as how many things turn into how many other things. I hope that makes sense.

@BenWestgate
Author

BenWestgate commented Nov 6, 2025

Protocol designs can go in this thread.

Each app should define a function to take parameters and derived entropy, truncate as needed (or seed a DRNG) and format it.

Each application SHOULD use up to the required number of bits necessary for their operation, and truncate the rest.

def entropy_to_output(entropy, parameters):
    # Logic to produce output based on provided entropy and parameters
    return formatted_output

Derivation paths have a semantic application number and end with /{index}' to produce unique secrets. If there is a length quantity for the output, it SHOULD precede {index} in the derivation path.

Many applications also need to define a function to generate the BIP85 derivation path based on their parameters.

def parameters_to_path(parameters, index):
    # Logic to produce the BIP85 derivation path
    return bip85_derivation_path
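
One way the two functions above could be tied together is a `typing.Protocol`. This is a sketch only: the names `Bip85Application`, `HexApplication`, and `describe` are hypothetical, the parameters are held on the instance rather than passed in, and the example assumes 64 bytes of derived entropy with the hex path m/83696968'/128169'/{num_bytes}'/{index}' quoted earlier in this thread.

```python
from typing import Protocol, Tuple


class Bip85Application(Protocol):
    """Duck-typed interface: anything with these two methods qualifies."""

    def parameters_to_path(self, index: int) -> str: ...

    def entropy_to_output(self, entropy: bytes) -> str: ...


class HexApplication:
    """Illustrative application that outputs hex-encoded bytes."""

    def __init__(self, num_bytes: int) -> None:
        if not 0 < num_bytes <= 64:
            raise ValueError("num_bytes must fit in the 64 bytes of derived entropy")
        self.num_bytes = num_bytes

    def parameters_to_path(self, index: int) -> str:
        if not 0 <= index < 2**31:
            raise ValueError("index must be a valid hardened child index")
        # The length quantity precedes {index}, as recommended above.
        return f"m/83696968'/128169'/{self.num_bytes}'/{index}'"

    def entropy_to_output(self, entropy: bytes) -> str:
        # Use only the bytes required and truncate the rest.
        return entropy[: self.num_bytes].hex()


def describe(app: Bip85Application, entropy: bytes, index: int = 0) -> Tuple[str, str]:
    """Run any conforming application through the same two steps."""
    return app.parameters_to_path(index), app.entropy_to_output(entropy)
```

A codex32 application would implement the same two methods, which keeps the shared derivation logic independent of how many parameters each application needs.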

I updated this PR to generate single codex32 strings per BIP85 {index}.

Added detailed explanation of BIP93 codex32 derivation path, including examples for generating codex32 secrets and shares.
Update BIP proposal text with new design
@BenWestgate
Author

BenWestgate commented Nov 11, 2025

What I imagined a BIP85 application should look like after reading BIP93:

1. way to generate secret share `s` from BIP-32 root seed (so that you can load other wallets with derived entropy). Something like this: m/83696968'/93'/{b93_index mapped to int -> `s` in this case}'/{byte_length}'/{index}'

2. way to generate any non-secret share from BIP32 root seed. This, as per rationale, would allow users to generate 2nd (and only 2nd) share deterministically via BIP85, and not via RNG. All other shares should be derived according to BIP93 via interpolation. m/83696968'/93'/{b93_index mapped to int -> `not s` in this case}'/{byte_length}'/{index}'

** maybe even threshold should be part of the BIP32 derivation path, BUT I think not as it has no effect to the actual secret generated (it only affects checksum)

After heavy thinking, I concur that the best solution is to generate single codex32 strings in a manner very similar to what @scgbckbone proposed.

I have rewritten the BIP text to match this design. It is a fraction of the original's length, so that's another win.

I look forward to updated feedback and will be re-writing the reference implementation in the coming weeks.

The use-case that sold me on single share outputs was:

I have two trusted friends Alice and Bob. I want a master seed neither friend knows (unless they collude) that I can recover by asking Alice and Bob for derived shares. So Alice derives my threshold=2 share A with identifier "help", and Bob derives my threshold=2 share C with identifier "help" and I can recover the secret MS12HELPS..., and derive other shares such as "M", "Y", etc that will allow me to recover my seed without Alice or Bob. But even if I lose every share, I can still recover in the same manner as my seed was created.

By limiting the shares to the first threshold indices and including the identifier in share entropy derivation, it avoids the issue of there being multiple seeds recoverable at a given threshold and identifier, at least from a single bip85 root key.

Contributor

@scgbckbone scgbckbone left a comment

much better!

few more questions/comments:

what are CANONICAL_INDICES ? it seems that you limit maximum number of shares to 10 by using them ?

The identifier SHOULD default to the 4 left-most characters of the Bech32-encoded BIP-0032 fingerprint derived from the master seed.

why? imo this should be left out from this spec and left to specific implementations to decide

Users should enter a unique identifier instead of incrementing {index} as different share sets SHOULD have unique identifiers.

I agree here & thinking that maybe we should remove index completely...

What do you think about below derivation path calculation:

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
VALID_HRP = {
    "ms": 0,
    "cl": 1,
}

def pack_header(threshold: int, identifier: str, share_idx: str) -> int:
    id_a = CHARSET.index(identifier[0])
    id_b = CHARSET.index(identifier[1])
    id_c = CHARSET.index(identifier[2])
    id_d = CHARSET.index(identifier[3])
    idx = CHARSET.index(share_idx)
    return (threshold << 25) | (id_a << 20) | (id_b << 15) | (id_c << 10) | (id_d << 5) | idx

def unpack_header(n: int):
    threshold = (n >> 25) & 31
    id_a = (n >> 20) & 31
    id_b = (n >> 15) & 31
    id_c = (n >> 10) & 31
    id_d = (n >> 5)  & 31
    idx = n & 31

    identifier = ""
    for i in [id_a, id_b, id_c, id_d]:
        identifier += CHARSET[i]

    return threshold, identifier, CHARSET[idx]


def make_path(hrp, byte_len, threshold, identifier, share_idx, ):
    header = pack_header(threshold, identifier, share_idx)
    pth = f"m/83696968h/93h/{VALID_HRP[hrp]}h/{byte_len}h/{header}h"
    return pth

b85_path = make_path("ms", 16, 9, "llll", "l")
print(b85_path)
# m/83696968h/93h/0h/16h/335544319h
print(unpack_header(int(b85_path.split("/")[-1][:-1])))
# (9, 'llll', 'l')

  • removed index (even tho I have second thoughts, as it is now very different from other BIP-85 apps, but maybe no one cares, as it is already very different: with most other apps you only choose the index and nothing else. Here, on the other hand, there is plenty to specify)
  • added HRP into the path directly - useful imo - users can see immediately what is it for

@BenWestgate
Copy link
Author

BenWestgate commented Nov 22, 2025

Quoting BIP93's "For a fresh master seed" section to justify my answers:

...the user generates random initial shares, as follows:

  • Choose a bitsize, between 128 and 512, which must be a multiple of 8.
  • Choose a threshold value k between 2 and 9, inclusive
  • Choose a 4 bech32 character identifier
    • We do not define how to choose the identifier, beyond noting that it SHOULD be distinct for every master seed the user may need to disambiguate.
  • k many times, generate a random share by:
    • Take the next available letter from the bech32 alphabet, in alphabetical order, as a, c, d, ..., to be the share index
  • Set the first nine characters to be the prefix ms1, the threshold value k, the 4-character identifier, and then the share index
  • ...

what are CANONICAL_INDICES ?

I renamed that constant to:
IDX_ORDER = "sacdefghjklmnpqrtuvwxyz023456789" # Canonical BIP93 share indices alphabetical order

It is the secret index "s" followed by the 31 share indices (excluding 's') alphabetically sorted. Prepending "s" makes IDX_ORDER.index("s") == 0, simplifying checks for the unshared secret index.

it seems that you limit maximum number of shares to 10 by using them ?

BIP93 prescribes using the first k alphabetical share indices for "random initial shares" (our bip85 entropy application).
k has max value 9. Including unshared secrets which use share_idx = "s", leaves 10 valid.
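
The restriction can be stated compactly in code. This is a small sketch using the IDX_ORDER constant above; the function name `allowed_share_indices` is hypothetical.

```python
IDX_ORDER = "sacdefghjklmnpqrtuvwxyz023456789"  # "s", then share indices in BIP93 order


def allowed_share_indices(k: int) -> str:
    """Share indices that may be derived directly for threshold k."""
    if k == 0:
        return IDX_ORDER[0]  # unshared secret only
    if not 2 <= k <= 9:
        raise ValueError("threshold must be 0 or between 2 and 9")
    return IDX_ORDER[1 : k + 1]  # the first k share indices


def all_derivable_indices() -> set:
    """Every index reachable across all valid thresholds."""
    reachable = set(allowed_share_indices(0))
    for k in range(2, 10):
        reachable.update(allowed_share_indices(k))
    return reachable
```

Across all thresholds this yields the 10 valid indices mentioned above: "s" plus "a" through "k" (skipping "b" and "i", which are not in the bech32 alphabet).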

Having BIP85 derive share indices beyond these initial k makes recovery ambiguous: knowing the common share header of a backup (hrp, threshold, identifier), the root key, and the bip85 index is no longer sufficient to recover a unique secret, which violates both specs.

Example:
For bip85_index = 0, prefix ms12cash, the initial shares bip85 derived with share_idx = "a" and share_idx = "c" would bip93 recover a different secret with prefix ms12cashs than initial shares with share_idx = "y" and share_idx = "z" recover.

This breaks BIP85's customary {index} rule:

The BIP85 index is a number used to derive unique child secrets from a root key. Each index generates one unique secret, allowing for up to 2^31 secrets based on the same root key.

And BIP93's identifier rule:

...[identifier] SHOULD be distinct for every master seed...

If share_idx == "s", then k MUST be 0 for the same reason. Otherwise, at the same child index and identifier, there would be two secrets: the unshared secret derived directly by bip85 at threshold = k, share_idx = "s", and the secret recovered when k shares with threshold = k are first bip85 derived and then bip93 interpolated to "s".

The additional 31-k shares beyond the first k may be derived using BIP93 interpolation, which is outside spec as it needs no fresh randomness. Outputting interpolated shares would be like our XPRV application producing an extended private key encoding of HMAC-SHA512("Bitcoin seed", S) where S is the payload of our WIF application. We don't do that, we always generate fresh randomness or nothing at all. And I've proven the problem with generating fresh randomness for shares beyond the first k alphabetical indices.

What do you think about below derivation path calculation

My first thought was "OH, that's good!", I like this derivation path, provided we add the share_idx restrictions above.

However, are you sure you want to eliminate the bip85 {index}?

All deployed BIP85 applications use this derivation path feature and the reference implementation expects one:

As with all applications, you can change the child index from it's default of zero to get a fresh, repeatable secret.

index, 0 to 2³¹ - 1 for millions of unique passwords

def derive_cli(application, number, index, special, xprv, to):

@click.option(
    "-i",
    "--index",
    type=click.IntRange(0, 2**31 - 1),
    default=0,
    help="Child index. Increment for fresh secrets.",

It's recommended that headers be distinct for unique seeds, but this is not required, as it would be with no {child index} derivation.

why? imo this should be left out from this spec and left to specific implementations to decide

Identifier needs an assignment to encode the string so it may as well be the most useful deterministic data.
The codex32 authors like this as a default for "electronic implementations" (which bip85 is):
BlockstreamResearch/codex32#54 (comment)

If the user can specify and we keep a child {index}, we break the "SHOULD be distinct for every seed" rule. I resorted to the fingerprint default based on an assumption bip85 applications MUST have a child index for "millions of unique" "fresh secrets" as per quotes above.

I dislike the least significant bits of {index}' being share_idx as when I think of the phrase "Child index. Increment for fresh secrets" that's not happening if say, share_idx = "s" changes to "3" and my secret output becomes a share output. If it must go in the last child index derivation level, I prefer share_idx and k as the most significant bits in the {index} serialization.

Retaining child index and using a default fingerprint identifier for unshared secrets makes them most like other bip85 apps. It's not required though, we could derive unshared secrets like shares by serializing the identifier into the child index, subject to the restriction k must be 0 if share_idx = s.

This version always requires specifying the identifier, even for secrets:

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 alphabet
IDX_ORDER = "sacdefghjk"  # "s", then the first nine share indices
VALID_HRP = {
    "ms": 0,
    "cl": 1,
}

class InvalidShareIndex(ValueError):
    pass

class MissingIdentifier(ValueError):
    pass

def identifier_to_int(identifier):
    # Decode the 4-character bech32 identifier into a 20-bit integer.
    value = 0
    for char in identifier:
        value = (value << 5) | CHARSET.index(char)
    return value

def bip93_parameters_to_path(hrp, share_idx, byte_len, threshold=0, identifier="", index=0):
    # Parameters without defaults come first, as Python requires.
    if threshold and share_idx not in IDX_ORDER[1:threshold+1]:
        raise InvalidShareIndex()
    if not identifier:
        raise MissingIdentifier()
    if index > 31:
        raise ValueError("maximum index 31 allowed for codex32 strings")
    # Explicit parentheses: + binds tighter than << in Python.
    index = (
        ((threshold * 2 // 2 + IDX_ORDER.index(share_idx)) << 25)
        | (index << 20)
        | identifier_to_int(identifier)
    )
    return f"m/83696968'/93'/{VALID_HRP[hrp]}'/{byte_len}'/{index}'"

This allows 32 child index values, which combined with a million+ identifiers meets the existing "millions" spec. Incrementing this level changes the identifier, as expected by BIP93 for a unique secret.

I still prefer combining hrp, k and share_idx in the first derivation level, and using the BIP32 fingerprint identifier for share_idx = "s", so we have all 2**31 - 1 child indices for fresh secrets and only codex32 shares break the established pattern (by being restricted to child indices up to 2**11 - 1, as 20 bits are allocated for the required identifier).

def bip93_parameters_to_path(hrp, share_idx, byte_len, k=0, identifier="", index=0):
    # k == 0 allows only "s"; otherwise only the first k share indices.
    if share_idx not in IDX_ORDER[min(1, k) : k + 1]:
        raise InvalidShareIndex()
    header_path_int = VALID_HRP[hrp] * 100 + k * 10 + IDX_ORDER.index(share_idx)
    if share_idx != "s":
        if not identifier:
            raise MissingIdentifier()
        if index > 2**11 - 1:
            raise ValueError("maximum index 2047 allowed for codex32 shares")
        # Upper 11 bits: child index; lower 20 bits: identifier.
        index = (index << 20) | identifier_to_int(identifier)
    return f"m/83696968'/93'/{header_path_int}'/{byte_len}'/{index}'"

What do you think?

@scgbckbone
Contributor

BIP93 prescribes using the first k alphabetical share indices for "random initial shares" (our bip85 entropy application)

completely missed that one, thanks for the reminder + examples

However, Are you sure you want to eliminate the bip85 {index}?

no, I'm not. Completely unsure about it tbh. I've got second thought while writing it & even more after I published. Now I think we should just have a proper index ((2**31) -1) at the end.

Identifier needs an assignment to encode the string so it may as well be the most useful deterministic data.
The codex32 authors like this as a default for "electronic implementations" (which bip85 is):

I like the idea of using master pubkey fingerprint, but is it even possible for bech32 charset where we do not have number 1? (it can only be used as HRP separator in codex32 BIP)

f"m/83696968'/93'/{header_int}'/{byte_len}'/{index}'"

  • like this path (imo we should not use more than 5 derivation steps)
  • as I said, I'm now more inclined to use proper BIP85 index (2**31-1) without any more data encoded in it (but unsure)
  • that leaves us with the need to encode all the rest into the header byte, which would not scale, as we would (in my case) only have 2 more bits for the HRP (good for now, but broken if more HRPs are added).

Is your encoding two-way? Can you get back the header used from the int?

@BenWestgate
Author

BenWestgate commented Nov 24, 2025

Now I think we should just have a proper index ((2**31) -1) at the end.

We have index 0 through 2**31-1 for secrets (k = 0, share_idx = "s") if id = fingerprint() or if id (identifier -> int) is packed in the first derivation level with hrp, k and share_idx.

In codex32 and bip85, id and child index mean the same thing: "unique" outputs. So in various iterations, I've either set the child index index = id, or serialized them together as index = index << 20 | id, since one has 31 bits and the other 20. From a UX perspective it's better to choose 4 bech32 characters and decode them into the child index than to choose an integer child index that produces the desired 4-character identifier.

For identifier collision resistance, index = id is best, followed by index = index << 20 | id, since the first 2^20 child indexes all have unique identifiers before it rolls over. The fingerprint, by contrast, likely begins to collide (birthday problem) after ~2^10 child indexes are used. But the master pubkey fingerprint is far more useful for identifying what the secret is for and harder to accidentally reuse on different child secrets. That practical benefit seems to outweigh the risk of identifier reuse in the first 2**20 child indices.
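The ~2^10 figure can be sanity-checked with the standard birthday approximation. The sketch below assumes the fingerprint-derived identifier behaves like a uniformly random 20-bit value; `collision_probability` is a name made up for this illustration.

```python
import math

ID_BITS = 20  # a 4-character bech32 identifier carries 4 * 5 bits


def collision_probability(n, bits=ID_BITS):
    """Approximate chance that n uniformly random identifiers contain a repeat."""
    space = 2 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))


# Print the approximate collision chance for a few child-index counts.
for n in (2 ** 8, 2 ** 10, 2 ** 12):
    print(n, round(collision_probability(n), 3))
```

Under that assumption, a repeat is already fairly likely around 2^10 fingerprint-derived identifiers, whereas packing the identifier into the child index gives no repeats at all within the first 2^20 indices.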

Since only hrp = "ms" secrets represent a BIP-0032 master seed with an associated master key pair and master pubkey fingerprint perhaps we drop the fingerprint default? Or (my preference) use it only in the ms10 case?

Key identifiers

Extended keys can be identified by the Hash160 of the public key, ignoring the chain code. This corresponds exactly to the data used in traditional Bitcoin addresses.

The first 32 bits of the identifier are called the key fingerprint.

hrp = "cl" secrets are a 32-byte private key so they technically don't have a key fingerprint, and at any rate this means custom per hrp identifier derivation logic.

I like the idea of using master pubkey fingerprint, but is it even possible for bech32 charset where we do not have number 1?

We can bech32-encode the 20 MSBs of the master pubkey fingerprint into the 4-character identifier. Mapping the hex characters directly would carry only 16 bits, and we can't do that anyway since b and 1 are missing from the bech32 charset. It's easy to convert the identifier back to hex without electronics using the BIP-0173 table.
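As a rough illustration of the reverse conversion (identifier back to the leading hex digits of the fingerprint), here is a minimal sketch. It assumes the identifier is exactly the top 20 bits of the 32-bit fingerprint read as four 5-bit symbols; `identifier_to_hex_prefix` is a hypothetical helper name, not something defined by any BIP.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # BIP-0173 character table


def identifier_to_hex_prefix(identifier):
    """Recover the first five hex digits (20 bits) of a fingerprint from a 4-char identifier."""
    if len(identifier) != 4:
        raise ValueError("identifier must be exactly 4 bech32 characters")
    value = 0
    for char in identifier:
        # CHARSET.index raises ValueError for characters outside the bech32 alphabet
        value = (value << 5) | CHARSET.index(char)
    return f"{value:05x}"


print(identifier_to_hex_prefix("qqqq"))
print(identifier_to_hex_prefix("m6km"))
```

Only 20 of the fingerprint's 32 bits survive, so the result is a prefix to compare against, not the full fingerprint.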

f"m/83696968'/93'/{header_int}'/{byte_len}'/{index}'"

like this path (imo we should not use more than 5 derivation steps)

Agreed, this simplifies implementations' assumptions about path lengths.

that leaves us with the need to encode all the rest into the header byte, which would not scale...

There are 45 valid k and share_idx combinations, so they consume 6 bits (compressed), and the id takes 20 bits. That leaves 5 bits, i.e. 2^5 = 32 hrp codes with 30 still unassigned, which is a narrow margin. BIP-0173 has 288 registered prefixes, but codex32 will never have as many since it is for private, not public, data.
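The bit budget can be reproduced by enumerating the valid (k, share_idx) pairs. This sketch assumes the validity rule used elsewhere in this thread (k = 0 pairs only with "s"; k = 2..9 pairs with the first k alphabetical share letters) and a 31-bit hardened child index.

```python
import math

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
# "s" first, then the remaining letters alphabetically, then digits
IDX_ORDER = sorted(CHARSET, key=lambda c: (c != "s", c.isdigit(), c))

# k = 1 is not a valid threshold, so it is skipped
combos = [(k, idx)
          for k in (0, 2, 3, 4, 5, 6, 7, 8, 9)
          for idx in IDX_ORDER[min(1, k): k + 1]]

header_bits = math.ceil(math.log2(len(combos)))  # bits needed for (k, share_idx)
id_bits = 20                                     # 4 bech32 characters * 5 bits
hrp_bits = 31 - header_bits - id_bits            # what is left for hrp codes

print(len(combos), header_bits, hrp_bits, 2 ** hrp_bits)
```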

as I said, I'm now more inclined to use proper BIP85 index (2**31-1) without any more data encoded in it (but unsure)

Here is an example path encoding that uses the 20-LSB of the identifier parameter as the 20-LSB of the child index derivation level, and left-shifts any optional "index" parameter 20-bits to not mangle the identifier part.

Is your encoding two-way? Can you get back the header used from the int?

Yes.

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
IDX_ORDER = sorted(CHARSET, key=lambda c: (c != 's', c.isdigit(), c))  # "s" then alphabetical order
VALID_HRP = ["ms", "cl"]

def bip93_parameters_to_path(hrp, k, share_idx, byte_len, ident="", index=0):
    if k == 1 or not (0 <= k <= 9):
        raise ValueError("Invalid threshold parameter")
    if share_idx not in IDX_ORDER[min(1, k) : k + 1]:
        raise ValueError("Invalid share index")
    if len(ident) != 4:
        raise ValueError("Missing unique 4-character bech32 identifier")
    if index > 2**11 - 1:
        raise ValueError("maximum index 2047 allowed for codex32 shares")
    
    header_path_int = VALID_HRP.index(hrp) * 100 + k * 10 + IDX_ORDER.index(share_idx)
    for char in ident:
        # CHARSET.index raises ValueError on a non-bech32 character (find would silently return -1)
        index = index << 5 | CHARSET.index(char)

    return f"m/83696968'/93'/{header_path_int}'/{byte_len}'/{index}'"


def bip93_path_to_parameters(path=''):
    header_int, byte_len, index = [int(segment[:-1]) for segment in path.split("/")[3:]]
    hrp = VALID_HRP[header_int // 100]
    k = header_int // 10 % 10
    share_idx = IDX_ORDER[header_int % 10]
    ident = "".join([CHARSET[(index >> s) & 31] for s in (15, 10, 5, 0)])

    return hrp, k, share_idx, byte_len, ident, index >> 20

For simplicity, I left out the custom rule for hrp="ms" and k=0 that would have defaulted the identifier to the fingerprint.

Here are some test vectors:

assert bip93_path_to_parameters("m/83696968'/93'/0'/16'/0'") == ('ms', 0, 's', 16, 'qqqq', 0)
assert bip93_path_to_parameters("m/83696968'/93'/199'/32'/999999999'") == ('cl', 9, 'k', 32, '4j0l', 953)
assert bip93_path_to_parameters("m/83696968'/93'/199'/64'/2147483647'") == ('cl', 9, 'k', 64, 'llll', 2047)

assert bip93_parameters_to_path("ms", 0, "s", 16, 'test', 0) == "m/83696968'/93'/0'/16'/386571'"
assert bip93_parameters_to_path("ms", 3, "a", 16, 'cash', 0) == "m/83696968'/93'/31'/16'/816663'"
assert bip93_parameters_to_path("ms", 3, "c", 16, 'cash', 0) == "m/83696968'/93'/32'/16'/816663'"

assert bip93_parameters_to_path("ms", 2, "a", 16, 'qqqq', 0) == "m/83696968'/93'/21'/16'/0'"
assert bip93_parameters_to_path("ms", 2, "c", 32, 'qqqp', 0) == "m/83696968'/93'/22'/32'/1'"
assert bip93_parameters_to_path("ms", 3, "a", 16, 'llll', 0) == "m/83696968'/93'/31'/16'/1048575'"
assert bip93_parameters_to_path("ms", 3, "c", 64, 'qqqq', 1) == "m/83696968'/93'/32'/64'/1048576'"
assert bip93_parameters_to_path("ms", 3, "d", 16, 'qqqp', 1) == "m/83696968'/93'/33'/16'/1048577'"

This header_path_int encoding is very human-readable as 100s place is the hrp value, 10s is k and 1s the alphabetical order share index. The last index digit must be <= the k digit to be valid (our limitation discussed last week on valid share indexes to derive).
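To make the decimal layout concrete, the following sketch decodes a header integer digit by digit and applies the validity rule just described. `decode_header` is a hypothetical helper written for this illustration; the two-entry `VALID_HRP` list mirrors the example code above and is not a final registry.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
IDX_ORDER = sorted(CHARSET, key=lambda c: (c != "s", c.isdigit(), c))
VALID_HRP = ["ms", "cl"]


def decode_header(header_int):
    """Split header_int into (hrp, k, share_idx) and reject invalid combinations."""
    hrp_code = header_int // 100      # hundreds place: hrp code
    k = header_int // 10 % 10         # tens place: threshold
    idx_pos = header_int % 10         # ones place: position in IDX_ORDER
    if not 0 <= hrp_code < len(VALID_HRP):
        raise ValueError("unknown hrp code")
    if k == 1:
        raise ValueError("threshold 1 is not valid")
    # k = 0 allows only position 0 ("s"); k >= 2 allows positions 1..k
    if not min(1, k) <= idx_pos <= k:
        raise ValueError("share index not derivable for this threshold")
    return VALID_HRP[hrp_code], k, IDX_ORDER[idx_pos]


print(decode_header(0))
print(decode_header(32))
print(decode_header(199))
```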

If we cram the identifier into header_int as suggested, we lose this decimal readability AND 20 bits of hrp codes. So I am convinced the identifier needs to set the child index derivation level or, less preferably, be set by the child index value.
Especially given they have a purpose in common: a public label for a unique secret.

Both BIP85 child indices and the codex32 identifier aim to disambiguate or label derived/encoded secrets so you can tell one master seed apart from another.

They're not the same thing but they're close enough we should handle them together for derivation purposes.

@scgbckbone
Contributor

Since only hrp = "ms" secrets represent a BIP-0032 master seed with an associated master key pair and master pubkey fingerprint perhaps we drop the fingerprint default? Or (my preference) use it only in the ms10 case?

I'm planning to extend the BIP(s) with a new HRP that will encode chaincode+privkey (64 bytes) for compatibility with BIP-39. I will use the 20 MSBs of the master fingerprint for it in my application as the default. Users will have the ability to change to a custom ID. So up to you.

Here is my xfp to codex32 id converter:

c = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def xfp_to_codex32_id(xfp):
    x = (int(xfp, 16) >> 12) & 0xFFFFF  # Extract exactly 20 MSB
    return c[(x >> 15) & 31] + c[(x >> 10) & 31] + c[(x >> 5) & 31] + c[x & 31]

ok, I'm sold. You definitely put much more thought into this than I did. Thanks for lengthy & helpful explanations!

Concept ACK

@BenWestgate
Author

BenWestgate commented Nov 26, 2025

I'm planning to extend the BIP(s) with new HRP, that will encode chaincode+privkey (64bytes) for compatibility with BIP-39. I will use master fingerprint 20 MSB for it in my application as default. Users will have ability to change to custom ID. So up to you.

Let's say that specific application prefixes may further restrict the valid byte length(s) and set a default identifier on output secrets.

Your design should probably use the 20-MSB of the master key fingerprint as the identifier on master xprvs and master xpubs. That has the awesome property of keeping the same default identifier as the master seed that derives them:

master seed (hex): deadbeefdeadbeef...
master key fingerprint (hex): 00000000
codex32-encoded master seed (bech): ms10qqqqs...
codex32-encoded master extended private key (bech): xprv10qqqqs...
codex32-encoded master extended public key (bech): xpub1qqqqs

To support this, though, we must go with the original "s" secret design that does NOT incorporate the ID into the derivation, since you need to know the derived private key to compute the fingerprint ID.

That way we can derive and label strings for your application properly.

Shares however still must incorporate the identifier in derivation.
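One way to picture that split is a child-index rule that branches on the share index. This is only a sketch of the idea as discussed in this thread, not agreed specification text; `codex32_child_index` is a hypothetical name, and the 11-bit/20-bit packing is the one from the earlier example code.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def codex32_child_index(share_idx, index=0, ident=""):
    """Return the final hardened child index for a secret ("s") or a share."""
    if share_idx == "s":
        # Secrets: the identifier is chosen after derivation (e.g. from the
        # fingerprint), so the plain BIP-85 index keeps its full 31-bit range.
        if not 0 <= index < 2 ** 31:
            raise ValueError("index out of range for a hardened child")
        return index
    # Shares: the 4-character identifier occupies the low 20 bits.
    if len(ident) != 4:
        raise ValueError("shares need a 4-character bech32 identifier")
    if not 0 <= index < 2 ** 11:
        raise ValueError("maximum index 2047 allowed for codex32 shares")
    for char in ident:
        index = (index << 5) | CHARSET.index(char)
    return index


print(codex32_child_index("s", 5))
print(codex32_child_index("a", 0, "cash"))
```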

We also need to relax the byte_length restriction in this PR to 1-626.

I'm writing a PR to BIP93 to generalize it for any HRP, which is precisely for applications like yours, so please review it if you have time:

#2040

@scgbckbone
Contributor

Your design should probably use the 20-MSB of the master key fingerprint as the identifier on master xprvs and master xpubs. That has the awesome property of keeping the same default identifier as the master seed that derives them:

To support this though we must go with the original "s" secret design that does NOT incorporate the ID into the derivation. As you need to know the derived private key to compute the fingerprint ID.

I doubt anyone will start using bech32-encoded extended keys; at least I do not plan to. Even though they are more readable than base58, that standard is set in stone at this point. If you consider it useful, I do not mind if you optimize this BIP-85 app for it.

@BenWestgate

This comment has been minimized.

@scgbckbone
Contributor

scgbckbone commented Nov 27, 2025

When I see storing the chain code and private key, I am thinking you're encoding an extended key (you are) so that seems most useful to go all the way.

You're probably right that it makes sense to go all the way to a proper extended key encoding, but no, I wasn't treating it as such. I was just storing a secret that consists of chaincode + privkey, from which a naked/root (without metadata) extended private key can be reassembled.

Andrew already has an informal standard for storing BIP39 words in codex32 so we don't need to store the hdseed or extended key.

Do you have a link?

@BenWestgate

This comment has been minimized.
