BIP85: Add Codex32 as application 93' #1958
base: master
Documenting recent discussions:
It seems you'd like to consolidate some of the paths. There are a few ways to do this; if you have a favorite, or one that immediately stands out as obvious, let me know. I'm thinking the identifier could be the bech32 encoding of the BIP85 index, as the purpose of incrementing the index is to get new seeds, and BIP93 says "...the identifier SHOULD be distinct for all master seeds the user may need to disambiguate." index = 0 -> identifier = qqqq, index = 1 -> identifier = qqqp, and so on. A particular identifier could be selected by converting it to an integer {index}. Once index reaches 32^4, it can fall back to the default BIP-0032 fingerprint or roll over. On byte extraction: I agree we should draw … If we output share indices, we can still use the current read-one-byte-at-a-time method.
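A minimal sketch of the index-to-identifier mapping floated here, assuming the index is written as four 5-bit groups, most significant first (this reproduces the qqqq/qqqp examples). The helper names are hypothetical, and the fallback once index reaches 32^4 is not modeled.

```python
# bech32 alphabet from BIP173 (also used by codex32 / BIP93).
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def index_to_identifier(index: int) -> str:
    """Hypothetical: encode 0 <= index < 32**4 as a 4-character identifier."""
    if not 0 <= index < 32**4:
        raise ValueError("index must be in [0, 32**4)")
    # Most significant 5-bit group first, so index 0 -> "qqqq", 1 -> "qqqp".
    return "".join(CHARSET[(index >> shift) & 31] for shift in (15, 10, 5, 0))


def identifier_to_index(identifier: str) -> int:
    """Hypothetical inverse: select a particular identifier via its integer {index}."""
    if len(identifier) != 4:
        raise ValueError("identifier must be exactly 4 characters")
    value = 0
    for char in identifier.lower():
        value = (value << 5) | CHARSET.index(char)  # ValueError on non-bech32 chars
    return value


print(index_to_identifier(0))       # qqqq
print(index_to_identifier(1))       # qqqp
print(identifier_to_index("qqqp"))  # 1
```

Because the mapping is a bijection on [0, 32^4), picking an identifier and picking an index are interchangeable until the index space is exhausted.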
jonatack
left a comment
Pinging @scgbckbone (who has been active on BIP85 review) for feedback.
Seems to me this is well outside the BIP-85 application scope. As I understand it, BIP85 generates "a thing" from "a thing". Your application generates "multiple things" from "a thing". Why are you generating multiple initial shares via BIP85? What I imagined a BIP85 application should look like after reading BIP93:
** maybe even the threshold should be part of the BIP32 derivation path, BUT I think not, as it has no effect on the actual secret generated (it only affects the checksum). Assuming I'm not wrong in my "speculation", why not just use m/83696968'/128169'/{num_bytes}'/{index}' to generate deterministic bytes from the BIP-32 root seed for any share?
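For reference, a minimal sketch of what the suggested HEX path computes, assuming a raw BIP32 seed as input (BIP85 itself starts from a master xprv; base58 decoding is omitted here). Because every path level is hardened, only HMAC-SHA512 and scalar addition modulo the secp256k1 order are needed, no elliptic-curve point math. Function names are illustrative, not from any library.

```python
import hashlib
import hmac

# Order of the secp256k1 group (SEC 2), used for BIP32 private child derivation.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED = 0x80000000


def master_key(seed: bytes) -> tuple[bytes, bytes]:
    """BIP32 master private key and chain code from a seed."""
    i = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return i[:32], i[32:]


def ckd_priv_hardened(k: bytes, c: bytes, index: int) -> tuple[bytes, bytes]:
    """BIP32 hardened private child derivation."""
    data = b"\x00" + k + (index | HARDENED).to_bytes(4, "big")
    i = hmac.new(c, data, hashlib.sha512).digest()
    il = int.from_bytes(i[:32], "big")
    child = (il + int.from_bytes(k, "big")) % SECP256K1_N
    if il >= SECP256K1_N or child == 0:
        # BIP32: this key is invalid, proceed with the next index instead.
        raise ValueError("invalid child key")
    return child.to_bytes(32, "big"), i[32:]


def bip85_hex(seed: bytes, num_bytes: int, index: int) -> bytes:
    """Entropy at m/83696968'/128169'/{num_bytes}'/{index}' (BIP85 HEX application)."""
    if not 16 <= num_bytes <= 64:
        raise ValueError("BIP85 HEX allows 16..64 bytes")
    k, c = master_key(seed)
    for level in (83696968, 128169, num_bytes, index):
        k, c = ckd_priv_hardened(k, c, level)
    # BIP85 entropy: HMAC-SHA512 keyed with "bip-entropy-from-k" over the child key.
    entropy = hmac.new(b"bip-entropy-from-k", k, hashlib.sha512).digest()
    return entropy[:num_bytes]


print(bip85_hex(bytes(range(16)), 16, 0).hex())
```

Each (num_bytes, index) pair yields independent bytes, which is the "one thing from one thing" shape being argued for: one call, one share payload.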
Thank you for the great feedback @scgbckbone. I'll explain the rationale behind your questions first.
True, but that thing can be structured.
Determinism.
bip85 = Bip85(master_root_xprv)
# 1. generate secret share "s" from root seed
secret1 = bip85.derive_codex32(k=3, share_idx='s')
# 2. generate `k` arbitrary non-"s" shares from root seed, interpolate according to BIP93
secret2 = Codex32String.interpolate_at(
    [
        bip85.derive_codex32(k=3, share_idx='a'),
        bip85.derive_codex32(k=3, share_idx='c'),
        bip85.derive_codex32(k=3, share_idx='d'),
    ],
    target="s",
)
secret3 = Codex32String.recover(
    [
        bip85.derive_codex32(k=3, share_idx='x'),
        bip85.derive_codex32(k=3, share_idx='y'),
        bip85.derive_codex32(k=3, share_idx='z'),
    ],
    target="s",
)
derived_secrets = [secret1, secret2, secret3]
identifiers = set()
master_seeds = set()
for secret in derived_secrets:
    identifiers.add(secret.identifier)
    master_seeds.add(secret.data)
if len(identifiers) < len(master_seeds):
    raise Bip93Quote("Identifier SHOULD be distinct for every master seed the user may need to disambiguate")
For the same BIP85 root key, each …
For secret sharing, the … For unshared secrets, the threshold has no effect, so it would be ideal to ignore it when not secret sharing. That way, knowing the BIP85 index and root key uniquely identifies the seed regardless of threshold, consistent with other BIP85 applications.
Then why not use that for BIP39 or any other application too?
Based on feedback from you and @akarve:
Simplifications:
Example of the simplified form:
bip85 = Bip85(master_root_xprv)
# 1. generate k=0 secret share "s" from root seed
secret1 = bip85.derive_codex32(k=0)
# 2. generate `k` fixed non-"s" shares from root seed, interpolate according to BIP93
shares = bip85.derive_codex32(k=2)
secret2 = Codex32String.interpolate_at(shares, target="s")
shares = bip85.derive_codex32(k=3)
secret3 = Codex32String.interpolate_at(shares, target="s")
derived_secrets = [secret1, secret2, secret3]
identifiers = set()
master_seeds = set()
for secret in derived_secrets:
    identifiers.add(secret.identifier)
    master_seeds.add(secret.data)
assert len(identifiers) == len(master_seeds)  # header is distinct for each master seed the user may need to disambiguate
assert len(master_seeds) == 1  # same master seed for same {index}, regardless of k
This version brings it back within scope:
Agreed, I rest my case here...
This is a bad comparison, as 12/24 words represent an encoding of 16/32 bytes of entropy, while your approach creates multiple shares. It is the same as if I were to create multiple 12-word seeds from 16 bytes of entropy. This is OK IMO (using code from your snippets without ever running or reviewing it). My understanding is that each line in the snippet below generates just one share? What I have an issue with is this, where you are generating multiple shares (somehow):
Don't get me wrong, I don't intend to block this BIP update. The updated version is much better. I'm only trying to figure out why this is needed and whether there is any advantage in what you're doing vs. what I'm doing. Here is my pseudo-code, to try to prove the point that nothing more than simple "one share generation" is needed here and the rest can be left to BIP-93 interpolation: … The above pseudo-code always generates the same shares.
I would not say these are quite the same. Both the initial k shares and 12-word seeds need entropy to create, but a set of k shares does represent a single seed.
This can't generate shares because {share_idx} must be "S" for … If the purpose of BIP85 is deterministic randomness from BIP32 keychains, why suddenly stop short of providing the additional randomness needed to do SSS for an SSS-aware format? Path: … Now each {bip85 index} derives a single seed, as with BIP39, but we also give deterministic entropy for secret sharing, leaving wallets to interpolate any remaining shares.
Yes, it was an example of part 2 of your proposal:
The disadvantages I saw with it for the same {bip85 index}:
To support deterministic generation of initial strings for users/wallets intending to do SSS, we should output a threshold quantity of strings to avoid these 3 interoperability and recovery problems. However, this isn't necessary for a minimum viable PR, as … How should I proceed?
@akarve thoughts here? (thanks!)
Yeah. Not to slow down the innovation here (and in @3rdIteration's PRs), but my thinking is for me to take on implementing both applications in the current reference implementation as Python protocols. My belief is that if we can come up with a standard duck-typing interface for all BIP-85 applications, this (as yet non-existent) abstraction will stand the test of time. Of course, anyone else in this thread is free to propose the shape of the protocol and even implement it without me. My experience is that we will need full unit tests and such for the protocol to be robust enough to stand the test of time. I'm volunteering to complete said protocol + implementation this quarter, but if anyone wants to go first down this path, they're welcome. Protocol designs can go in this thread.
The ultimate goal is that we have a standard protocol/interface for BIP-85 applications and a standard "graph" that they all go through, and boom, out come the entropy products. In this way all applications use the same protocol and the same core logic. This would benefit current and future applications, as well as PR-clearing speed (after the initial investment) once the protocol is in place. Said protocol will also resolve the "one-product/two-product" style debates happening in this PR, because the quacking of the duck types will settle such points as how many things turn into how many other things. I hope that makes sense.
Each app should define a function to take parameters and derived entropy, truncate as needed (or seed a DRNG) and format it.
def entropy_to_output(entropy, parameters):
    # Logic to produce output based on provided entropy and parameters
    return formatted_output
Derivation paths have a semantic application number, end with an … Many applications also need to define a function to generate the BIP85 derivation path based on their parameters.
def parameters_to_path(parameters, index):
    # Logic to produce the BIP85 derivation path
    return bip85_derivation_path
I updated this PR to generate single codex32 strings per BIP85
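The `entropy_to_output` and `parameters_to_path` functions proposed above can be captured as a Python `typing.Protocol`, in the spirit of the duck-typed interface @akarve describes. This is a sketch only: the `Bip85Application` and `HexApp` names, and passing parameters as a mapping, are assumptions, not the reference implementation's interface.

```python
from typing import Any, Mapping, Protocol, runtime_checkable


@runtime_checkable
class Bip85Application(Protocol):
    """Hypothetical duck-typed interface shared by all BIP85 applications."""

    def parameters_to_path(self, parameters: Mapping[str, Any], index: int) -> str:
        """Build the BIP85 derivation path for these parameters and child index."""
        ...

    def entropy_to_output(self, entropy: bytes, parameters: Mapping[str, Any]) -> str:
        """Truncate (or seed a DRNG with) the derived entropy and format it."""
        ...


class HexApp:
    """Example conforming application: BIP85 HEX (application number 128169)."""

    def parameters_to_path(self, parameters: Mapping[str, Any], index: int) -> str:
        return f"m/83696968'/128169'/{parameters['num_bytes']}'/{index}'"

    def entropy_to_output(self, entropy: bytes, parameters: Mapping[str, Any]) -> str:
        return entropy[: parameters["num_bytes"]].hex()


app = HexApp()
assert isinstance(app, Bip85Application)  # structural check, no inheritance needed
print(app.parameters_to_path({"num_bytes": 32}, 0))  # m/83696968'/128169'/32'/0'
```

A shared driver could then derive the path, run the common BIP85 entropy step, and hand the bytes to `entropy_to_output`, so a codex32 application only has to define these two methods.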
Added detailed explanation of BIP93 codex32 derivation path, including examples for generating codex32 secrets and shares.
Update BIP proposal text with new design
After heavy thinking, I concur that the best solution is to generate single codex32 strings in a manner very similar to what @scgbckbone proposed. I have rewritten the BIP text to match this design; it is a fraction of the original's length, so that's another win. I look forward to updated feedback and will be rewriting the reference implementation in the coming weeks. The use-case that sold me on single share outputs was:
By limiting the shares to the first threshold indices and including the identifier in share entropy derivation, it avoids the issue of multiple seeds being recoverable at a given threshold and identifier, at least from a single BIP85 root key.
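A small sketch of the "first threshold indices" restriction, assuming (as in the path encoders later in this thread) that k = 0 denotes an unshared secret, k otherwise ranges over 2..9, and "alphabetical" places letters before digits. The identifier's role in share entropy derivation is not modeled; `initial_share_indices` is a hypothetical name.

```python
# bech32 alphabet from BIP173; "s" denotes the unshared secret in BIP93.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

# "s" first, then the other 31 characters: letters alphabetically, then digits.
IDX_ORDER = "".join(sorted(CHARSET, key=lambda c: (c != "s", c.isdigit(), c)))


def initial_share_indices(k: int) -> str:
    """Share indices a BIP85 derivation would be allowed to output for threshold k."""
    if k == 1 or not 0 <= k <= 9:
        raise ValueError("threshold must be 0 (unshared) or 2..9")
    if k == 0:
        return "s"  # unshared secret: only the "s" string exists
    return IDX_ORDER[1 : k + 1]  # the first k alphabetical share indices


print(IDX_ORDER)                 # sacdefghjklmnpqrtuvwxyz023456789
print(initial_share_indices(3))  # acd
```

With at most 9 initial shares plus "s", only the first 10 entries of this ordering are ever derived; any further shares come from interpolation.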
much better!
A few more questions/comments:
What are CANONICAL_INDICES? It seems that you limit the maximum number of shares to 10 by using them?
The identifier SHOULD default to the 4 left-most characters of the Bech32-encoded BIP-0032 fingerprint derived from the master seed.
Why? IMO this should be left out of this spec and left to specific implementations to decide.
Users should enter a unique identifier instead of incrementing {index} as different share sets SHOULD have unique identifiers.
I agree here, and I'm thinking that maybe we should remove index completely...
What do you think about the derivation path calculation below:
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
VALID_HRP = {
    "ms": 0,
    "cl": 1,
}
def pack_header(threshold: int, identifier: str, share_idx: str) -> int:
    id_a = CHARSET.index(identifier[0])
    id_b = CHARSET.index(identifier[1])
    id_c = CHARSET.index(identifier[2])
    id_d = CHARSET.index(identifier[3])
    idx = CHARSET.index(share_idx)
    return (threshold << 25) | (id_a << 20) | (id_b << 15) | (id_c << 10) | (id_d << 5) | idx
def unpack_header(n: int):
    threshold = (n >> 25) & 31
    id_a = (n >> 20) & 31
    id_b = (n >> 15) & 31
    id_c = (n >> 10) & 31
    id_d = (n >> 5) & 31
    idx = n & 31
    identifier = ""
    for i in [id_a, id_b, id_c, id_d]:
        identifier += CHARSET[i]
    return threshold, identifier, CHARSET[idx]
def make_path(hrp, byte_len, threshold, identifier, share_idx):
    header = pack_header(threshold, identifier, share_idx)
    pth = f"m/83696968h/93h/{VALID_HRP[hrp]}h/{byte_len}h/{header}h"
    return pth
b85_path = make_path("ms", 16, 9, "llll", "l")
print(b85_path)
# m/83696968h/93h/0h/16h/335544319h
print(unpack_header(int(b85_path.split("/")[-1][:-1])))
# (9, 'llll', 'l')
- removed idx (even though I have second thoughts, as it is now very different from other BIP-85 apps; but maybe that doesn't matter, as it is already very different: with most other apps you only choose the index and nothing else, while here there is plenty to specify)
- added HRP into the path directly - useful IMO - users can see immediately what it is for
Quoting BIP93: For a fresh master seed to justify my answers:
I renamed that constant to … It is the secret index "s" followed by the 31 share indices (excluding "s") sorted alphabetically. Prepending "s" makes …
BIP93 prescribes using the first k alphabetical share indices for "random initial shares" (our BIP85 entropy application). Having BIP85 derive share indices beyond this initial k makes recovery ambiguous where knowing the common share header of a backup (…). Example: Breaking BIP85's {index} customary rule:
And BIP93's identifier rule:
If … The additional 31-k shares beyond the first k may be derived using BIP93 interpolation, which is outside this spec as it needs no fresh randomness. Outputting interpolated shares would be like our XPRV application producing an extended private key encoding of HMAC-SHA512("Bitcoin seed", S), where S is the payload of our WIF application. We don't do that; we always generate fresh randomness or nothing at all. And I've proven the problem with generating fresh randomness for shares beyond the first k alphabetical indices.
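As a sketch of the BIP93 interpolation referred to here, assuming GF(32) arithmetic modulo x^5 + x^3 + 1 (the bech32 field from BIP173) and share index characters as x-coordinates, so "s" sits at x = 16. It works on payload characters only and ignores headers and checksums; the function names are hypothetical, and the BIP93 reference implementation remains authoritative for exact share values. It shows why shares beyond the first k need no fresh randomness.

```python
# Assumption: GF(32) elements are bech32 character values, multiplied
# modulo x^5 + x^3 + 1.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def gf32_mul(a: int, b: int) -> int:
    """Carry-less multiply of two 5-bit values, reduced mod x^5 + x^3 + 1."""
    result = 0
    for _ in range(5):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x20:
            a ^= 0b101001  # subtract (XOR) x^5 + x^3 + 1
    return result


def gf32_inv(a: int) -> int:
    """Multiplicative inverse a^30 (the nonzero elements form a group of order 31)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(32)")
    result, base, exp = 1, a, 30
    while exp:
        if exp & 1:
            result = gf32_mul(result, base)
        base = gf32_mul(base, base)
        exp >>= 1
    return result


def interpolate_at(shares: dict[str, str], target: str) -> str:
    """Lagrange-interpolate equal-length payloads at the x-coordinate of `target`.

    `shares` maps a share index character (e.g. "a") to its payload characters.
    """
    xs = {idx: CHARSET.index(idx) for idx in shares}
    t = CHARSET.index(target)
    weights = {}
    for i, xi in xs.items():
        num = den = 1
        for j, xj in xs.items():
            if i != j:
                num = gf32_mul(num, t ^ xj)   # (t - x_j); subtraction is XOR here
                den = gf32_mul(den, xi ^ xj)  # (x_i - x_j)
        weights[i] = gf32_mul(num, gf32_inv(den))
    length = len(next(iter(shares.values())))
    out = []
    for pos in range(length):
        acc = 0
        for i, payload in shares.items():
            acc ^= gf32_mul(weights[i], CHARSET.index(payload[pos]))
        out.append(CHARSET[acc])
    return "".join(out)


# Threshold 2: two initial shares fix a line; every other share lies on it.
initial = {"a": "qpzry9x8", "c": "gf2tvdw0"}
share_d = interpolate_at(initial, "d")  # an extra share, no fresh randomness needed
secret = interpolate_at(initial, "s")
assert interpolate_at({"a": initial["a"], "d": share_d}, "s") == secret
```

Any k of the resulting shares recover the same "s", which is why deriving fresh entropy for an index beyond the first k would define a different polynomial rather than another share of the same secret.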
My first thought was "OH, that's good!" I like this derivation path, provided we add the … However, are you sure you want to eliminate the BIP85 {index}? All deployed BIP85 applications use this derivation path feature, and the reference implementation expects one:
It's recommended that headers be distinct for unique seeds, but this isn't required, as with no {child index} derivation.
The identifier needs an assignment to encode the string, so it may as well be the most useful deterministic data. If the user can specify it and we keep a child {index}, we break the "SHOULD be distinct for every seed" rule. I resorted to the fingerprint default based on an assumption that BIP85 applications MUST have a child index for "millions of unique" "fresh secrets", as per the quotes above. I dislike the least significant bits of {index}' being … Retaining the child index and using a default fingerprint identifier for unshared secrets makes them most like other BIP85 apps. It's not required, though: we could derive unshared secrets like shares by serializing the identifier into the child index, subject to the restriction that k must be 0 if … This version always requires specifying the identifier, even for secrets:
IDX_ORDER = "sacdefghjk"
VALID_HRP = {
    "ms": 0,
    "cl": 1,
}
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
def identifier_to_int(identifier):
    # 4 bech32 characters -> 20-bit integer (stands in for the undefined bech32_decode helper)
    value = 0
    for char in identifier:
        value = value << 5 | CHARSET.index(char)
    return value
def bip93_parameters_to_path(hrp, share_idx, byte_len, threshold=0, identifier="", index=0):
    # parameters without defaults must precede defaulted ones in Python
    if threshold and share_idx not in IDX_ORDER[1:threshold + 1]:
        raise ValueError("Invalid share index")
    if not identifier:
        raise ValueError("Missing identifier")
    if index > 31:
        raise ValueError("maximum index 31 allowed for codex32 strings")
    # explicit parentheses: in Python, + binds tighter than <<
    index = (
        ((threshold * 2 // 2 + IDX_ORDER.index(share_idx)) << 25)
        + (index << 20)
        + identifier_to_int(identifier)
    )
    return f"m/83696968'/93'/{VALID_HRP[hrp]}'/{byte_len}'/{index}'"
This allows 32 child index values, which, combined with a million+ identifiers, meets the existing "millions" spec. Incrementing this level changes the identifier, as expected by BIP93 for a unique secret. I still prefer combining …
def bip93_parameters_to_path(hrp, share_idx, byte_len, k=0, identifier="", index=0):
    if share_idx not in IDX_ORDER[min(1, k) : k + 1]:
        raise ValueError("Invalid share index")
    header_path_int = VALID_HRP[hrp] * 100 + k * 10 + IDX_ORDER.index(share_idx)
    if share_idx != "s":
        if not identifier:
            raise ValueError("Missing identifier")
        if index > 2**11 - 1:
            raise ValueError("maximum index 2047 allowed for codex32 shares")
        index = (index << 20) + identifier_to_int(identifier)
    return f"m/83696968'/93'/{header_path_int}'/{byte_len}'/{index}'"
What do you think?
Completely missed that one, thanks for the reminder + examples.
No, I'm not. Completely unsure about it, tbh. I had second thoughts while writing it, and even more after I published. Now I think we should just have a proper index ((2**31) - 1) at the end.
I like the idea of using the master pubkey fingerprint, but is it even possible with the bech32 charset, where we do not have number …
Is your encoding two-way? Can you get back the header used from the int?
We have index 0 through 2**31-1 for secrets (k=0, … In codex32 and BIP85, … For identifier collision resistance, … Since only …
We can bech32-encode the 20 MSB of the master pubkey fingerprint into the 4-character identifier, instead of 16 bits if we mapped hex characters, which we can't since …
Agreed, this simplifies implementations assumptions about path lengths.
There are 45
Here is an example path encoding that uses the 20-LSB of the identifier parameter as the 20-LSB of the child index derivation level, and left-shifts any optional "index" parameter 20-bits to not mangle the identifier part.
Yes.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
IDX_ORDER = sorted(CHARSET, key=lambda c: (c != 's', c.isdigit(), c))  # "s", then letters alphabetically, then digits
VALID_HRP = ["ms", "cl"]
def bip93_parameters_to_path(hrp, k, share_idx, byte_len, ident="", index=0):
    if k == 1 or not (0 <= k <= 9):
        raise ValueError("Invalid threshold parameter")
    if share_idx not in IDX_ORDER[min(1, k) : k + 1]:
        raise ValueError("Invalid share index")
    if len(ident) != 4:
        raise ValueError("Missing unique 4-character bech32 identifier")
    if index > 2**11 - 1:
        raise ValueError("maximum index 2047 allowed for codex32 shares")
    header_path_int = VALID_HRP.index(hrp) * 100 + k * 10 + IDX_ORDER.index(share_idx)
    for char in ident:
        # CHARSET.index raises ValueError on non-bech32 characters (str.find would return -1)
        index = index << 5 | CHARSET.index(char)
    return f"m/83696968'/93'/{header_path_int}'/{byte_len}'/{index}'"
def bip93_path_to_parameters(path=''):
    header_int, byte_len, index = [int(segment[:-1]) for segment in path.split("/")[3:]]
    hrp = VALID_HRP[header_int // 100]
    k = header_int // 10 % 10
    share_idx = IDX_ORDER[header_int % 10]
    ident = "".join([CHARSET[(index >> s) & 31] for s in (15, 10, 5, 0)])
    return hrp, k, share_idx, byte_len, ident, index >> 20
I left out the custom rule for … Here are some test vectors:
assert bip93_path_to_parameters("m/83696968'/93'/0'/16'/0'") == ('ms', 0, 's', 16, 'qqqq', 0)
assert bip93_path_to_parameters("m/83696968'/93'/199'/32'/999999999'") == ('cl', 9, 'k', 32, '4j0l', 953)
assert bip93_path_to_parameters("m/83696968'/93'/199'/64'/2147483647'") == ('cl', 9, 'k', 64, 'llll', 2047)
assert bip93_parameters_to_path("ms", 0, "s", 16, 'test', 0) == "m/83696968'/93'/0'/16'/386571'"
assert bip93_parameters_to_path("ms", 3, "a", 16, 'cash', 0) == "m/83696968'/93'/31'/16'/816663'"
assert bip93_parameters_to_path("ms", 3, "c", 16, 'cash', 0) == "m/83696968'/93'/32'/16'/816663'"
assert bip93_parameters_to_path("ms", 2, "a", 16, 'qqqq', 0) == "m/83696968'/93'/21'/16'/0'"
assert bip93_parameters_to_path("ms", 2, "c", 32, 'qqqp', 0) == "m/83696968'/93'/22'/32'/1'"
assert bip93_parameters_to_path("ms", 3, "a", 16, 'llll', 0) == "m/83696968'/93'/31'/16'/1048575'"
assert bip93_parameters_to_path("ms", 3, "c", 64, 'qqqq', 1) == "m/83696968'/93'/32'/64'/1048576'"
assert bip93_parameters_to_path("ms", 3, "d", 16, 'qqqp', 1) == "m/83696968'/93'/33'/16'/1048577'"
This … If we cram the identifier into …
They're not the same thing but they're close enough we should handle them together for derivation purposes. |
I'm planning to extend the BIP(s) with a new HRP that will encode chaincode+privkey (64 bytes) for compatibility with BIP-39. I will use the 20 MSB of the master fingerprint for it in my application as the default. Users will have the ability to change to a custom ID. So it's up to you. Here is my xfp to codex32 ID converter:
c = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
def xfp_to_codex32_id(xfp):
    x = (int(xfp, 16) >> 12) & 0xFFFFF  # Extract exactly 20 MSB (xfp is an 8-hex-digit, 32-bit fingerprint)
    return c[(x >> 15) & 31] + c[(x >> 10) & 31] + c[(x >> 5) & 31] + c[x & 31]
OK, I'm sold. You definitely put much more thought into this than I did. Thanks for the lengthy & helpful explanations! Concept ACK
Let's say that specific application prefixes may further restrict the valid byte length(s) and set a default identifier on output secrets. Your design should probably use the 20 MSB of the master key fingerprint as the identifier on master xprvs and master xpubs. That has the awesome property of keeping the same default identifier as the master seed that derives them: master seed (hex): …
To support this, though, we must go with the original "s" secret design that does NOT incorporate the ID into the derivation, as you need to know the derived private key to compute the fingerprint ID. That way we can derive and label strings for your application properly. Shares, however, still must incorporate the identifier in derivation. We also need to relax the byte_length restriction in this PR to 1-626. I'm writing a PR to BIP93 to generalize it for any HRP, which is precisely for applications like yours, so please review it if you have time:
I doubt anyone will start using bech32-encoded extended keys; at least I do not plan to. Even though they are more readable than base58, that standard is set in stone at this point. If you consider it useful, I do not mind if you optimize this BIP-85 app for it.
You're probably right that it makes sense to attempt it all the way to proper extended key encoding, but no, I wasn't treating it as such. I was just storing a secret that consists of chaincode + privkey, from which a naked/root (without metadata) extended private key can be re-assembled.
Do you have a link?
This allows wallets to derive codex32 secrets and codex32 shares from BIP-0032 master keys.
Summary of changes
Rationale
Specification
Tests
Reference tests and new vectors will be included in the reference bipsea implementation:
BenWestgate/bipsea@master...BenWestgate:bipsea:master
Mailing List
Discussion: https://groups.google.com/g/bitcoindev/c/--lHTAtq0Qc
Status
Ready for conceptual and approach review. This change is additive and does not modify existing BIP-85 behavior.