|
1 | 1 | # go-unixfs-proof |
2 | 2 |
|
3 | | -## TODO |
4 | | -- Create minimal README |
5 | | -- CI: tests. |
6 | | -- CI: lints. |
7 | | -- Benchmarks: minimal ones. |
| 3 | +Go implementation of offset-based native UnixFS proofs. |
| 4 | + |
| 5 | +**Note:** this is a side project that isn't used in production. It is alpha-quality software: it hasn't been optimized at any level or audited in any way. |
| 6 | + |
| 7 | +## Table of contents |
| 8 | +- [About the project](#about) |
| 9 | +- [Assumptions of the UnixFS DAG file](#assumptions-of-the-unixfs-dag-file) |
| 10 | +- [Proof format](#proof-format) |
| 11 | +- [Use-case analysis and security](#use-case-analysis-and-security) |
| 12 | +- [Proof sizes and benchmark](#proof-sizes-and-benchmark) |
| 13 | +- [Roadmap](#roadmap) |
| 14 | +- [Contributing](#contributing) |
| 15 | +- [License](#license) |
| 16 | +- [Contact](#contact) |
| 17 | + |
| 18 | + |
| 19 | +## About |
| 20 | +This library allows generating and verifying proofs for UnixFS file DAGs. |
| 21 | + |
| 22 | +The challenger knows the _Cid_ of a UnixFS DAG and the maximum size of the underlying represented file. With this information, the challenger asks the prover to generate a proof that it stores the block at a specified offset in the range _[0, max-file-size]_. |
| 23 | + |
| 24 | +The proof is a sub-DAG of the original, which contains the path to the targeted block, plus each level of intermediate nodes. |
| 25 | + |
| 26 | +## Assumptions of the UnixFS DAG file |
| 27 | +This library works with any UnixFS file DAG. It doesn't assume any particular layout (e.g., balanced, trickle, etc.), chunking (e.g., fixed size, etc.), or other particular DAG builder configuration. |
| 28 | + |
| 29 | +## Proof format |
| 30 | +To avoid inventing any new proof standard or format, the proof is a byte array. This byte array is a CAR file containing all the blocks that are part of the proof. |
| 31 | + |
| 32 | +This format was chosen mostly to avoid the friction of defining a new one. The order of blocks in the CAR file should be considered undefined, even though the current implementation writes them in BFS order. |
| 33 | + |
| 34 | +## Use-case analysis and security |
| 35 | +The primary motivation is to support a random-sampling-based challenge system between a prover and a verifier. |
| 36 | + |
| 37 | +Given a file with size _MaxSize_, the verifier can ask the prover to generate a proof with the underlying block for a specified _Cid_. |
| 38 | + |
| 39 | +The security of this scheme is similar to other random-sampling schemes: |
| 40 | +- If the underlying prover doesn't have the block, it can't generate the proof. |
| 41 | +- If the offset is randomly sampled in the _[0, MaxSize]_ range, the prover can't guess it in advance, so it can only be sure of answering by storing the whole file. |
| 42 | + |
| 43 | +If a bad prover is storing only a fraction _p_ of the leaves (e.g., 50%): |
| 44 | +- A single challenge gives the prover a probability `p` (e.g., 50%) of success. |
| 45 | +- If the challenger asks for N (e.g., 5) proofs, the probability of generating all correct proofs is `p^N` (e.g., ~3%), at the cost of a total proof size of `SingleProofSize*N`. |
| 46 | + |
| 47 | +If the underlying file has some erasure coding applied with leverage `X` (e.g., 2x): |
| 48 | +- A single challenge gives the prover a probability of `p^X` of success (e.g., 25%). |
| 49 | +- If the challenger asks for N (e.g., 5) proofs, the probability of generating all correct proofs is `p^(X*N)` (e.g., ~0.098%). |
| 50 | + |
| 51 | +In summary, applying an erasure coding scheme to the underlying file can make a single proof _good enough_, trading a smaller proof size for more underlying storage for the original file. |
| 52 | + |
| 53 | +Notice that if the prover is missing internal nodes of the UnixFS DAG, the impact is much higher than that of missing leaves (underlying data), since a random offset is far more likely to traverse a given internal node than to land on a given leaf (e.g., if the root Cid block is missing, all challenges will fail). This means that the prover's probability of successfully providing the proofs is lower than in the analysis made above for leaves. |
| 54 | + |
| 55 | + |
| 56 | +## Proof sizes and benchmark |
| 57 | +The size of the proof should already be close to minimal. Notice that these proofs are fairly big for a single reason: no assumptions are made about the DAG layout or chunking. Thus, internal nodes at visited levels include many children. If the fan-out factor at each level is close to the usual defaults, this involves a non-negligible number of blocks, which is unavoidable given these minimal assumptions. |
| 58 | + |
| 59 | +Generating and verifying proofs are mostly symmetrical operations. The current implementation is very naive and not optimized in any way. Being stricter about the block order in the CAR serialization could make the implementation faster. This is probably not a big deal unless you're generating proofs for thousands of _Cids_. |
| 60 | + |
| 61 | +## Roadmap |
| 62 | +The following items will probably be implemented soon: |
| 63 | +- [ ] Allow direct leaf Cid proof (non-offset based); a bit off-topic for this lib, and not sure it's entirely useful. |
| 64 | +- [ ] Benchmarks; might be fun, but nothing entirely useful for now. |
| 65 | +- [ ] CLI command that can be wired into `go-ipfs`. The lib already supports any `DAGService`, so anything can be plugged in. |
| 66 | +- [ ] Allow strict mode proof validation; maybe it makes sense to fail faster in some cases, but it's not a big deal. |
| 67 | +- [ ] CLI for validation from DealID in Filecoin network; maybe fun, but `Labels` are unverified. |
| 68 | +- [ ] Many edge-case tests. |
| 69 | + |
| 70 | +## Contributing |
| 71 | + |
| 72 | +Contributions make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**. |
| 73 | + |
| 74 | +If you'd like to donate, please send funds to [this project wallet-address](https://etherscan.io/address/0x2750E75E3771Dfb5041D5014a3dCC6e052fcd575). Any received funds will be directed to other open-source projects or organizations. |
| 75 | + |
| 76 | +## License |
| 77 | + |
| 78 | +Distributed under the MIT License. See `LICENSE` for more information. |
| 79 | + |
| 80 | +## Contact |
| 81 | +Ignacio Hagopian - [@jsign](https://github.com/jsign) |