
Commit 87264b7

bk2204 authored and gitster committed
docs: update pack index v3 format
Our current pack index v3 format uses 4-byte integers to find the trailer of the file. This effectively means that the file cannot be much larger than 2^32 bytes. While this might at first seem to be okay, we expect that each object will have at least 64 bytes worth of data, which means that no more than about 67 million objects can be stored.

Again, this might seem fine, but unfortunately, we know of many users who attempt to create repos with extremely large numbers of commits to get a "high score," and we've already seen repositories with at least 55 million commits. In the interests of gracefully handling repositories even for these well-intentioned but ultimately misguided users, let's change these lengths to 8 bytes.

For the checksums at the end of the file, we're producing 32-byte SHA-256 checksums because that's what we already do with pack index v2 and SHA-256. Truncating SHA-256 doesn't pose any actual security problems other than those related to the reduced size, but our pack checksum must already be 32 bytes (since SHA-256 packs have 32-byte checksums), and it simplifies the code to use the existing hashfile logic for the index checksum as well.

In addition, even though we may not need cryptographic security for the index checksum, we'd like to avoid arguments from auditors and such for organizations that may have compliance or security requirements. Using the simple, boring choice of the full SHA-256 hash avoids all possible discussion related to hash truncation and removes impediments for these organizations.

Note that we do not yet have a pack index v3 implementation in Git, so it should be fine to change this format. However, such an implementation has been written for future inclusion following this format.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
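The arithmetic behind the "about 67 million objects" figure can be checked with a small C sketch. The 64-bytes-per-object minimum is the commit message's own estimate; the macro and function names are hypothetical and not part of Git.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption taken from the commit message: each object accounts for at
 * least 64 bytes of index data. */
#define MIN_BYTES_PER_OBJECT 64u

/* Number of bytes addressable by an unsigned offset field that is
 * `width` bytes wide, i.e. 2^(8 * width). Restricted to width < 8 so the
 * result fits in a uint64_t. */
static uint64_t addressable_bytes(unsigned width)
{
	assert(width > 0 && width < 8);
	return (uint64_t)1 << (8 * width);
}

/* Rough upper bound on how many objects fit before an offset field of
 * `width` bytes can no longer point at the trailer. */
static uint64_t max_objects(unsigned width)
{
	return addressable_bytes(width) / MIN_BYTES_PER_OBJECT;
}

/* For an 8-byte field the bound is 2^64 / 64 = 2^58; compute it without
 * shifting by 64 bits, which would be undefined behavior. */
static uint64_t max_objects_8byte(void)
{
	return UINT64_MAX / MIN_BYTES_PER_OBJECT + 1;
}
```

With a 4-byte offset, `max_objects(4)` is 2^32 / 64 = 2^26 = 67,108,864, uncomfortably close to the 55 million commits already observed; an 8-byte offset raises the bound to 2^58.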
1 parent c44beea


1 file changed, +8 -4 lines changed


Documentation/technical/hash-function-transition.adoc

@@ -227,9 +227,9 @@ network byte order):
 ** 4-byte length in bytes of shortened object names. This is the
 shortest possible length needed to make names in the shortened
 object name table unambiguous.
-** 4-byte integer, recording where tables relating to this format
+** 8-byte integer, recording where tables relating to this format
 are stored in this index file, as an offset from the beginning.
-* 4-byte offset to the trailer from the beginning of this file.
+* 8-byte offset to the trailer from the beginning of this file.
 * Zero or more additional key/value pairs (4-byte key, 4-byte
 value). Only one key is supported: 'PSRC'. See the "Loose objects
 and unreachable objects" section for supported values and how this
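This hunk widens two header fields to 8 bytes. The hunk's context line suggests these integers are stored in network byte order, so a reader would decode them as 64-bit big-endian values. A minimal, portable sketch under that assumption follows; `read_be64` and `write_be64` are hypothetical helper names, not Git's internal API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read an 8-byte unsigned integer stored in network (big-endian) byte
 * order: the most significant byte comes first. */
static uint64_t read_be64(const unsigned char *p)
{
	uint64_t v = 0;

	assert(p != NULL);
	for (size_t i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Write `v` as 8 big-endian bytes into `p`. */
static void write_be64(unsigned char *p, uint64_t v)
{
	assert(p != NULL);
	for (size_t i = 0; i < 8; i++)
		p[i] = (unsigned char)(v >> (56 - 8 * i));
}
```

An offset such as 2^32, which the old 4-byte field could not represent, round-trips through these helpers unchanged. Decoding byte by byte avoids depending on the host's endianness or on unaligned loads.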
@@ -276,10 +276,14 @@ network byte order):
 up to and not including the table of CRC32 values.
 - Zero or more NUL bytes.
 - The trailer consists of the following:
-* A copy of the 20-byte SHA-256 checksum at the end of the
+* A copy of the full main hash checksum at the end of the
 corresponding packfile.
 
-* 20-byte SHA-256 checksum of all of the above.
+* Full main hash checksum of all of the above.
+
+The "full main hash" is a full-length hash of the main (not compatibility)
+algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
+a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.
 
 Loose object index
 ~~~~~~~~~~~~~~~~~~
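After this change the trailer holds two full-length hashes of the repository's main algorithm: a copy of the packfile's checksum, then the checksum of everything before it. Assuming the trailer consists of exactly those two hashes, its size follows directly from the algorithm. The enum and function names below are illustrative only; Git represents hash algorithms differently internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enumeration of the repository's main hash algorithm. */
enum main_hash_algo { MAIN_HASH_SHA1, MAIN_HASH_SHA256 };

/* Length of a full (untruncated) hash for the main algorithm:
 * 20 bytes for SHA-1, 32 bytes for SHA-256. */
static size_t full_main_hash_len(enum main_hash_algo algo)
{
	switch (algo) {
	case MAIN_HASH_SHA1:
		return 20;
	case MAIN_HASH_SHA256:
		return 32;
	}
	assert(!"unknown hash algorithm");
	return 0;
}

/* Trailer size: the packfile checksum copy plus the index checksum,
 * both full main hashes. */
static size_t trailer_len(enum main_hash_algo algo)
{
	return 2 * full_main_hash_len(algo);
}
```

For a SHA-256 repository `trailer_len` gives 64 bytes, and for SHA-1 it gives 40 bytes. The old text's "20-byte SHA-256 checksum" matched neither case, which this hunk corrects.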
