Commit 0166d74

e3krisztian authored and qkaiser committed
feat(handlers): check header checksum in tar handler
The Unix v7 old-style tar handler's pattern is not strict enough to prevent false positives, so verifying the header checksum helps reject these false matches. The header chksum field holds the octal representation of the sum of the header bytes taken as unsigned integers, computed with the chksum field itself treated as 8 spaces; the digits are followed by a null and a space (some tar files have these two bytes reversed). Multiple candidate checksums are calculated because the old header is much shorter than the newer headers. Wikipedia also mentions some historic implementations using signed sums. The potential match is discarded if the stored checksum is not one of the calculated checksums.
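The checksum rule described above can be sketched on an in-memory header block. This is only an illustration under stated assumptions: the function name `candidate_checksums` and the offset constants are hypothetical and not part of unblob; the offsets (148, 156, 257, 500) are the ones used in the diff.

```python
from typing import Tuple

CHKSUM_START = 148      # offset of the chksum field
CHKSUM_END = 156        # the chksum field is 8 bytes long
V7_HEADER_END = 257     # the old-style (v7) header ends here
EXT_HEADER_END = 500    # newer headers use bytes up to this offset


def candidate_checksums(block: bytes) -> Tuple[int, int, int]:
    """Return (unsigned v7, unsigned extended, signed v7) header sums."""
    # The chksum field is counted as 8 spaces, whatever it contains.
    blanked = block[:CHKSUM_START] + b" " * 8 + block[CHKSUM_END:EXT_HEADER_END]
    v7 = blanked[:V7_HEADER_END]
    unsigned_v7 = sum(v7)
    unsigned_ext = sum(blanked)
    # Historic variant: every byte is read as a signed 8-bit value.
    signed_v7 = sum(b - 256 if b >= 128 else b for b in v7)
    return unsigned_v7, unsigned_ext, signed_v7


# An all-zero block: only the 8 blanks contribute, 8 * 0x20 = 256.
print(candidate_checksums(bytes(512)))  # (256, 256, 256)
```

The signed and unsigned sums differ only when the header contains bytes of 128 or above, for example in non-ASCII file names.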
1 parent 96a4aff commit 0166d74

1 file changed, with 31 additions and 0 deletions
unblob/handlers/archive/tar.py

Lines changed: 31 additions & 0 deletions
@@ -136,6 +136,37 @@ def calculate_chunk(self, file: File, start_offset: int) -> Optional[ValidChunk]
         header_size = snull(header.size)
         decode_int(header_size, 8)

+        def signed_sum(octets) -> int:
+            return sum(b if b < 128 else b - 256 for b in octets)  # signed char value
+
+        if header.chksum[6:8] not in (b"\x00 ", b" \x00"):
+            logger.error(
+                "Invalid checksum format",
+                actual_last_2_bytes=header.chksum[6:8],
+                handler=self.NAME,
+            )
+            return None
+        checksum = decode_int(header.chksum[:6], 8)
+        header_bytes_for_checksum = (
+            file[start_offset : start_offset + 148]
+            + b" " * 8  # chksum field is replaced with "blanks"
+            + file[start_offset + 156 : start_offset + 257]
+        )
+        extended_header_bytes = file[start_offset + 257 : start_offset + 500]
+        calculated_checksum_unsigned = sum(header_bytes_for_checksum)
+        calculated_checksum_signed = signed_sum(header_bytes_for_checksum)
+        checksums = (
+            calculated_checksum_unsigned,
+            calculated_checksum_unsigned + sum(extended_header_bytes),
+            # signed is of historical interest, calculating for the extended header is not needed
+            calculated_checksum_signed,
+        )
+        if checksum not in checksums:
+            logger.error(
+                "Tar header checksum mismatch", expected=str(checksum), actual=checksums
+            )
+            return None
+
         end_offset = _get_tar_end_offset(file, start_offset)
         if end_offset == -1:
             return None
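To see the check accept a genuine header and reject a damaged one, the following sketch runs the same logic against a header produced by Python's standard `tarfile` module. The helper names `parse_chksum_field` and `header_matches` are hypothetical, and `int(..., 8)` is only a stand-in for unblob's `decode_int`, whose exact behaviour is not shown in this commit.

```python
import tarfile
from typing import Optional


def parse_chksum_field(field: bytes) -> Optional[int]:
    """Return the stored checksum, or None if the field is malformed."""
    # Six octal digits, then NUL + space (or the two bytes reversed).
    if len(field) != 8 or field[6:8] not in (b"\x00 ", b" \x00"):
        return None
    try:
        return int(field[:6], 8)  # stand-in for unblob's decode_int
    except ValueError:
        return None


def header_matches(block: bytes) -> bool:
    """Check the stored checksum against the three candidate sums."""
    stored = parse_chksum_field(block[148:156])
    if stored is None:
        return False
    blanked = block[:148] + b" " * 8 + block[156:500]
    v7 = blanked[:257]
    candidates = (
        sum(v7),                                      # unsigned, old header
        sum(blanked),                                 # unsigned, extended header
        sum(b - 256 if b >= 128 else b for b in v7),  # signed, old header
    )
    return stored in candidates


info = tarfile.TarInfo("hello.txt")
info.size = 5
header = info.tobuf(format=tarfile.USTAR_FORMAT)  # one 512-byte header block

corrupted = bytearray(header)
corrupted[0] ^= 0x01  # flip one bit in the file name

print(header_matches(header))            # True
print(header_matches(bytes(corrupted)))  # False
```

A single flipped bit shifts every candidate sum by one, so none of them equals the stored value and the potential match is dropped, which is the false-positive filtering the commit is after.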
