-
Notifications
You must be signed in to change notification settings - Fork 14k
Description
While working on #61486 / #87194 I came across backrefs in byte array values (with repeating elements), which I hadn't thought about (in terms of implications on optimizing the length of a symbol), but which is otherwise entirely expected. In order to demonstrate an edge case, I encoded some...longer...path::<{ &[0, 0, 0, 0] }> to get KRAh0_B12_B12_B12_E, which is parsed as follows:
K // const generic
R // &
A // [
h0_ // 0u8
B12_ // backref to "h0_" / `0u8`
B12_ // backref to "h0_" / `0u8`
B12_ // backref to "h0_" / `0u8`
E // ]Because this is a long enough symbol, the backrefs end up being (at least) 4 bytes each (to encode positions greater than 62), whereas an integer leaf constant with a 0..=15 value gets encoded as 3 bytes, so the backrefs are each adding an extra byte (and making more pointless work for the demangler).
We could avoid this by tracking the range, not just the start, of an encoded path/type/const, and one of:
- skip caching the backref position when encoding that position would require more bytes than the inline size
- record the range and make
print_backrefdecide between actually printing the backref, and duplicating the contents of that range, inline, depending on which would be shorter
However, this isn't an urgent concern IMO because of the combination of these aspects:
- the encoder gets to choose whether to use a backref or to repeat the encoding inline, we don't have to update decoders if we change our backref generation heuristic
- most backrefs are either 3 or 4 bytes (for positions up to
62or62² = 3844, respectively) - we already special-case one-character leaves, to never record themselves for backrefs, so only less-trivial path/type/const syntax, which will produce 2 bytes or more, is relevant
- paths have to include a crate root leaf (at least ~16 bytes) or a backref (at least 3 bytes), so the shortest possible paths are 5 bytes (e.g. inherent
implon a builtin type, using a backref for theimplparent module)- this further implies that only types/consts that don't reference paths can be 2-3 bytes (e.g.
&strisRe,[u8]isSh,fn()isFEu, etc.)
- this further implies that only types/consts that don't reference paths can be 2-3 bytes (e.g.
It might be interesting to measure the size difference across some larger projects (including rustc and Cargo), by making this change (even inefficiently), after v0 becomes the default, but I don't expect a difference bigger than a handful of bytes per symbol, if that.