netstacklat: Add sanity checks for TCP HoL blocking filter
The logic for excluding samples from TCP reads that may have been
delayed by HoL blocking relies on reading a number of fields from the
TCP socket without holding the socket lock. This is prone to errors,
as the socket state may be updated elsewhere in the kernel while our
eBPF program is running. To reduce the risk that such data races
cause issues for our HoL detection logic, add a number of sanity
checks on the read values.
The most problematic of the read fields is ooo_last_skb, as it is a
pointer to another skb. This pointer is only valid as long as the
out_of_order_queue is non-empty. Due to a data race, we may observe
that the ooo-queue is non-empty, the kernel may then clear the
ooo-queue, and we may then read the contents of the ooo_last_skb SKB,
which by that point may have been freed and/or recycled. This may
result in an incorrect sequence limit being used to exclude future
reads of ooo-segments. A faulty sequence limit may either cause reads
of HoL-blocked segments to be included, or cause an unnecessarily
large amount of future reads (up to 2 GB) to be excluded.
To reduce the risk of using garbage data from an invalid SKB,
introduce two sanity checks for the end_seq of ooo_last_skb. First,
check if the sequence number is zero, and if so assume it is invalid
(even though zero is a valid sequence number). While we will get an
error code if reading from this SKB fails altogether, we may still
succeed in reading from an SKB that is no longer valid, in which case
there is a high risk that its data has been zeroed. If end_seq is
non-zero, also check that it is within the current receive window
(if not, clamp it to the receive window).
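The two end_seq checks can be sketched in userspace C as follows. The sketch makes several assumptions that this commit does not confirm:

- The receive window is taken to span from rcv_nxt up to rcv_wup + rcv_wnd. This mirrors the kernel's tcp_receive_window(), which treats a negative window as empty.
- Values are clamped to both edges of that window.
- sanitize_ooo_end_seq() and its parameters are hypothetical, not netstacklat's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-aware "seq1 is after seq2" (cf. the kernel's after()). */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) > 0;
}

/* Clamp seq into [lo, hi] in 32-bit sequence space. */
static uint32_t seq_clamp(uint32_t seq, uint32_t lo, uint32_t hi)
{
	if (seq_after(lo, seq))
		return lo;
	if (seq_after(seq, hi))
		return hi;
	return seq;
}

/*
 * Hypothetical helper: sanity check the end_seq read from ooo_last_skb.
 * Returns false if end_seq is zero (likely read from a freed, zeroed
 * skb); otherwise stores end_seq clamped to the receive window in *out.
 */
static bool sanitize_ooo_end_seq(uint32_t end_seq, uint32_t rcv_nxt,
				 uint32_t rcv_wup, uint32_t rcv_wnd,
				 uint32_t *out)
{
	uint32_t win_end = rcv_wup + rcv_wnd; /* right edge, may wrap */

	if (end_seq == 0)
		return false;

	/* A "negative" window is treated as empty, as in tcp_receive_window(). */
	if (seq_after(rcv_nxt, win_end))
		win_end = rcv_nxt;

	*out = seq_clamp(end_seq, rcv_nxt, win_end);
	return true;
}
```

Treating zero as invalid sacrifices the rare legitimate end_seq of 0 in exchange for rejecting the far more likely case of zeroed memory.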
Also introduce sanity checks for rcv_nxt and copied_seq in the
tcp_sock, ensuring that they increase monotonically. To enable this,
the last read (sane) value of each is tracked together with the
ooo-state in the socket storage. For these fields, do not consider
sequence 0 invalid, as they should be valid (although possibly updated
in between) as long as reading them succeeds, and a failed read is
detected through the error code returned by bpf_core_read().
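The monotonicity check can be sketched as a small per-field tracker. The sketch makes these assumptions:

- struct seq_tracker and seq_update_monotonic() are hypothetical stand-ins for the state kept in socket storage.
- "Monotonic" means "not before the last sane value" under wraparound-aware 32-bit comparison.
- What the caller does with a rejected value is left open here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-aware "seq1 is after seq2" (cf. the kernel's after()). */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) > 0;
}

/* Hypothetical per-field state, e.g. one for rcv_nxt, one for copied_seq. */
struct seq_tracker {
	uint32_t last; /* last value that passed the check */
	bool valid;    /* false until the first value is recorded */
};

/*
 * Accept seq only if it has not moved backwards relative to the last
 * sane value. Zero is accepted like any other sequence number. On
 * success the tracked value is updated; on failure it is left as-is.
 */
static bool seq_update_monotonic(struct seq_tracker *t, uint32_t seq)
{
	if (t->valid && seq_after(t->last, seq))
		return false; /* went backwards: likely a racy read */

	t->last = seq;
	t->valid = true;
	return true;
}
```

Equal values are accepted, since rcv_nxt and copied_seq often stay unchanged between two reads.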
Skip adding similar monotonic growth checks for the rcv_wup field,
which may now need to be probed to compute the receive window. This is
a compromise to avoid having to unconditionally probe rcv_wup and
update its tracked state every time. For the rcv_wnd field, which is
also needed to calculate the receive window, I am not aware of any
simple validity checks to perform.
Signed-off-by: Simon Sundberg <simon.sundberg@kau.se>