
Commit 6e3a266

edumazet authored and gregkh committed
tcp: fix tcp_tso_should_defer() vs large RTT
[ Upstream commit 295ce1e ]

Neal reported that using neper tcp_stream with TCP_TX_DELAY set to 50ms
would often lead to flows stuck in a small cwnd mode, regardless of the
congestion control.

While tcp_stream sets TCP_TX_DELAY too late after the connect(), it
highlighted two kernel bugs.

The following heuristic in tcp_tso_should_defer() seems wrong for large RTT:

	delta = tp->tcp_clock_cache - head->tstamp;
	/* If next ACK is likely to come too late (half srtt), do not defer */
	if ((s64)(delta - (u64)NSEC_PER_USEC * (tp->srtt_us >> 4)) < 0)
		goto send_now;

If the next ACK is expected to come in more than 1 ms, we should not defer,
because we prefer a smooth ACK clocking.

While the blamed commit was a step in the right direction, it was not
generic enough. Another patch fixing TCP_TX_DELAY for established flows
will be proposed when net-next reopens.

Fixes: 50c8339 ("tcp: tso: restore IW10 after TSO autosizing")
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20251011115742.1245771-1-edumazet@google.com
[pabeni@redhat.com: fixed whitespace issue]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
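For illustration only (not part of the commit), here is a minimal user-space sketch of the new deferral check. The helper name ack_too_far_to_defer, the srtt_us_x8 parameter and the sample timestamps are made up for this example; the only kernel convention relied on is that tp->srtt_us stores the smoothed RTT in microseconds left-shifted by 3, which is why (NSEC_PER_USEC >> 3) * srtt_us yields nanoseconds.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NSEC_PER_USEC 1000ULL
    #define NSEC_PER_MSEC 1000000ULL

    /* Hypothetical user-space mirror of the patched check: return true when
     * the next ACK is expected more than min(1 ms, srtt/2) from now, i.e.
     * the sender should transmit immediately instead of deferring.
     * srtt_us_x8 mimics tp->srtt_us: smoothed RTT in usec, shifted left by 3.
     */
    static bool ack_too_far_to_defer(uint64_t now_ns, uint64_t head_sent_ns,
                                     uint32_t srtt_us_x8)
    {
        uint64_t srtt_in_ns = (NSEC_PER_USEC >> 3) * srtt_us_x8;
        uint64_t expected_ack = head_sent_ns + srtt_in_ns;
        uint64_t how_far_is_the_ack = expected_ack - now_ns;
        uint64_t threshold = (srtt_in_ns >> 1) < NSEC_PER_MSEC ?
                             (srtt_in_ns >> 1) : NSEC_PER_MSEC;

        return (int64_t)(how_far_is_the_ack - threshold) > 0;
    }

    int main(void)
    {
        /* srtt = 50 ms, oldest unacked skb sent 40 ms ago, so the ACK is
         * expected in roughly 10 ms.
         */
        uint32_t srtt_us_x8 = 50000u << 3;
        uint64_t now = 1000 * NSEC_PER_MSEC;
        uint64_t head_sent = now - 40 * NSEC_PER_MSEC;

        printf("send now: %s\n",
               ack_too_far_to_defer(now, head_sent, srtt_us_x8) ? "yes" : "no");
        return 0;
    }

With these numbers the sketch prints "send now: yes": the ACK is still about 10 ms away, beyond the new 1 ms cap. Under the old half-srtt rule (25 ms for a 50 ms RTT) the stack would have kept deferring, which is exactly the large-RTT stall the commit fixes.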
1 parent 0134c7b commit 6e3a266


net/ipv4/tcp_output.c

Lines changed: 15 additions & 4 deletions
@@ -2219,7 +2219,8 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 				 u32 max_segs)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
-	u32 send_win, cong_win, limit, in_flight;
+	u32 send_win, cong_win, limit, in_flight, threshold;
+	u64 srtt_in_ns, expected_ack, how_far_is_the_ack;
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *head;
 	int win_divisor;
@@ -2281,9 +2282,19 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
 	head = tcp_rtx_queue_head(sk);
 	if (!head)
 		goto send_now;
-	delta = tp->tcp_clock_cache - head->tstamp;
-	/* If next ACK is likely to come too late (half srtt), do not defer */
-	if ((s64)(delta - (u64)NSEC_PER_USEC * (tp->srtt_us >> 4)) < 0)
+
+	srtt_in_ns = (u64)(NSEC_PER_USEC >> 3) * tp->srtt_us;
+	/* When is the ACK expected ? */
+	expected_ack = head->tstamp + srtt_in_ns;
+	/* How far from now is the ACK expected ? */
+	how_far_is_the_ack = expected_ack - tp->tcp_clock_cache;
+
+	/* If next ACK is likely to come too late,
+	 * ie in more than min(1ms, half srtt), do not defer.
+	 */
+	threshold = min(srtt_in_ns >> 1, NSEC_PER_MSEC);
+
+	if ((s64)(how_far_is_the_ack - threshold) > 0)
 		goto send_now;
 
 	/* Ok, it looks like it is advisable to defer.
