Skip to content

Commit 9ec057b

Browse files
committed
Merge: tcp: Add memory barrier to tcp_push()
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3921 JIRA: https://issues.redhat.com/browse/RHEL-22708 Upstream Status: linux.git commit 7267e8d Author: Salvatore Dipietro <dipiets@amazon.com> Date: Fri Jan 19 11:01:33 2024 -0800 tcp: Add memory barrier to tcp_push() On CPUs with weak memory models, reads and updates performed by tcp_push to the sk variables can get reordered leaving the socket throttled when it should not. The tasklet running tcp_wfree() may also not observe the memory updates in time and will skip flushing any packets throttled by tcp_push(), delaying the sending. This can pathologically cause 40ms extra latency due to bad interactions with delayed acks. Adding a memory barrier in tcp_push removes the bug, similarly to the previous commit bf06200 ("tcp: tsq: fix nonagle handling"). smp_mb__after_atomic() is used to not incur in unnecessary overhead on x86 since not affected. Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04 and Apache Tomcat 9.0.83 running the basic servlet below: import java.io.IOException; import java.io.OutputStreamWriter; import java.io.PrintWriter; import javax.servlet.ServletException; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse; public class HelloWorldServlet extends HttpServlet { @OverRide protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { response.setContentType("text/html;charset=utf-8"); OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8"); String s = "a".repeat(3096); osw.write(s,0,s.length()); osw.flush(); } } Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+ values is observed while, with the patch, the extra latency disappears. No patch and tcp_autocorking=1 ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello ... 50.000% 0.91ms 75.000% 1.13ms 90.000% 1.46ms 99.000% 1.74ms 99.900% 1.89ms 99.990% 41.95ms <<< 40+ ms extra latency 99.999% 48.32ms 100.000% 48.96ms With patch and tcp_autocorking=1 ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello ... 50.000% 0.90ms 75.000% 1.13ms 90.000% 1.45ms 99.000% 1.72ms 99.900% 1.83ms 99.990% 2.11ms <<< no 40+ ms extra latency 99.999% 2.53ms 100.000% 2.62ms Patch has been also tested on x86 (m7i.2xlarge instance) which it is not affected by this issue and the patch doesn't introduce any additional delay. Fixes: 7aa5470 ("tcp: tsq: move tsq_flags close to sk_wmem_alloc") Signed-off-by: Salvatore Dipietro <dipiets@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Antoine Tenart <atenart@redhat.com> Approved-by: Hangbin Liu <haliu@redhat.com> Approved-by: Guillaume Nault <gnault@redhat.com> Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com> Merged-by: Scott Weaver <scweaver@redhat.com>
2 parents 7280912 + df99501 commit 9ec057b

File tree

1 file changed

+14
-17
lines changed

1 file changed

+14
-17
lines changed

net/ipv4/tcp_output.c

Lines changed: 14 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1141,6 +1141,8 @@ void tcp_wfree(struct sk_buff *skb)
11411141
struct sock *sk = skb->sk;
11421142
struct tcp_sock *tp = tcp_sk(sk);
11431143
unsigned long flags, nval, oval;
1144+
struct tsq_tasklet *tsq;
1145+
bool empty;
11441146

11451147
/* Keep one reference on sk_wmem_alloc.
11461148
* Will be released by sk_free() from here or tcp_tasklet_func()
@@ -1157,28 +1159,23 @@ void tcp_wfree(struct sk_buff *skb)
11571159
if (refcount_read(&sk->sk_wmem_alloc) >= SKB_TRUESIZE(1) && this_cpu_ksoftirqd() == current)
11581160
goto out;
11591161

1160-
for (oval = READ_ONCE(sk->sk_tsq_flags);; oval = nval) {
1161-
struct tsq_tasklet *tsq;
1162-
bool empty;
1163-
1162+
oval = smp_load_acquire(&sk->sk_tsq_flags);
1163+
do {
11641164
if (!(oval & TSQF_THROTTLED) || (oval & TSQF_QUEUED))
11651165
goto out;
11661166

11671167
nval = (oval & ~TSQF_THROTTLED) | TSQF_QUEUED;
1168-
nval = cmpxchg(&sk->sk_tsq_flags, oval, nval);
1169-
if (nval != oval)
1170-
continue;
1168+
} while (!try_cmpxchg(&sk->sk_tsq_flags, &oval, nval));
11711169

1172-
/* queue this socket to tasklet queue */
1173-
local_irq_save(flags);
1174-
tsq = this_cpu_ptr(&tsq_tasklet);
1175-
empty = list_empty(&tsq->head);
1176-
list_add(&tp->tsq_node, &tsq->head);
1177-
if (empty)
1178-
tasklet_schedule(&tsq->tasklet);
1179-
local_irq_restore(flags);
1180-
return;
1181-
}
1170+
/* queue this socket to tasklet queue */
1171+
local_irq_save(flags);
1172+
tsq = this_cpu_ptr(&tsq_tasklet);
1173+
empty = list_empty(&tsq->head);
1174+
list_add(&tp->tsq_node, &tsq->head);
1175+
if (empty)
1176+
tasklet_schedule(&tsq->tasklet);
1177+
local_irq_restore(flags);
1178+
return;
11821179
out:
11831180
sk_free(sk);
11841181
}

0 commit comments

Comments
 (0)