Commit 6e6ccfc

Merge pull request #1224 from yp969803/tcp_long_connection

Proposal for tcp_long_connection_metrics

2 parents a450591 + c03958f

1 file changed: 354 additions, 0 deletions
---
title: Proposal for generating metrics for TCP long connections
authors:
- "yp969803"
reviewers:
- "nglwcy"
- "lizhencheng"
approvers:
- "nlgwcy"
- "lizhencheng"

creation-date: 2025-02-06
---

## Proposal for generating metrics for TCP long connections

<!--
This is the title of your KEP. Keep it short, simple, and descriptive. A good
title can help communicate what the KEP is and should be considered as part of
any review.
-->

Upstream issue: https://github.com/kmesh-net/kmesh/issues/1211

### Summary

<!--
This section is incredibly important for producing high-quality, user-focused
documentation such as release notes or a development roadmap.

A good summary is probably at least a paragraph in length.
-->

Currently, Kmesh emits access logs when a TCP connection is established and when it terminates, with detailed information about the connection such as bytes sent and received, packets lost, RTT, and retransmits.

Kmesh also provides workload- and service-specific metrics such as bytes sent and received, lost packets, minimum RTT, and the total number of connections opened and closed by a pod. Today these metrics are only updated after a connection is closed. In this proposal we aim to update them periodically.

We also aim to implement access logs and metrics for long-lived TCP connections, building a continuous monitoring and reporting mechanism that captures detailed, real-time data throughout the lifetime of such connections. Access logs are reported periodically with information such as the reporting time, connection establishment time, bytes sent and received, packets lost, RTT, retransmits, and connection state. Metrics such as bytes sent and received, packets lost, and retransmits are likewise reported periodically for long connections.

### Motivation

<!--
This section is for explicitly listing the motivation, goals, and non-goals of
this KEP. Describe why the change is important and the benefits to users.
-->

The performance and health of long connections can be known early. Currently, all connection information only becomes available through the metrics and access logs emitted after the connection terminates.

#### Goals

<!--
List the specific goals of the KEP. What is it trying to achieve? How will we
know that this has succeeded?
-->
- Report workload- and service-based metrics periodically (every 5 seconds).

- Collect detailed traffic metrics (e.g. bytes sent/received, round-trip time, packet loss, TCP retransmissions) continuously during the lifetime of long TCP connections using eBPF.

- Report metrics and access logs at a periodic interval of 5 seconds. We choose 5 seconds because it allows enough time to accumulate meaningful changes in the metrics; if the reporting interval were too short, it could cause excessive overhead by processing too many updates.

- Generate access logs containing connection information continuously during the lifetime of long TCP connections, derived from the metrics data.

- Support the OpenTelemetry format for metrics and logs.

- Expose these metrics from the Kmesh daemon so that Prometheus can scrape them.

- Unit and E2E tests.

#### Non-Goals

<!--
What is out of scope for this KEP? Listing non-goals helps to focus discussion
and make progress.
-->

- Collecting information about packet contents.

- Controlling or modifying TCP connections.

- Collecting L7 metrics.

### Proposal

<!--
This is where we get down to the specifics of what the proposal actually is.
This should have enough detail that reviewers can understand exactly what
you're proposing, but should not include things like API designs or
implementation. What is the desired outcome and how do we measure success?
The "Design Details" section below is for the real
nitty-gritty.
-->

TCP connection information will be collected using eBPF sockops, sk_msg, and kprobe hooks, and stored in eBPF hash maps using the socket cookie as the unique key. A ring buffer map is used to send the connection information to userspace periodically.

### Design Details

<!--
This section should contain enough information that the specifics of your
change are understandable. This may include API specs (though not always
required) or even code snippets. If there's any ambiguity about HOW your
proposal will be implemented, this is the place to discuss them.
-->

#### Collecting Metrics

Declare an eBPF hash map in probe.h to store information about active TCP connections.

```
// eBPF map to store active TCP connections
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __type(key, __u64); // use sock_cookie as key
    __type(value, struct tcp_probe_info);
    __uint(max_entries, MAP_SIZE_OF_TCP_CONNS);
    __uint(map_flags, BPF_F_NO_PREALLOC);
} map_of_tcp_conns SEC(".maps");
```
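The map value type `struct tcp_probe_info` and the ring buffer used to report it to userspace are not shown above. A minimal sketch of what they could look like is given below; the field names and the `map_of_tcp_info` / `RINGBUF_SIZE` identifiers are illustrative assumptions based on the data this proposal wants to report, not the actual definitions in kmesh's probe headers.

```
// Sketch only: field names are assumptions derived from the metrics listed in
// this proposal; the real struct tcp_probe_info lives in kmesh's probe headers.
struct tcp_probe_info {
    __u64 start_ns;        // connection establishment time
    __u64 last_report_ns;  // last time this entry was flushed to userspace
    __u32 state;           // TCP state (e.g. BPF_TCP_ESTABLISHED)
    __u32 direction;       // INBOUND / OUTBOUND
    __u64 sent_bytes;
    __u64 received_bytes;
    __u32 srtt_us;         // smoothed RTT
    __u32 rtt_min;
    __u32 retransmits;
    __u32 lost_packets;
};

// Ring buffer used to push tcp_probe_info records to the Kmesh daemon.
struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, RINGBUF_SIZE); // assumed macro, e.g. 256 KB
} map_of_tcp_info SEC(".maps");
```

Userspace reads this ring buffer and converts each record into the periodic access logs and metrics described above.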
The sockops eBPF hook is triggered at various socket events. We will use this hook to store and refresh connection information when a connection is established, when the connection state changes, and on retransmits (which are also triggered by packet loss).
Updating workload/sockops.c:

```
SEC("sockops_active")
int sockops_active_prog(struct bpf_sock_ops *skops)
{
    __u64 sock_cookie = bpf_get_socket_cookie(skops);

    if (skops->family != AF_INET && skops->family != AF_INET6)
        return 0;

    switch (skops->op) {
    case BPF_SOCK_OPS_TCP_CONNECT_CB:
        skops_handle_kmesh_managed_process(skops);
        break;

    case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
        if (!is_managed_by_kmesh(skops))
            break;
        observe_on_connect_established(skops->sk, sock_cookie, OUTBOUND);
        if (bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_STATE_CB_FLAG) != 0
            || bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RETRANS_CB_FLAG) != 0
            || bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RTT_CB_FLAG) != 0) {
            BPF_LOG(ERR, SOCKOPS, "set sockops cb failed!\n");
        }
        __u64 *current_sk = (__u64 *)skops->sk;
        struct bpf_sock_tuple *dst = bpf_map_lookup_elem(&map_of_orig_dst, current_sk);
        if (dst != NULL)
            enable_encoding_metadata(skops);
        break;

    default:
        break;
    }
    return 0;
}

SEC("sockops_passive")
int sockops_passive_prog(struct bpf_sock_ops *skops)
{
    __u64 sock_cookie = bpf_get_socket_cookie(skops);

    if (skops->family != AF_INET && skops->family != AF_INET6)
        return 0;

    switch (skops->op) {
    case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
        if (!is_managed_by_kmesh(skops) || skip_specific_probe(skops))
            break;
        observe_on_connect_established(skops->sk, sock_cookie, INBOUND);
        if (bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_STATE_CB_FLAG) != 0
            || bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RETRANS_CB_FLAG) != 0
            || bpf_sock_ops_cb_flags_set(skops, BPF_SOCK_OPS_RTT_CB_FLAG) != 0) {
            BPF_LOG(ERR, SOCKOPS, "set sockops cb failed!\n");
        }
        auth_ip_tuple(skops);
        break;

    default:
        break;
    }
    return 0;
}

SEC("sockops_utils")
int sockops_utils_prog(struct bpf_sock_ops *skops)
{
    // Filter by IPv4 or IPv6
    if (skops->family != AF_INET && skops->family != AF_INET6)
        return 0;

    switch (skops->op) {
    case BPF_SOCK_OPS_STATE_CB:
        if (skops->args[1] == BPF_TCP_CLOSE) {
            clean_auth_map(skops);
            clean_dstinfo_map(skops);
        }
        if (!is_managed_by_kmesh(skops))
            break;
        observe_on_status_change(skops->sk, skops->args[0]);
        break;

    case BPF_SOCK_OPS_RETRANS_CB:
        if (!is_managed_by_kmesh(skops))
            break;
        observe_on_retransmit(skops->sk);
        break;

    case BPF_SOCK_OPS_RTT_CB:
        if (!is_managed_by_kmesh(skops))
            break;
        observe_on_rtt(skops->sk);
        break;

    default:
        break;
    }
    return 0;
}

```
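The observe_on_* helpers referenced above are where the new hash map gets populated. As one possible shape, `observe_on_connect_established` could create the initial `map_of_tcp_conns` entry keyed by the socket cookie; this is a sketch under the illustrative `tcp_probe_info` fields assumed earlier, not the final implementation.

```
// Sketch only: assumes the illustrative tcp_probe_info fields shown earlier.
static inline void observe_on_connect_established(struct bpf_sock *sk, __u64 sock_cookie, __u32 direction)
{
    struct tcp_probe_info info = {0};

    info.start_ns = bpf_ktime_get_ns();
    info.last_report_ns = info.start_ns;
    info.state = BPF_TCP_ESTABLISHED;
    info.direction = direction;

    // Create (or overwrite) the entry keyed by the socket cookie.
    bpf_map_update_elem(&map_of_tcp_conns, &sock_cookie, &info, BPF_ANY);
}
```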
The sk_msg hook is triggered when a packet leaves the socket. We will use this hook to refresh the sent-bytes data, and we also trigger the flush_tcp_conns function here to send the connection information to userspace via the ring buffer map.
Updating the sendmsg_prog function in send_msg.c:

```
SEC("sk_msg")
int sendmsg_prog(struct sk_msg_md *msg)
{
    __u32 off = 0;
    if (msg->family != AF_INET && msg->family != AF_INET6)
        return SK_PASS;

    // encode org dst addr
    encode_metadata_org_dst_addr(msg, &off, (msg->family == AF_INET));

    struct bpf_sock *sk = msg->sk;

    if (sk) {
        if (is_managed_by_kmesh_skmsg(msg)) {
            observe_on_data(sk);
        }
    } else {
        BPF_LOG(ERR, KMESH, "sendmsg_prog: msg->sk is NULL\n");
    }
    int key = 0;
    __u64 *last_time = bpf_map_lookup_elem(&tcp_conn_last_flush, &key);
    __u64 now = bpf_ktime_get_ns();

    if (!last_time) {
        __u64 init_time = now;
        // Initialize last flush time if not set
        bpf_map_update_elem(&tcp_conn_last_flush, &key, &init_time, BPF_ANY);
    } else if ((now - *last_time) >= TIMER_INTERVAL_NS) {
        flush_tcp_conns();
        // Update last flush time
        bpf_map_update_elem(&tcp_conn_last_flush, &key, &now, BPF_ANY);
    }
    return SK_PASS;
}

```
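The `tcp_conn_last_flush` map, `TIMER_INTERVAL_NS`, and `flush_tcp_conns` used above are not defined in this snippet. A possible sketch, assuming the illustrative `map_of_tcp_info` ring buffer from earlier and the 5-second interval from the Goals section, could look like this (it relies on bpf_for_each_map_elem, which requires kernel 5.13 or newer):

```
// Sketch only: names and sizes are illustrative assumptions.
#define TIMER_INTERVAL_NS (5ULL * 1000 * 1000 * 1000) // 5 seconds, per the Goals section

// Single-entry array map holding the timestamp of the last flush.
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __type(key, int);
    __type(value, __u64);
    __uint(max_entries, 1);
} tcp_conn_last_flush SEC(".maps");

// Callback for bpf_for_each_map_elem: push one connection entry into the ring buffer.
static int flush_one_conn(struct bpf_map *map, __u64 *cookie, struct tcp_probe_info *info, void *ctx)
{
    bpf_ringbuf_output(&map_of_tcp_info, info, sizeof(*info), 0);
    return 0; // 0 = continue iterating
}

static inline void flush_tcp_conns(void)
{
    // Walk all active connections and report each one to userspace.
    bpf_for_each_map_elem(&map_of_tcp_conns, flush_one_conn, NULL, 0);
}
```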
To refresh the bytes received by a connection, we will attach a kprobe to tcp_rcv_established.
Creating workload/kprobe.c:
```
SEC("kprobe/tcp_rcv_established")
int bpf_tcp_rcv_established(struct pt_regs *ctx)
{
    struct sk_buff *skb = (struct sk_buff *)PT_REGS_PARM2(ctx);
    struct sock *sk = NULL;

    // skb is a kernel pointer; read skb->sk with a CO-RE relocatable read
    // (a direct skb->sk dereference is not allowed in a kprobe context).
    sk = BPF_CORE_READ(skb, sk);
    if (sk) {
        if (is_managed_by_kmesh_skb(skb)) {
            observe_on_data(sk);
        }
    } else {
        BPF_LOG(ERR, KMESH, "tcp_rcv_established: skb->sk is NULL\n");
    }
    return 0;
}
```

We will update the functions in tcp_probe.h to store and refresh the connection information in the hash map, as sketched below.
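As an illustration, a refresh helper driven from the sockops context could copy the live counters that struct bpf_sock_ops already exposes into the map entry. The helper name and the tcp_probe_info layout are assumptions made for this sketch; the proposal's actual helpers take the socket as an argument.

```
// Sketch only: shows how a tcp_probe.h helper might refresh an entry using
// the counters exposed by struct bpf_sock_ops (valid when skops->is_fullsock).
static inline void observe_on_sockops_update(struct bpf_sock_ops *skops)
{
    __u64 cookie = bpf_get_socket_cookie(skops);
    struct tcp_probe_info *info = bpf_map_lookup_elem(&map_of_tcp_conns, &cookie);

    if (!info)
        return;

    info->sent_bytes = skops->bytes_acked; // bytes acknowledged by the peer
    info->received_bytes = skops->bytes_received;
    info->srtt_us = skops->srtt_us;
    info->retransmits = skops->total_retrans;
    info->lost_packets = skops->lost_out;
    info->state = skops->state;
}
```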

We will update the functions in metric.go to update the workload and service metrics periodically, and we will also create a new metric for long TCP connections.

#### User Stories (Optional)

<!--
Detail the things that people will be able to do if this KEP is implemented.
Include as much detail as possible so that people can understand the "how" of
the system. The goal here is to make this feel real for users without getting
bogged down.
-->

##### Story 1
Workload and service Prometheus metrics are updated periodically as well as when the connection is closed.

##### Story 2
A new Prometheus metric for long TCP connections that is updated periodically.

#### Notes/Constraints/Caveats (Optional)

<!--
What are the caveats to the proposal?
What are some important details that didn't come across above?
Go in to as much detail as necessary here.
This might be a good place to talk about core concepts and how they relate.
-->

#### Risks and Mitigations

<!--
What are the risks of this proposal, and how do we mitigate?

How will security be reviewed, and by whom?

How will UX be reviewed, and by whom?

Consider including folks who also work outside the SIG or subproject.
-->

#### Test Plan

<!--
**Note:** *Not required until targeted at a release.*

Consider the following in developing a test plan for this enhancement:
- Will there be e2e and integration tests, in addition to unit tests?
- How will it be tested in isolation vs with other components?

No need to outline all test cases, just the general strategy. Anything
that would count as tricky in the implementation, and anything particularly
challenging to test, should be called out.

-->

We will update bpf_test.go to test the eBPF code written, and metric_test.go to test the metrics.
### Alternatives

<!--
What other approaches did you consider, and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->

<!--
Note: This is a simplified version of the Kubernetes enhancement proposal template.
https://github.com/kubernetes/enhancements/tree/3317d4cb548c396a430d1c1ac6625226018adf6a/keps/NNNN-kep-template
-->

Creating a userspace proxy component instead of eBPF for collecting metrics.

0 commit comments

Comments
 (0)