keps/sig-network/4963-kube-proxy-services-acceleration/README.md
## Motivation
Every packet entering the Linux kernel is evaluated against all rules attached to the netfilter hooks, even for established connections. These rules may be added by CNIs, system administrators, firewalls, or kube-proxy, and together they define how packets are filtered, routed, or rewritten. As a result, packets continue to traverse the full netfilter processing path, which can add unnecessary overhead for long-lived or high-throughput connections.

A connection becomes established only after the initial packets successfully pass through all applicable rules without being dropped or rejected. Once established, packets associated with a Kubernetes Service can be offloaded to the kernel fast path using flowtables. This allows subsequent Service packets to bypass the full netfilter stack, accelerating kube-proxy traffic and reducing CPU usage.
### Goals
- Provide an option for kube-proxy users to enable traffic acceleration for TCP and UDP services.
### Non-Goals
- Separation of Concerns: Kube-proxy's primary responsibility is to manage Service traffic. Extending flowtable offloading to non-Service traffic could introduce unintended side effects, so the feature should stay focused on its core purpose.
- Supporting service traffic acceleration for iptables and ipvs backends.
- Supporting service traffic acceleration for SCTP services.
## Proposal
### Risks and Mitigations
Moving network traffic to the fastpath causes packets to bypass the standard netfilter hooks after the ingress hook. Flowtables operate at the ingress hook, and packets still traverse taps, so tools like tcpdump and Wireshark will continue to observe traffic. However, any applications that rely on hooks or rules evaluated after the ingress hook may not observe or process these packets as expected. To mitigate this, fastpath offload will be applied selectively based on a configurable threshold, and users will have the option to disable the feature entirely.
The flowtables netfilter infrastructure is not well documented, and we need to validate our assumptions to avoid unsupported or suboptimal configurations. Establishing a good working relationship with the netfilter maintainers and involving them in the design will mitigate these potential problems.
## Design Details
This feature will only work with kube-proxy's nftables mode, for TCP and UDP traffic. We will add a new configuration option to kube-proxy that enables Service traffic offload once a connection exceeds a per-connection packet-count threshold.
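As a rough illustration, the option could surface as a new field in kube-proxy's configuration API. The type and field below are a hypothetical sketch mirroring the `--offload-packet-threshold` flag discussed later in this KEP, not the final API shape:

```go
// Hypothetical sketch of the new configuration knob (name and placement
// are assumptions; the CLI equivalent is --offload-packet-threshold).
type NFTablesConfiguration struct {
	// OffloadPacketThreshold is the number of packets a connection must
	// exchange before its traffic is offloaded to the flowtable fastpath.
	// A value of 0 disables Service traffic acceleration.
	OffloadPacketThreshold uint32
}
```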
The packet threshold approach offers several advantages over the [alternatives](#alternatives):
Kube-proxy will create a `flowtable` named `kube-proxy-flowtable` in the kube-proxy table, and will monitor the node's network interfaces to keep the `flowtable`'s device list populated with the interfaces present on the node.
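As a minimal sketch, assuming knftables' `Flowtable` type exposes a device list and given a hypothetical `nodeInterfaceNames` discovery helper, keeping the flowtable in sync could look like:

```go
// Sketch: (re)declare the flowtable with the interfaces currently present
// on the node. nodeInterfaceNames() is a hypothetical helper returning the
// names of the node's network interfaces.
tx.Add(&knftables.Flowtable{
	Name:    "kube-proxy-flowtable",
	Devices: nodeInterfaceNames(),
})
```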
Kube-proxy will maintain nftables sets tracking the active Services, and will insert rules in the `filter-forward` chain to offload all established traffic of active Services:
```go
// Offload the connection after the defined number of packets
if proxier.fastpathPacketThreshold > 0 {
	tx.Add(&knftables.Flowtable{
		Name: serviceFlowTable,
	})

	// For ClusterIP, LoadBalancer and ExternalIP Services.
	tx.Add(&knftables.Rule{
		Chain: filterForwardChain,
		Rule: knftables.Concat(
			"ct packets >", proxier.fastpathPacketThreshold,
			"ct original", ipX, "daddr", ". ct original proto-dst", "@", serviceIPPortSet,
			"flow offload", "@", serviceFlowTable,
		),
	})

	// For NodePort Services.
	tx.Add(&knftables.Rule{
		Chain: filterForwardChain,
		Rule: knftables.Concat(
			"ct packets >", proxier.fastpathPacketThreshold,
			// nodePortIPsSet is the set of node IPs accepting NodePort traffic.
			"ct original", ipX, "daddr", "@", nodePortIPsSet,
			"ct original proto-dst", "@", serviceNodePortSet,
			"flow offload", "@", serviceFlowTable,
		),
	})
}
```
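The `serviceIPPortSet` and `serviceNodePortSet` sets referenced above are kept in sync with the active Services. As an illustration, adding a ClusterIP Service element could look like the following (the element layout is an assumption, keyed to match the `daddr . proto-dst` concatenation used by the rule):

```go
// Sketch: register a Service's ClusterIP and port in the concatenated set
// that the offload rule matches on (assumed key: daddr . proto-dst).
tx.Add(&knftables.Element{
	Set: serviceIPPortSet,
	Key: []string{"10.96.0.10", "53"},
})
```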
##### Unit tests
New tests will be added, and existing tests modified, in the following packages: