1616 - [ Kube-Proxy] ( #kube-proxy )
1717 - [ EndpointSlice Controller] ( #endpointslice-controller )
1818- [ Heuristics] ( #heuristics )
19- - [ Proportional CPU Heuristic] ( #proportional-cpu-heuristic )
20- - [ Assumptions] ( #assumptions )
21- - [ Identifying Zones] ( #identifying-zones )
19+ - [ Identifying Zones] ( #identifying-zones )
2220 - [ Excluding Control Plane Nodes] ( #excluding-control-plane-nodes )
23- - [ Example] ( #example )
2421 - [ Overload] ( #overload )
2522 - [ Handling Node Updates] ( #handling-node-updates )
23+ - [ Proportional CPU Heuristic] ( #proportional-cpu-heuristic )
24+ - [ Assumptions] ( #assumptions )
25+ - [ Example] ( #example )
26+ - [ PreferZone Heuristic] ( #preferzone-heuristic )
27+ - [ Assumptions] ( #assumptions-1 )
28+ - [ Example] ( #example-1 )
2629 - [ Additional Heuristics] ( #additional-heuristics )
2730 - [ Future Expansion] ( #future-expansion )
2831 - [ Test Plan] ( #test-plan )
@@ -295,7 +298,7 @@ implemented directly by kube-proxy.
295298# ## EndpointSlice Controller
296299
297300When the `TopologyAwareHints` feature gate is enabled and the annotation is set
298- to `Auto` or `ProportionalByCore ` for a Service, the EndpointSlice controller
301+ to `Auto` or `ProportionalZoneCPU ` for a Service, the EndpointSlice controller
299302will add hints to EndpointSlices. These hints will indicate where an endpoint
300303should be consumed by proxy implementations to enable topology aware routing.
301304
@@ -306,27 +309,15 @@ This KEP starts with the following heuristics:
306309| Heuristic Name | Description |
307310|-|-|
308311| Auto | EndpointSlice controller and/or underlying dataplane can choose the heuristic used. |
309- | ProportionalByCore | Endpoints will be allocated to each zone proportionally, based on the allocatable Node CPU cores in each zone. |
312+ | ProportionalZoneCPU | Endpoints will be allocated to each zone proportionally, based on the allocatable Node CPU cores in each zone. |
313+ | PreferZone | Hints are always populated to represent the zone the endpoint is in. |
310314
311315In the future, additional heuristics may be added. Until that point, "Auto" will
312316be the only configurable value. In most clusters, that will translate to
313- ` ProportionalByCore ` unless the underlying dataplane has a better approach
317+ ` ProportionalZoneCPU ` unless the underlying dataplane has a better approach
314318available.
315319
316- # ## Proportional CPU Heuristic
317- # ### Assumptions
318-
319- - Incoming traffic is proportional to the number of allocatable CPU cores in a
320- zone. Although this is an imperfect metric, it is the best available way of
321- predicting how much traffic will be received in a zone. If we are unable to
322- derive the number of allocatable cores in a zone we will fall back to the
323- number of nodes in that zone.
324- - Service capacity is proportional to the number of endpoints in a zone. This
325- assumes that each endpoint has equivalent capacity. Although this is not
326- always true, it usually is. We can explore ways to deal with variable capacity
327- endpoints in the future.
328-
329- # ### Identifying Zones
320+ # ## Identifying Zones
330321
331322The EndpointSlice controller reads the standard `topology.kubernetes.io/zone`
332323label on Nodes to determine which zone a Pod is running in. Kube-Proxy would be
@@ -340,23 +331,6 @@ calculating allocatable cores in a zone:
340331* `node-role.kubernetes.io/control-plane`
341332* `node-role.kubernetes.io/master`
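
As a rough illustration of the zone identification and control plane exclusion described above, the following Go sketch sums allocatable cores per zone. The `node` type, its fields, and the `coresByZone` helper are hypothetical simplifications, not the controller's actual types; skipping Nodes that have no zone label is also an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical, simplified stand-in for a Kubernetes Node.
type node struct {
	labels           map[string]string
	allocatableCores int
}

// zoneLabel is the standard topology label read from Nodes.
const zoneLabel = "topology.kubernetes.io/zone"

// controlPlaneLabels mark Nodes that are excluded from the calculation.
var controlPlaneLabels = []string{
	"node-role.kubernetes.io/control-plane",
	"node-role.kubernetes.io/master",
}

// isControlPlane reports whether the Node carries any control plane label.
func isControlPlane(n node) bool {
	for _, l := range controlPlaneLabels {
		if _, ok := n.labels[l]; ok {
			return true
		}
	}
	return false
}

// coresByZone sums allocatable cores per zone, skipping control plane
// Nodes and (as an assumption of this sketch) Nodes without a zone label.
func coresByZone(nodes []node) map[string]int {
	out := map[string]int{}
	for _, n := range nodes {
		zone := n.labels[zoneLabel]
		if zone == "" || isControlPlane(n) {
			continue
		}
		out[zone] += n.allocatableCores
	}
	return out
}

func main() {
	nodes := []node{
		{labels: map[string]string{zoneLabel: "zone-a"}, allocatableCores: 8},
		{labels: map[string]string{zoneLabel: "zone-a", "node-role.kubernetes.io/control-plane": ""}, allocatableCores: 4},
		{labels: map[string]string{zoneLabel: "zone-b"}, allocatableCores: 16},
	}
	cores := coresByZone(nodes)
	zones := make([]string, 0, len(cores))
	for z := range cores {
		zones = append(zones, z)
	}
	sort.Strings(zones) // map iteration order is random, so sort for stable output
	for _, z := range zones {
		fmt.Printf("%s: %d cores\n", z, cores[z])
	}
}
```

The control plane Node in zone-a contributes nothing to that zone's total, so only worker capacity is counted.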
342333
343- # ### Example
344-
345- zone-a : 20 CPU cores
346- zone-b : 16 CPU cores
347- zone-c : 14 CPU cores
348-
349- In this scenario, the following proportion of endpoints would be allocated for
350- each Service :
351-
352- zone-a : 40%
353- zone-b : 32%
354- zone-c : 28%
355-
356- When allocating endpoints to meet this distribution, keeping endpoints in the
357- same zone will be prioritized. When same-zone endpoints are exhausted, endpoints
358- will be taken from zones that have excess capacity.
359-
360334# ### Overload
361335
336Overload is a key concept for this proposal. This occurs when there are fewer
@@ -393,6 +367,57 @@ of the following scenarios:
3933672. A new Node results in a Service that is able to achieve an endpoint
394368 distribution below 20% for the first time.
395369
370+ # ## Proportional CPU Heuristic
371+
372+ # ### Assumptions
373+
374+ - Incoming traffic is proportional to the number of allocatable CPU cores in a
375+ zone. Although this is an imperfect metric, it is the best available way of
376+ predicting how much traffic will be received in a zone. If we are unable to
377+ derive the number of allocatable cores in a zone we will fall back to the
378+ number of nodes in that zone.
379+ - Service capacity is proportional to the number of endpoints in a zone. This
380+ assumes that each endpoint has equivalent capacity. Although this is not
381+ always true, it usually is. We can explore ways to deal with variable capacity
382+ endpoints in the future.
383+ # ### Example
384+
385+ zone-a : 20 CPU cores
386+ zone-b : 16 CPU cores
387+ zone-c : 14 CPU cores
388+
389+ In this scenario, the following proportion of endpoints would be allocated for
390+ each Service :
391+
392+ zone-a : 40%
393+ zone-b : 32%
394+ zone-c : 28%
395+
396+ When allocating endpoints to meet this distribution, keeping endpoints in the
397+ same zone will be prioritized. When same-zone endpoints are exhausted, endpoints
398+ will be taken from zones that have excess capacity.
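
The proportions in this example can be reproduced with a short Go sketch. The `zoneProportions` helper is a hypothetical name used only for illustration; it covers just the ratio calculation and does not model the same-zone prioritization or overload handling.

```go
package main

import (
	"fmt"
	"sort"
)

// zoneProportions returns each zone's share of the total allocatable
// CPU cores. It returns an empty map when no cores are reported.
func zoneProportions(cores map[string]int) map[string]float64 {
	total := 0
	for _, c := range cores {
		total += c
	}
	out := make(map[string]float64, len(cores))
	if total == 0 {
		return out
	}
	for zone, c := range cores {
		out[zone] = float64(c) / float64(total)
	}
	return out
}

func main() {
	// Core counts taken from the example above.
	cores := map[string]int{"zone-a": 20, "zone-b": 16, "zone-c": 14}
	props := zoneProportions(cores)

	zones := make([]string, 0, len(props))
	for z := range props {
		zones = append(zones, z)
	}
	sort.Strings(zones) // map iteration order is random, so sort for stable output
	for _, z := range zones {
		fmt.Printf("%s: %.0f%%\n", z, props[z]*100)
	}
}
```

With 50 cores in total, zone-a's 20 cores give 20/50 = 40%, matching the distribution listed above.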
399+
400+ # ## PreferZone Heuristic
401+
402+ # ### Assumptions
403+
404+ - Endpoints are distributed per zone proportionally to the expected traffic capacity.
405+
406+ This heuristic will route traffic to the endpoints existing in the zone without any overflow.
407+ Dataplanes will fall back to cluster-wide routing if there are no endpoints with hints for the
408+ zone the dataplane is running in.
409+ There is a risk of blackholing traffic or of traffic imbalance if the endpoint distribution is incorrect.
410+
411+ # ### Example
412+
413+ zone-a : 2 endpoints
414+ zone-b : 0 endpoints
415+ zone-c : 3 endpoints
416+
417+ In this scenario, traffic generated in zone-a or zone-c will be routed only to the endpoints existing
418+ in their corresponding zone. Traffic from zone-b, since it does not have any endpoints, will fall back to
419+ cluster-wide routing and will be routed to endpoints in zone-a and zone-c.
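
The fallback behavior in this example could be sketched in Go as follows. The `endpoint` type and the `selectEndpoints` function are hypothetical and only illustrate the selection rule; they are not kube-proxy's actual implementation.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified endpoint carrying a zone hint.
type endpoint struct {
	addr     string
	zoneHint string
}

// selectEndpoints returns the endpoints hinted for the given zone. If no
// endpoint is hinted for that zone, it falls back to all endpoints
// (cluster-wide routing).
func selectEndpoints(eps []endpoint, zone string) []endpoint {
	var local []endpoint
	for _, ep := range eps {
		if ep.zoneHint == zone {
			local = append(local, ep)
		}
	}
	if len(local) == 0 {
		return eps
	}
	return local
}

func main() {
	// Endpoint layout from the example above: 2 in zone-a, 0 in zone-b, 3 in zone-c.
	eps := []endpoint{
		{"10.0.0.1", "zone-a"}, {"10.0.0.2", "zone-a"},
		{"10.0.2.1", "zone-c"}, {"10.0.2.2", "zone-c"}, {"10.0.2.3", "zone-c"},
	}
	for _, zone := range []string{"zone-a", "zone-b", "zone-c"} {
		fmt.Printf("%s -> %d endpoints\n", zone, len(selectEndpoints(eps, zone)))
	}
}
```

A dataplane in zone-b sees all five endpoints, while dataplanes in zone-a and zone-c see only their local two and three endpoints respectively.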
420+
396421# ## Additional Heuristics
397422To enable additional heuristics to be added in the future, we will :
398423