feat: CPU startup boost in master #8813

kamarabbas99 · 2025-11-13T20:22:47Z

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

The CPU startup boost changes were developed on an experimental branch; this PR moves them to master.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Users can configure a startupBoost policy in the VPA spec.

Which issue(s) this PR fixes:

#7862

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: (https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/7862-cpu-startup-boost#aep-7862-cpu-startup-boost)

Fix VPA startup boost validation error messages

Fix test failure after rebase

Add e2e tests for CPU startup boost

kamarabbas99 · 2025-11-13T20:23:39Z

/cc adrianmoisey omerap12

k8s-triage-robot · 2025-11-13T20:43:09Z

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

omerap12

Nice! We already approved this in previous PRs 🥳
/lgtm
/label tide/merge-method-squash

vertical-pod-autoscaler/docs/features.md

omerap12 · 2025-11-14T06:52:21Z

oh good catch
/lgtm cancel

Update VPA version for startupboost feature

adrianmoisey · 2025-11-16T13:20:25Z

I'm good with this, thanks for doing it!
/lgtm

omerap12 · 2025-11-16T17:45:02Z

/approve

soltysh · 2025-11-19T10:25:28Z

/label api-review

soltysh

I left a few API-related questions, but I didn't go deep into the logic itself beyond the validation bits.

vertical-pod-autoscaler/pkg/admission-controller/main.go

vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go

vertical-pod-autoscaler/pkg/utils/annotations/vpa_cpu_boost.go

soltysh · 2025-11-19T12:14:00Z

vertical-pod-autoscaler/pkg/admission-controller/resource/pod/patch/resource_updates_test.go

+			maxAllowedCpu:      resource.QuantityValue{},
+			featureGateEnabled: true,
+			expectError:        fmt.Errorf("boost factor must be >= 1"),
+		},


This is the only error case you're testing. I'd add additional ones for the type and factor|quantity combinations, and probably also one for an invalid type. That one should be caught by the API server validation, but it doesn't hurt to have a test case covering it.

The other thing: the WithCPUStartupBoost method only modifies the VPA.spec and doesn't change the container-level StartupBoost. It would be good to cover that case in a test as well, to ensure the latter takes priority.

What is the relationship between the two? I missed that in AEP-7862, and the code seems to work on one or the other, depending on the location.

Added some more error cases in #8828. Should it error out when both Quantity and Factor are set? Currently it doesn't.

Container-level startup boost has higher priority; added some test cases to cover that.
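The error cases discussed in this thread could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `CPUBoost` struct, its field types, and every error message except "boost factor must be >= 1" (which appears in the test diff above) are assumptions made for the example. It deliberately does not reject a policy with both fields set, matching the current behavior described above.

```go
package main

import (
	"errors"
	"fmt"
)

// BoostType mirrors the two policy types mentioned in the thread.
type BoostType string

const (
	FactorType   BoostType = "Factor"
	QuantityType BoostType = "Quantity"
)

// CPUBoost is a simplified, hypothetical stand-in for the startup boost
// CPU policy. The real API types live in pkg/apis/autoscaling.k8s.io/v1.
type CPUBoost struct {
	Type          BoostType
	Factor        *int32 // assumed representation of the boost factor
	QuantityMilli *int64 // assumed representation: CPU quantity in millicores
}

// validateCPUBoost checks that the field matching the declared type is
// present and sane. It does not reject policies that set both fields.
func validateCPUBoost(b CPUBoost) error {
	switch b.Type {
	case FactorType:
		if b.Factor == nil {
			return errors.New("factor must be set when type is Factor")
		}
		if *b.Factor < 1 {
			return errors.New("boost factor must be >= 1")
		}
	case QuantityType:
		if b.QuantityMilli == nil {
			return errors.New("quantity must be set when type is Quantity")
		}
	default:
		return fmt.Errorf("unexpected startup boost type: %q", b.Type)
	}
	return nil
}

func main() {
	zero := int32(0)
	fmt.Println(validateCPUBoost(CPUBoost{Type: FactorType, Factor: &zero}))
	// prints: boost factor must be >= 1
}
```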

Great, just tagged #8828, but it would also be nice to update AEP-7862 with this information.

Actually AEP does mention it.

The new StartupBoost parameter will be added to both:
- [VerticalPodAutoscalerSpec]: Will allow users to specify the default CPU startup boost for all containers of the pod targeted by the VPA object.
- [ContainerResourcePolicy]: Will allow users to optionally customize the startup boost behavior for individual containers.
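The precedence described in this thread (a container-level policy wins over the spec-level default) could be sketched as below. The struct and function names are simplified, hypothetical stand-ins for the real VPA API types, used only to show the lookup order.

```go
package main

import "fmt"

// StartupBoost is a reduced, hypothetical version of the boost policy.
type StartupBoost struct {
	Type string
}

// ContainerPolicy stands in for a per-container resource policy entry.
type ContainerPolicy struct {
	ContainerName string
	StartupBoost  *StartupBoost
}

// VPASpec stands in for the VPA spec with a pod-wide default boost.
type VPASpec struct {
	StartupBoost      *StartupBoost
	ContainerPolicies []ContainerPolicy
}

// effectiveBoost returns the container-level policy when one is set for the
// named container, and otherwise falls back to the spec-level default.
// It returns nil when neither level configures a boost.
func effectiveBoost(spec VPASpec, container string) *StartupBoost {
	for i := range spec.ContainerPolicies {
		p := &spec.ContainerPolicies[i]
		if p.ContainerName == container && p.StartupBoost != nil {
			return p.StartupBoost
		}
	}
	return spec.StartupBoost
}

func main() {
	spec := VPASpec{
		StartupBoost: &StartupBoost{Type: "Factor"},
		ContainerPolicies: []ContainerPolicy{
			{ContainerName: "nginx", StartupBoost: &StartupBoost{Type: "Quantity"}},
		},
	}
	fmt.Println(effectiveBoost(spec, "nginx").Type)   // prints: Quantity
	fmt.Println(effectiveBoost(spec, "sidecar").Type) // prints: Factor
}
```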

CPU startup boost: add tests

k8s-ci-robot · 2025-11-21T13:04:40Z

New changes are detected. LGTM label has been removed.

alextarasov-spot · 2025-11-23T13:12:12Z

Hello, I’m not sure if this is the right place to report this issue, so I’ll share it here. If there’s a more appropriate channel, please let me know.

I cloned the experimental-cpu-boost-v2 branch to test the new feature, and I’m concerned about the following behavior:

- Boost works with a single replica.
- Unboost does not work with a single replica, so the pod stays stuck with the boosted CPU request.

1 pods_inplace_restriction.go:112] "Checking if pod can be unboosted" pod="vpa-test/nginx-all-containers-5d649d7c56-t7bqj" durationPassed=true hasAnnotation=true
1 pods_restriction_factory.go:212] "Too few replicas" kind="ReplicaSet" object="vpa-test/nginx-all-containers-5d649d7c56" livePods=1 requiredPods=2 globalMinReplicas=2

The minimum number of replicas required to perform an update is 2 by default:

autoscaler/vertical-pod-autoscaler/pkg/updater/main.go

Line 61 in 4b40a55

minReplicas = flag.Int("min-replicas", 2,

This can be confusing, as boosting is allowed with a single replica, but unboosting will never occur in that case.
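To make the asymmetry concrete, here is a small sketch of a replica check that exempts unboosting from the minimum. The function name and signature are hypothetical and only illustrate the behavior being asked for; they are not taken from the updater code.

```go
package main

import "fmt"

// canActOnPod reports whether the updater may touch a pod.
// Regular updates require at least minReplicas live pods. In this sketch,
// an unboost is always allowed, so a boosted single-replica workload
// is not left with its startup CPU request forever.
func canActOnPod(livePods, minReplicas int, isUnboost bool) bool {
	if isUnboost {
		return true
	}
	return livePods >= minReplicas
}

func main() {
	// Values from the log above: livePods=1, requiredPods=2.
	fmt.Println(canActOnPod(1, 2, false)) // prints: false
	fmt.Println(canActOnPod(1, 2, true))  // prints: true
}
```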

kamarabbas99 · 2025-11-24T19:24:48Z

@alextarasov-spot thanks for catching this!
sent #8854 to address this.

omerap12 · 2025-11-25T07:52:59Z

@alextarasov-spot , great catch! thanks for this!

Allow unboost even if pod replicas less than "min-replicas" flag

k8s-ci-robot · 2025-11-25T10:54:54Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kamarabbas99, omerap12
Once this PR has been reviewed and has the lgtm label, please ask for approval from adrianmoisey. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

~~vertical-pod-autoscaler/OWNERS~~ [omerap12]
vertical-pod-autoscaler/enhancements/OWNERS
~~vertical-pod-autoscaler/pkg/apis/OWNERS~~ [omerap12]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alextarasov-spot · 2025-11-25T12:30:51Z

@kamarabbas99 @omerap12 thank you guys for the quick reaction!

If you don't mind, I will share another issue I found.

Input data

Resource requests

resources:
  requests:
    cpu: "100m"

VPA definition

spec:
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      controlledValues: RequestsAndLimits
  startupBoost:
    cpu:
      duration: 15s
      quantity: 200m
      type: Quantity
  ......
  ......
  updatePolicy:
    updateMode: InPlaceOrRecreate
status:
  recommendation:
    containerRecommendations:
    - containerName: nginx
      target:
        cpu: 105m

If no limits are defined and the VPA's controlledValues is RequestsAndLimits, the vpa-admission-controller calculates:
- the limit as {Boost Quantity} + {Deployment Request} = 300m
- the request as {VPA Target} + {Boost Quantity} = 305m

The limit is calculated here:

autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/pod/patch/resource_updates.go

Line 254 in 4b40a55

boostedLimit, err := c.calculateBoostedCPU(recommendedLimit, originalLimit, startupBoostPolicy)

If the limit is not set, the calculation falls back to the original CPU request:

autoscaler/vertical-pod-autoscaler/pkg/admission-controller/resource/pod/patch/resource_updates.go

Line 208 in 4b40a55

if baseCPU.IsZero() {

Because the resulting request (305m) exceeds the resulting limit (300m), pod creation fails with the following error:

Error creating: Pod "nginx-with-probes-6fb6445758-ldw5m"
  is invalid: spec.containers[0].resources.requests: Invalid value: "305m": must be
  less than or equal to cpu limit of 300m'
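The arithmetic behind this error can be reproduced with a few lines of Go. This is only an illustration of the numbers in the report, using plain millicore integers and a hypothetical helper name rather than the admission controller's own types.

```go
package main

import "fmt"

// boostedRequestAndLimit reproduces the reported arithmetic for a
// Quantity-type boost when the workload sets a CPU request but no CPU limit.
// All values are in millicores. Illustrative helper, not VPA code.
func boostedRequestAndLimit(originalRequest, vpaTarget, boostQuantity int64) (request, limit int64) {
	// The request is boosted on top of the VPA target.
	request = vpaTarget + boostQuantity
	// With no original limit, the limit is boosted on top of the original request.
	limit = originalRequest + boostQuantity
	return request, limit
}

func main() {
	request, limit := boostedRequestAndLimit(100, 105, 200)
	fmt.Printf("request=%dm limit=%dm valid=%t\n", request, limit, request <= limit)
	// prints: request=305m limit=300m valid=false
}
```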

kamarabbas99 · 2025-11-25T15:47:59Z

@alextarasov-spot did you manually set the target? Because I thought when controlledValues is RequestsAndLimits, the recommender will set limits as well.

omerap12 · 2025-11-25T18:20:25Z

@kamarabbas99 I think it would be helpful to document this use case in the AEP (where no limits are defined for a pod).
Also, shouldn’t the limit be based on the boosted request when no original limit exists?

omerap12 · 2025-11-25T18:56:36Z

Having no limit does make sense :) as long as we document this.

alextarasov-spot · 2025-11-26T17:00:28Z

@kamarabbas99

@alextarasov-spot did you manually set the target? Because I thought when controlledValues is RequestsAndLimits, the recommender will set limits as well.

Yes, I manually configured the VPA target. If RequestsAndLimits is set, the VPA admission controller calculates the limit, but it doesn't factor in the boosted value. I think that when no original limits are set, a boost is configured, and controlledValues is RequestsAndLimits, the admission controller should calculate the limit from the boosted value rather than from the original request value.

I think that @omerap12 meant the same:

shouldn’t the limit be based on the boosted request when no original limit exists?
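One possible reading of this suggestion, sketched with plain millicore integers: when the workload declares no CPU limit, anchor the limit to the boosted request so that request <= limit always holds. The function name, the nil-pointer convention, and the handling of an existing limit are all assumptions for illustration; the fix in #8863 may well take a different approach, such as leaving the limit unset.

```go
package main

import "fmt"

// boostedLimit returns the CPU limit (in millicores) for a boosted container.
// A nil originalLimit means the workload declared no CPU limit.
func boostedLimit(originalLimit *int64, boostedRequest, boostQuantity int64) int64 {
	if originalLimit == nil {
		// Suggested behavior: derive the limit from the boosted request,
		// so the limit can never end up below the request.
		return boostedRequest
	}
	// Assumption for this sketch: an existing limit gets the same
	// Quantity boost as the request.
	return *originalLimit + boostQuantity
}

func main() {
	// Numbers from the report: VPA target 105m + boost 200m = 305m request.
	fmt.Println(boostedLimit(nil, 305, 200)) // prints: 305
}
```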

kamarabbas99 · 2025-11-26T18:33:22Z

@alextarasov-spot I am addressing this in #8863 PTAL!

kamarabbas99 and others added 8 commits November 11, 2025 10:34

Introduce API changes and fetaure gate CPU startup boost

c1a42db

Apply CPU startup boost in admission controller if its set

0554a5c

Fix VPA startup boost validation error messages

30aebbe

Fix VPA startup boost validation error messages

Make changes to updater to add the unboosting logic

b25e43c

Fix test failure after rebase

3512961

Merge pull request #8797 from kamarabbas99/clean_fix_for_experimental

18d8efd

Fix test failure after rebase

Add e2e tests for CPU startup boost

202579b

Merge pull request #8672 from kamarabbas99/feature-cpu-e2e

cd061df

Add e2e tests for CPU startup boost

k8s-ci-robot requested review from andrewsykim and kwiesmueller November 13, 2025 20:23

k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 13, 2025

k8s-ci-robot requested review from adrianmoisey and omerap12 November 13, 2025 20:23

omerap12 reviewed Nov 13, 2025

k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Nov 13, 2025

k8s-ci-robot assigned omerap12 Nov 13, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 13, 2025

adrianmoisey reviewed Nov 14, 2025

vertical-pod-autoscaler/docs/features.md

adrianmoisey reviewed Nov 14, 2025

vertical-pod-autoscaler/docs/features.md

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2025

kamarabbas99 mentioned this pull request Nov 14, 2025

Update VPA version for startupboost feature #8815

Merged

Merge pull request #8815 from kamarabbas99/experimental-cpu-boost-v2

4b40a55

Update VPA version for startupboost feature

k8s-ci-robot assigned adrianmoisey Nov 16, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2025

k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label Nov 19, 2025

github-project-automation bot added this to API Reviews Nov 19, 2025

soltysh moved this to Backlog in API Reviews Nov 19, 2025

soltysh reviewed Nov 19, 2025

kamarabbas99 added a commit to kamarabbas99/autoscaler that referenced this pull request Nov 19, 2025

Address comments on kubernetes#8813

9cbd887

kamarabbas99 mentioned this pull request Nov 19, 2025

CPU startup boost: add tests #8828

Merged

kamarabbas99 and others added 2 commits November 19, 2025 18:42

Address comments on #8813

1c2f5c0

Merge pull request #8828 from kamarabbas99/experimental-cpu-boost-v2

d7c901f

CPU startup boost: add tests

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 21, 2025

Allow unboost even if pod replicas less than "min-replicas" flag

be67e64

Merge pull request #8854 from kamarabbas99/fix-minreplicas-withboost

dcde538

Allow unboost even if pod replicas less than "min-replicas" flag

kamarabbas99 mentioned this pull request Nov 25, 2025

Fix startup boost workflow when limit is not set #8863

Open
