Conversation

@gnufied
Member

@gnufied gnufied commented Oct 31, 2025

xref - kubernetes/enhancements#5030

This makes CAS aware of volume limits on new nodes when scaling for pending pods.

I have tested and verified the proposed changes for both the scale-from-zero and scale-from-existing-nodes scenarios, and both work fine.

For example, on an AWS OpenShift cluster with these changes, I could observe that it spins up the correct number of machines, counting volumes too. To test this, I created a pod that consumes 20 volumes. Resource-wise, 5 such pods can easily fit on one node, but because of the number of volumes, we must spin up at least 5 worker nodes. Since I already had 3 workers, I could see the autoscaler created the right number of additional workers from the get-go:

foobar-dec10-ds-east-1d-6xfvq   Provisioned   m6i.xlarge   us-east-1   us-east-1d   2m17s                                  
foobar-dec10-ds-east-1d-7rszh   Provisioned   m6i.xlarge   us-east-1   us-east-1d   2m18s

I have also tested this with a higher number of nodes, so we know it works as expected. Before my change, CAS would create just 1 node to accommodate such pods, because it thinks all 5 pods will fit on one node.
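For reference, here is a minimal sketch, in Go using the core API types, of the kind of volume-heavy test pod described above. The names, image, and PVC naming scheme are assumptions, not the exact manifest from this test:

```go
// Illustrative sketch of a test pod that mounts many PVC-backed volumes;
// names and image are assumptions, not the exact manifest used here.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// volumeHeavyPod builds a pod that mounts volumeCount PVC-backed volumes.
func volumeHeavyPod(name string, volumeCount int) *corev1.Pod {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "registry.k8s.io/pause:3.9",
			}},
		},
	}
	for i := 0; i < volumeCount; i++ {
		volName := fmt.Sprintf("vol-%d", i)
		pod.Spec.Volumes = append(pod.Spec.Volumes, corev1.Volume{
			Name: volName,
			VolumeSource: corev1.VolumeSource{
				PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
					ClaimName: fmt.Sprintf("%s-pvc-%d", name, i),
				},
			},
		})
		pod.Spec.Containers[0].VolumeMounts = append(
			pod.Spec.Containers[0].VolumeMounts,
			corev1.VolumeMount{Name: volName, MountPath: "/mnt/" + volName},
		)
	}
	return pod
}

func main() {
	pod := volumeHeavyPod("volume-heavy", 20)
	fmt.Printf("pod %s mounts %d volumes\n", pod.Name, len(pod.Spec.Volumes))
}
```

Since two such pods would need 40 attachments, more than the per-node attach limit of the instance types involved, each pod effectively claims a node of its own.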

Add support for counting CSI volume limits when scaling nodes

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress, do-not-merge/release-note-label-needed, cncf-cla: yes, do-not-merge/needs-area, area/cluster-autoscaler and removed do-not-merge/needs-area labels Oct 31, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gnufied
Once this PR has been reviewed and has the lgtm label, please assign towca for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL label Oct 31, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase label Nov 4, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Nov 5, 2025
@towca
Collaborator

towca commented Nov 7, 2025

@mtrqq Could you do a first-pass review here? This is supposed to follow the way that DRA integration was implemented in CA as much as possible, but instead of the DRA objects we have CSINode.

@k8s-ci-robot k8s-ci-robot added size/XXL and removed size/XL labels Nov 10, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase label Nov 14, 2025
@mtrqq
Contributor

mtrqq commented Nov 14, 2025

@towca Certainly, I'm reading through the KEP now.

@gnufied is this PR still WIP?

@k8s-ci-robot k8s-ci-robot added area/provider/cluster-api and removed needs-rebase labels Nov 18, 2025
@gnufied
Member Author

gnufied commented Nov 18, 2025

@mtrqq yes, the PR is ready for review. It was marked WIP because I had to keep rebasing it and make a bunch of changes due to the latest kube rebase.

Even the tests that are still failing are most likely unrelated to this change and are happening because of the kube version bump.

@k8s-ci-robot k8s-ci-robot added the needs-rebase label Nov 20, 2025
@gnufied gnufied changed the title from "WIP - Add csinode limit awareness in cluster-autoscaler" to "Add csinode limit awareness in cluster-autoscaler" Nov 20, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Nov 20, 2025
@k8s-ci-robot k8s-ci-robot added release-note and removed needs-rebase, do-not-merge/release-note-label-needed labels Nov 20, 2025
return nil, err
}

wrappedNodeInfo := framework.WrapSchedulerNodeInfo(schedNodeInfo, nil, nil)
Contributor

It took me some time to understand that you've changed the approach from wrapping the node info object inside the CSI/DRA snapshots to mutating the node info object in place. This somewhat aligns with the effort to reduce memory allocations when handling node infos, and I like this part.

But can we come up with a consistent name for this method across snapshots? For example, AugmentNodeInfo.
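For illustration, a rough sketch of the shape such a shared method could take; the package, interface, and method names are hypothetical, not code from this PR:

```go
package snapshot // illustrative package name

import "k8s.io/autoscaler/cluster-autoscaler/simulator/framework"

// nodeInfoAugmenter sketches the naming suggestion above: each per-feature
// snapshot (CSI, DRA) exposes the same in-place mutation hook.
type nodeInfoAugmenter interface {
	// AugmentNodeInfo attaches the snapshot's per-node state (for example
	// the node's CSINode object) to nodeInfo in place rather than
	// returning a new wrapper, saving an allocation per node.
	AugmentNodeInfo(nodeInfo *framework.NodeInfo) error
}
```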

"k8s.io/klog/v2"
fwk "k8s.io/kube-scheduler/framework"
schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
intreeschedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
Contributor

Why change the import name here? The previous name is consistent across the other parts of the codebase, and I don't see how the new name is better.

Member Author

So, some of the interfaces in kube-scheduler changed, and I renamed this import because it felt more consistent.

But please ignore these renamings for now; once I rebase my PR on #8827, they will be gone.

"k8s.io/klog/v2"
fwk "k8s.io/kube-scheduler/framework"
schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
intreeschedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the import name here? The previous name is consistent across the other parts of the codebase and I don't see how the new name is better

}

// AddCSINodes adds a list of CSI nodes to the snapshot.
func (s *Snapshot) AddCSINodes(csiNodes []*storagev1.CSINode) error {
Contributor

Can we reuse AddCSINode here?

Member Author

fixed.
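The fix presumably looks something like this sketch, assuming AddCSINode takes a single *storagev1.CSINode and matching error handling:

```go
// AddCSINodes adds a list of CSI nodes to the snapshot by delegating to
// the single-object AddCSINode, as suggested above. Sketch only; the PR's
// actual implementation may differ.
func (s *Snapshot) AddCSINodes(csiNodes []*storagev1.CSINode) error {
	for _, csiNode := range csiNodes {
		if err := s.AddCSINode(csiNode); err != nil {
			return err
		}
	}
	return nil
}
```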

replace github.com/rancher/go-rancher => github.com/rancher/go-rancher v0.1.0

replace k8s.io/api => k8s.io/api v0.34.1
replace k8s.io/api => github.com/kubernetes/api v0.0.0-20251107002836-f1737241c064
Contributor

What's the reason behind adding github.com to all k8s dependencies?

// if cloudprovider does not provide CSI related stuff, then we can skip the CSI readiness check
if nodeInfo.CSINode == nil {
newReadyNodes = append(newReadyNodes, node)
klog.Warningf("No CSI node found for node %s, Skipping CSI readiness check and keeping node in ready list.", node.Name)
Contributor

The warning level seems extreme given the potential noise in the logs, or do we anticipate all nodes to have a matching CSINode?

Member Author

Yes, all nodes should have a matching CSINode object. Without a CSINode object, kubelet will not even report the node as ready.
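For context on what CAS reads from that object: the per-driver attachable-volume limit lives on CSINode itself, at Spec.Drivers[i].Allocatable.Count in storage/v1, and it is only populated once the driver registers on the node. A small sketch of reading it; the helper name is illustrative:

```go
// attachableLimit returns a driver's attachable-volume limit from a
// CSINode object, and false if the driver has not registered a limit yet.
func attachableLimit(csiNode *storagev1.CSINode, driverName string) (int32, bool) {
	for _, driver := range csiNode.Spec.Drivers {
		if driver.Name == driverName && driver.Allocatable != nil && driver.Allocatable.Count != nil {
			return *driver.Allocatable.Count, true
		}
	}
	return 0, false
}
```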

@k8s-ci-robot k8s-ci-robot added the needs-rebase label Dec 1, 2025
@k8s-ci-robot k8s-ci-robot added area/provider/azure and area/provider/hetzner labels Dec 3, 2025
@k8s-ci-robot
Contributor

k8s-ci-robot commented Dec 3, 2025

@gnufied: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-cluster-autoscaler-e2e-azure-master
Commit: e9bbe15
Required: false
Rerun command: /test pull-cluster-autoscaler-e2e-azure-master

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Dec 10, 2025
@gnufied gnufied force-pushed the add-csinode-limit branch 4 times, most recently from 7300575 to 64598e8 on December 10, 2025 19:21
var options []expander.Option

// This code here runs a simulation to see which pods can be scheduled on which node groups.
// TODO: Fix bug with CSI node not being added to the simulation.
Contributor

Can we create an issue for this and paste the issue link here if we plan to address this as a follow-up item?

Member Author

So, this was already resolved; the TODO is just a leftover artifact. The whole mechanism doesn't work if the CSINode is not in the snapshot.

if aErr != nil {
return status.UpdateScaleUpError(&status.ScaleUpStatus{}, aErr.AddPrefix("could not get upcoming nodes: "))
}
klog.V(4).Infof("Upcoming %d nodes", len(upcomingNodes))
Contributor

It's possible there are users who derive value from this and are running at v=4; can we restore it?
