@rajathagasthya
Contributor

@rajathagasthya rajathagasthya commented Nov 17, 2025

This change enables the NodeFeature API in GFD. As a result, GFD creates
a new NodeFeature CR on startup.

This commit also removes the redundant enableNodeFeatureApi helm value.
NodeFeature has been enabled by default in NFD since its v0.17.0 release.
The k8s-device-plugin helm chart has been using the NFD v0.17.3 subchart
since k8s-device-plugin's v0.18.0 release, so the NodeFeature API has been
enabled by default since then and this helm value has had no effect.

Verification steps:

  1. Deploy gpu-operator that uses GFD image built by this PR.
  2. Verify that a new NodeFeature CR is created.
  3. Verify no file is added by GFD in /etc/kubernetes/node-feature-discovery/features.d/.

@rajathagasthya rajathagasthya marked this pull request as ready for review November 17, 2025 22:15
 nfd:
   nameOverride: node-feature-discovery
-  enableNodeFeatureApi: false
+  enableNodeFeatureApi: true
Member

Does the behaviour differ between upgrades and new installations due to this toggle?

Contributor Author

It should automatically upgrade to using NFD when a new chart and device plugin image are used, so I don't think the behavior differs. Is there a particular scenario you were thinking of?

Contributor

@cdesiniotis cdesiniotis Nov 18, 2025

I believe the answer is yes, but just to confirm: if we upgrade the device-plugin helm chart, does the NFD subchart also get upgraded, and does the value of this field, enableNodeFeatureApi, get toggled from false to true?

Contributor Author

@elezar Do you have any thoughts on whether we should keep this value enabled by default?

Contributor

@rajathagasthya have you tried a helm chart upgrade? What behavior have you observed?

Contributor Author

@rajathagasthya rajathagasthya Nov 25, 2025

@elezar The enableNodeFeatureApi variable is actually redundant and can be removed. The reason is that v0.18.0 of k8s-device-plugin upgraded NFD to v0.17.3, which enables NodeFeature API by default. So this behavior was already changed when older charts upgraded to v0.18.0.

I've verified this by doing the following:

  1. Install v0.17.4 of k8s-device-plugin. This installs NFD v0.15.3 subchart.
  2. No NodeFeature CRs are created since that version of NFD doesn't enable NodeFeature API by default (controlled by nfd.enableNodeFeatureApi in values.yaml).
  3. Upgrade the chart to v0.18.0 of k8s-device-plugin. This installs NFD v0.17.3 subchart.
  4. A NodeFeature CR is created by default since it's now enabled in NFD.

Essentially, nfd.enableNodeFeatureApi has been a no-op since the last release of k8s-device-plugin and can be removed.

Bonus: I've also verified that when I upgrade the chart to the changes in this PR, two NodeFeature CRs are created: one by NFD and another by GFD.

Contributor Author

I've removed the nfd.enableNodeFeatureApi value. PTAL.

This change enables the NodeFeature API in GFD. As a result, GFD creates
a new NodeFeature CR on startup.

This commit also removes the redundant `enableNodeFeatureApi` helm value.
NodeFeature has been enabled by default in NFD since its v0.17.0 release.
The k8s-device-plugin helm chart has been using the NFD v0.17.3 subchart
since k8s-device-plugin's v0.18.0 release, so the NodeFeature API has been
enabled by default since then and this helm value has had no effect.

Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
@rajathagasthya rajathagasthya force-pushed the gfd-use-node-feature-api branch from 770eefe to 1cd8284 Compare November 25, 2025 23:32
        EnvVars: []string{"GFD_CONFIG_FILE", "CONFIG_FILE"},
    },
    &cli.BoolFlag{
        Name: "use-node-feature-api",
Contributor Author

Should we deprecate this flag too for future removal?

Collaborator

I think it is sane to do; in a couple more releases, NFD will start deprecating the old method.

Collaborator

I would say a deprecation path would involve not fully removing enableNodeFeatureApi from the Helm chart, but instead adding logic to a Helm helper that prints a warning explaining the new default and that the flag will go away soon.

Contributor Author

I removed enableNodeFeatureApi from the helm chart because that value is not functional right now. Moreover, letting users set enableNodeFeatureApi=false would actually leave the deployment broken, because the chart doesn't create the necessary role/rolebinding/serviceaccount objects in that case. So I'm not sure adding a warning there is of much help.

Let me know what you think!

Collaborator

I am OK with removing it. NFD has had the NodeFeature API enabled by default for a long time now, so most users must be on it already.



5 participants