-
Notifications
You must be signed in to change notification settings - Fork 757
Enable NodeFeature API by default in GFD #1504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| nfd: | ||
| nameOverride: node-feature-discovery | ||
| enableNodeFeatureApi: false | ||
| enableNodeFeatureApi: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the behaviour differ between upgrades and new installations due to this toggle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should automatically upgrade to using NFD when a new chart and device plugin image is used, so I don't think the behavior differs. Is there a particular scenario you were thinking of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the answer is yes, but just to confirm -- if we upgrade the device-plugin helm chart, does the NFD subchart also get upgraded and the value of this field, enableNodeFeatureApi, gets toggled from false to true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elezar Do you have any thoughts on if we should keep this value enabled by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rajathagasthya have you tried a helm chart upgrade? What behavior have you observed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elezar The enableNodeFeatureApi variable is actually redundant and can be removed. The reason is that v0.18.0 of k8s-device-plugin upgraded NFD to v0.17.3, which enables NodeFeature API by default. So this behavior was already changed when older charts upgraded to v0.18.0.
I've verified this by doing the following:
- Install
v0.17.4ofk8s-device-plugin. This installs NFDv0.15.3subchart. - No
NodeFeatureCRs are created since that version of NFD doesn't enableNodeFeatureAPI by default (controlled bynfd.enableNodeFeatureApiinvalues.yaml). - Upgrade the chart to
v0.18.0ofk8s-device-plugin. This installs NFDv0.17.3subchart. - A
NodeFeatureCR is created by default since it's now enabled in NFD.
Essentially, nfd.enableNodeFeatureApi has been a no-op since the last release of k8s-device-plugin and can be removed.
Bonus: I've also verified that when I upgrade the chart to current changes in this, it creates two NodeFeature CRs — one created by NFD and another by GFD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed nfd.enableNodeFeatureApi value. PTAL.
This change enables NodeFeature API in GFD. This results in GFD creating in a new NodeFeature CR on startup. This commit also removes the redundant `enableNodeFeatureApi` helm value NodeFeature is now enabled by default in NFD since it's v0.17.0 release. The k8s-device-plugin helm chart has been using NFD v0.17.3 subchart since k8s-device-plugin's v0.18.0 release, so NodeFeature API has been enabled by default since then and this helm value has had no effect. Signed-off-by: Rajath Agasthya <ragasthya@nvidia.com>
770eefe to
1cd8284
Compare
| EnvVars: []string{"GFD_CONFIG_FILE", "CONFIG_FILE"}, | ||
| }, | ||
| &cli.BoolFlag{ | ||
| Name: "use-node-feature-api", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we deprecate this flag too for future removal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is sane to do, in a couple more releases, NFD will start deprecating the old method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say a depreciation path will involve not fully removing enableNodeFeatureApi from the Helm charts but adding a logic to helm helper to print a warning mentioning that the new default and the flag will go away soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed enableNodeFeatureApi from the helm chart because that value is not functional right now. And giving users the option to set enableNodeFeatureApi=false will actually leave the deployment broken because it doesn't create the necessary role/rolebinding/serviceaccount objects in the chart. So I'm not sure adding a warning there is of much help.
Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with removing it, NFD has been with NodeFeatureApi enabled by default for a long time now, most users must be on it already.
This change enables NodeFeature API in GFD. This results in GFD creating
in a new NodeFeature CR on startup.
This commit also removes the redundant
enableNodeFeatureApihelm valueNodeFeature is now enabled by default in NFD since it's v0.17.0 release.
The k8s-device-plugin helm chart has been using NFD v0.17.3 subchart
since k8s-device-plugin's v0.18.0 release, so NodeFeature API has been
enabled by default since then and this helm value has had no effect.
Verification steps:
NodeFeatureCR is created./etc/kubernetes/node-feature-discovery/features.d/.