CNF-20404: DRA: disable Kubelet resources and topology managers #1445
base: main
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Tal-or.
ffromani
left a comment
the general direction LGTM
pkg/performanceprofile/controller/performanceprofile/components/profile/profile.go
23baba8 to
3503930
Compare
@Tal-or: This pull request references CNF-20404 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
/retest
/retest
swatisehgal
left a comment
Overall direction of selectively disabling kubelet resource and topology managers when DRA is in control makes sense to me.
// PerformanceProfileDRAResourceManagementAnnotation signal the operator to disable KubeletConfig
// topology managers (CPU Manager, Memory Manager) configurations
// that conflict with the DRA feature, and stop reconciling the PerformanceProfile.
const PerformanceProfileDRAResourceManagementAnnotation = "performance.openshift.io/dra-resource-management"
Should the introduction of this new annotation in the PerformanceProfile API (along with its expected behavior) be documented somewhere?
This was added as an annotation since we don't want this to be part of the official API.
At this point in time we mainly want this for experimental usage.
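For readers following this thread, here is a minimal sketch of how an operator could detect the annotation on a profile. Only the annotation key comes from the diff above. The helper name `isDRAResourceManagementEnabled` and the rule that the value `"true"` enables the behavior are assumptions made for illustration, not something this PR confirms.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation key quoted from the diff under review.
const PerformanceProfileDRAResourceManagementAnnotation = "performance.openshift.io/dra-resource-management"

// isDRAResourceManagementEnabled is a hypothetical helper (not taken from the PR).
// Assumption: the feature is on only when the annotation value is "true",
// compared case-insensitively after trimming whitespace.
func isDRAResourceManagementEnabled(annotations map[string]string) bool {
	// Reading from a nil map is safe in Go and yields the zero value.
	v, ok := annotations[PerformanceProfileDRAResourceManagementAnnotation]
	if !ok {
		return false
	}
	return strings.EqualFold(strings.TrimSpace(v), "true")
}

func main() {
	withAnnotation := map[string]string{
		PerformanceProfileDRAResourceManagementAnnotation: "true",
	}
	fmt.Println("annotated profile:", isDRAResourceManagementEnabled(withAnnotation))
	fmt.Println("plain profile:", isDRAResourceManagementEnabled(nil))
}
```

Keeping the check in one function matches the "common place to check" idea from the commit list, whatever the accepted values turn out to be.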
v, err := resource.ParseQuantity(value)
if err != nil {
return err
if opts.MixedCPUsEnabled && opts.DRAResourceManagement {
This makes sense, although I would hope the resource management through DRA would supersede the mixed CPU feature that we have. But for now let's make sure this is documented in this repo or by liaising with the docs team.
Also, let's add a test to validate that the operator returns an error when both are enabled.
> This makes sense, although I would hope the resource management through DRA would supersede the mixed CPU feature that we have.
Theoretically that's correct, but at this point in time we do not have a DRA plugin that can provide this kind of functionality, and since MixedCPUsEnabled depends on CPUManager behavior (which gets disabled when DRA is ON), they cannot co-exist.
> Also, let's add a test to validate that the operator returns an error when both are enabled.
Thanks, I'll add it.
})
})

It("should disable CPU, Memory, and Topology managers when DRA annotation is set", func() {
Would also be nice to validate that the original manager settings are restored accurately when the annotation is removed.
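One way to make the restore path easy to verify is to render the manager policies as a pure function of the desired settings and the annotation state, so that removing the annotation yields the original values again. The sketch below illustrates that idea under stated assumptions: the `managerPolicies` struct and `renderPolicies` function are hypothetical, and the disabled values (`none`, `None`, `none`) are the upstream KubeletConfiguration defaults for the CPU, Memory, and Topology managers rather than values confirmed by this PR.

```go
package main

import "fmt"

// managerPolicies is a hypothetical, trimmed-down view of the kubelet
// settings that the annotation is meant to neutralize.
type managerPolicies struct {
	CPUManagerPolicy      string
	MemoryManagerPolicy   string
	TopologyManagerPolicy string
}

// renderPolicies returns the policies that would be written to the kubelet
// configuration. Assumption: when DRA manages resources, every manager is
// set to its upstream "disabled" value; otherwise the desired values pass
// through unchanged.
func renderPolicies(desired managerPolicies, draManaged bool) managerPolicies {
	if !draManaged {
		return desired
	}
	return managerPolicies{
		CPUManagerPolicy:      "none",
		MemoryManagerPolicy:   "None",
		TopologyManagerPolicy: "none",
	}
}

func main() {
	desired := managerPolicies{
		CPUManagerPolicy:      "static",
		MemoryManagerPolicy:   "Static",
		TopologyManagerPolicy: "single-numa-node",
	}
	fmt.Printf("annotation set:     %+v\n", renderPolicies(desired, true))
	fmt.Printf("annotation removed: %+v\n", renderPolicies(desired, false))
}
```

Because nothing is mutated in place, a round-trip test only needs to compare the output with the annotation removed against the original desired settings.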
3503930 to
c5069a5
Compare
This annotation signals NTO that compute resources will be managed by DRA plugins. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Disable CPU and memory managers via kubeletconfig when DRAResourceManagement is enabled. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Add a common place to check whether DRA management is enabled. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Wire and inject the new options in the code flow. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
The test applies the DRA management annotation and checks that NTO disables all managers. It then reverts the changes in the profile and checks that NTO restores the managers to their original configuration. Signed-off-by: Talor Itzhak <titzhak@redhat.com>
c5069a5 to
ac51808
Compare
@Tal-or: all tests passed!
Enabling core resources to be managed by DRA drivers requires more than leaving the performance profile unapplied; in other words, the current OCP defaults are not necessarily good enough.
This PR adds an annotation to the PerformanceProfile that actively disables the Topology, CPU, and Memory managers, making room for DRA drivers to act on the cluster with minimal conflicts.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>