Skip to content

Conversation

@Tal-or
Copy link
Contributor

@Tal-or Tal-or commented Dec 11, 2025

Enabling core resources to be managed by DRA drivers doesn't just require an unapplied performance profile, IOW the current OCP defaults are not necessarily good enough.

Added an annotation to the PerformanceProfile to actively disable Topology, CPU and memory managers, to make room for DRA drivers to act upon the cluster with minimal conflicts.

Signed-off-by: Talor Itzhak titzhak@redhat.com

@Tal-or Tal-or marked this pull request as draft December 11, 2025 09:21
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Tal-or
Once this PR has been reviewed and has the lgtm label, please assign yanirq for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Tal-or Tal-or changed the title Nri plugin deployment DRA: disable Kubelet resources and topology managers Dec 11, 2025
Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the general direction LGTM

@Tal-or Tal-or force-pushed the nri_plugin_deployment branch from 23baba8 to 3503930 Compare December 11, 2025 10:40
@Tal-or Tal-or marked this pull request as ready for review December 11, 2025 10:52
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 11, 2025
@openshift-ci openshift-ci bot requested a review from ffromani December 11, 2025 10:53
@Tal-or Tal-or changed the title DRA: disable Kubelet resources and topology managers CNF-20404: DRA: disable Kubelet resources and topology managers Dec 11, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 11, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Dec 11, 2025

@Tal-or: This pull request references CNF-20404 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Enabling core resources to be managed by DRA drivers doesn't just require an unapplied performance profile, IOW the current OCP defaults are not necessarily good enough.

Added an annotation to the PerformanceProfile to actively disable Topology, CPU and memory managers, to make room for DRA drivers to act upon the cluster with minimal conflicts.

Signed-off-by: Talor Itzhak titzhak@redhat.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Tal-or
Copy link
Contributor Author

Tal-or commented Dec 11, 2025

/retest

1 similar comment
@Tal-or
Copy link
Contributor Author

Tal-or commented Dec 11, 2025

/retest

Copy link
Contributor

@swatisehgal swatisehgal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall direction of selectively disabling kubelet resource and topology managers when DRA is in control makes sense to me.

// PerformanceProfileDRAResourceManagementAnnotation signal the operator to disable KubeletConfig
// topology managers (CPU Manager, Memory Manager) configurations
// that conflict with the DRA feature, and stop reconciling the PerformanceProfile.
const PerformanceProfileDRAResourceManagementAnnotation = "performance.openshift.io/dra-resource-management"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the introduction of this new annotation in the PerformanceProfile API (along with its expected behavior) be documented somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added as an annotation since we don't want this to be part of the official API.
At this point in time we mainly want this for experimental usage.

v, err := resource.ParseQuantity(value)
if err != nil {
return err
if opts.MixedCPUsEnabled && opts.DRAResourceManagement {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, although I would hope the resource management through DRA would supersede the mixed CPU feature that we have. But for now let's make sure this is documented in this repo or by liaising with the docs team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, let's add a test to validate that the operator returns an error when both are enabled.

Copy link
Contributor Author

@Tal-or Tal-or Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, although I would hope the resource management through DRA would supersede the mixed CPU feature that we have.

Theoretically that's correct, but at this point in time we do not have a DRA plugin that can provide this kind of functionality, and since MixedCPUsEnabled depends on CPUManager behavior (which gets disabled when DRA is ON) they cannot co-exist.

Also, let's add a test to validate that the operator returns an error when both are enabled.

Thanks, i'll add.

})
})

It("should disable CPU, Memory, and Topology managers when DRA annotation is set", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also be nice to validate that the original manager settings are restored accurately when the annotation is removed.

Copy link
Contributor Author

@Tal-or Tal-or Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tal-or Tal-or force-pushed the nri_plugin_deployment branch from 3503930 to c5069a5 Compare December 14, 2025 12:44
This annotation signals NTO that compute resources will be managed
by DRA plugins.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Disable CPU and memory managers via kubeletconfig when
DRAResourceManagement enabled.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
common place to check if DRA management enabled.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Wire and inject the new options in the code flow.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
The test apply DRA management annotation and checks that NTO
disables all managers.

After that, it reverts changes in the profile and checks that NTO
applies the managers configuration back to their original configuration.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
@Tal-or Tal-or force-pushed the nri_plugin_deployment branch from c5069a5 to ac51808 Compare December 14, 2025 12:45
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 14, 2025

@Tal-or: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants