Skip to content

Conversation

@JoelSpeed
Copy link
Contributor

This PR introduces a new annotation release.openshift.io/feature-gates that accepts a comma separated list of feature gates which must be enabled to consider a manifest for inclusion within the cluster, optionally preceeded by a - to negate that (ie don't include me if this gate is enabled).

The PR adds a feature gate informer to the core operator which keeps a synchronised list of enabled feature gates. This is then passed through to sync workers, added to the sync worker status, and finally passed into the library-go manifest inclusion/exclusion code for decisions on whether the manifest should/shouldn't be included.

I need to test this and explore what the render and task graph parts are about. I see some todo's in the render already about proper feature-setting so it looks like maybe we are ok to ignore that for now.

Note, claude was used to assist in the development of this, I have reviewed every line carefully so I think this is all good and follows existing CVO patterns, but I'm not a CVO expert so please point out where Claude and I have misunderstood 🙏

Draft for now as this is based on a library-go PR that I need to raise and merge before this can go in.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 17, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Dec 17, 2025

@JoelSpeed: This pull request references OCPSTRAT-2485 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR introduces a new annotation release.openshift.io/feature-gates that accepts a comma separated list of feature gates which must be enabled to consider a manifest for inclusion within the cluster, optionally preceeded by a - to negate that (ie don't include me if this gate is enabled).

The PR adds a feature gate informer to the core operator which keeps a synchronised list of enabled feature gates. This is then passed through to sync workers, added to the sync worker status, and finally passed into the library-go manifest inclusion/exclusion code for decisions on whether the manifest should/shouldn't be included.

I need to test this and explore what the render and task graph parts are about. I see some todo's in the render already about proper feature-setting so it looks like maybe we are ok to ignore that for now.

Note, claude was used to assist in the development of this, I have reviewed every line carefully so I think this is all good and follows existing CVO patterns, but I'm not a CVO expert so please point out where Claude and I have misunderstood 🙏

Draft for now as this is based on a library-go PR that I need to raise and merge before this can go in.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Dec 17, 2025

Walkthrough

Feature-gate support was added: informers/listers wired, enabled gate sets tracked and exposed, and threaded through payload loading, manifest filtering, sync worker state, rendering, and tests.

Changes

Cohort / File(s) Change Summary
Module & Dependency Management
go.mod
Added a replace directive redirecting github.com/openshift/library-go to github.com/JoelSpeed/library-go at a specific version.
CVO core & startup
pkg/cvo/cvo.go, pkg/start/start.go
Wired a FeatureGate informer into New, added featureGateLister field and event handlers, initialized and tracked enabled feature gates, exposed getEnabledFeatureGates/updateEnabledFeatureGates, and threaded enabled gates into payload load and reconciliation.
Sync worker & status
pkg/cvo/sync_worker.go, pkg/cvo/sync_test.go, pkg/cvo/cvo_scenarios_test.go
Added SyncWork.EnabledFeatureGates and SyncWorkerStatus.EnabledFeatureGates; extended Update signatures, equality checks, DeepCopy, change detection, status propagation, and tests/fake recorder to include feature gate sets.
Payload API & rendering
pkg/payload/payload.go, pkg/payload/render.go, pkg/payload/payload_test.go, pkg/payload/task_graph_test.go
LoadUpdate and GetImplicitlyEnabledCapabilities gained enabledFeatureGates sets.Set[string]; render now parses FeatureGate manifests to extract requiredFeatureSet and enabledFeatureGates, and passes gate sets into manifest filtering; tests updated to pass gate sets.
Manifest library & task-graph
lib/manifest/manifest.go, hack/cluster-version-util/task_graph.go
Threaded enabledFeatureGates into GetImplicitlyEnabledCapabilities and updated calls to payload.LoadUpdate to pass an explicit empty sets.Set[string] instead of nil.
Feature-gate utilities & interfaces
pkg/featuregates/featuregates.go
Added DesiredVersion() string to the CvoGateChecker interface and implemented it on CvoGates.
Feature-gate tests & integration
pkg/cvo/cvo_featuregates_test.go, pkg/cvo/featuregate_integration_test.go
Added unit and integration tests covering extraction, event handling, manifest filtering, SyncWork integration, and example scenarios for feature-gate behavior.
Manifests added
install/0000_90_cluster-version-operator_03_configmaps.yaml
Added three ConfigMaps with feature-gate annotations used by manifest filtering tests.
Status & test helpers
pkg/cvo/status_test.go
fakeRiFlags gained a desiredVersion field and DesiredVersion() method; minor import reordering.
Bootstrap pod
bootstrap/bootstrap-pod.yaml
Added an annotation include.release.openshift.io/{{ .ClusterProfile }}: "true" to the Pod metadata.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Review focus areas:
    • pkg/payload/render.go — FeatureGate manifest parsing, unstructured→typed conversion, and changed manifest filtering logic.
    • pkg/cvo/sync_worker.go & pkg/cvo/cvo.go — correct propagation of enabled gates, equality/DeepCopy changes, status reporting, and payload reload triggers.
    • API consistency — confirm updated signatures for LoadUpdate/GetImplicitlyEnabledCapabilities and all call sites (including tests) are consistent.
    • Tests & status structs — ensure SyncWorkerStatus.DeepCopy clones EnabledFeatureGates and tests initialize/compare the new field correctly.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4becfc9 and f723c0d.

📒 Files selected for processing (1)
  • bootstrap/bootstrap-pod.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • bootstrap/bootstrap-pod.yaml
🔇 Additional comments (1)
bootstrap/bootstrap-pod.yaml (1)

8-9: No issues found. The annotation uses the standard Go template rendering pipeline that processes bootstrap manifests. The {{ .ClusterProfile }} variable is correctly substituted during template execution via the renderManifest() function, which receives manifestRenderConfig as template data. The resulting annotation follows the established include.release.openshift.io/{cluster-profile} pattern verified throughout the test suite. Bootstrap pod deployment proceeds through the normal template processing flow with no risk of disruption.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 17, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: JoelSpeed
Once this PR has been reviewed and has the lgtm label, please assign fao89 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoelSpeed
Copy link
Contributor Author

/test e2e-aws-ovn-techpreview

As a smoke test for generally not breaking existing stuff

@JoelSpeed
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 17, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/payload/render.go (1)

67-95: Empty enabledFeatureGates at render time causes negatively-gated manifests to always be included, creating divergence from runtime behavior

In render.go lines 155–172, Include is called with an empty sets.Set[string]{} for enabledFeatureGates. The integration tests confirm that release.openshift.io/feature-gates supports two semantics:

  • gate: included only if the gate is enabled
  • -gate: included if the gate is NOT enabled

With an empty enabledFeatureGates set passed to Include, this creates a mismatch:

  • Positive gates always exclude manifests (gate never enabled)
  • Negative gates always include manifests (gate never seen as enabled)

At runtime, the real CVO uses actual enabledFeatureGates. For manifests annotated with -gate, the CVO may exclude them if that gate becomes enabled, but bootstrap has already applied them. CVO cannot retroactively remove applied resources.

To align render-time and steady-state behavior, either:

  • Skip all manifests bearing release.openshift.io/feature-gates annotations during render, deferring them to the real CVO, or
  • Compute enabledFeatureGates during render to match what the runtime CVO will use
pkg/cvo/sync_worker.go (1)

799-803: Missing EnabledFeatureGates propagation in calculateNextFrom.

When copying desired state to the work object, EnabledFeatureGates is not propagated. This means the sync loop's work object would retain stale feature gates even after a change is detected.

 	if desired != nil {
 		w.Desired = desired.Desired
 		w.Overrides = desired.Overrides
 		w.Capabilities = desired.Capabilities
+		w.EnabledFeatureGates = desired.EnabledFeatureGates
 	}
🧹 Nitpick comments (7)
pkg/cvo/cvo_scenarios_test.go (1)

117-381: Consider centralizing or relaxing EnabledFeatureGates expectations in scenario tests

All these scenarios now hard-code EnabledFeatureGates: sets.New[string]() in SyncWorkerStatus expectations. As the real CVO starts populating non-empty enabled-gate sets, these broad scenario tests will become fragile and need frequent updates.

You may want to:

  • Either ignore EnabledFeatureGates in generic scenario comparisons (e.g., via a helper that zeroes it before DeepEqual), and
  • Add focused tests that explicitly assert on the gate set where that behavior matters.

This keeps scenario tests stable while still validating gate plumbing where it’s semantically important.

Also applies to: 438-545, 676-791, 851-909, 1053-1199, 1261-1372, 1451-1475, 1713-1737, 2114-2135, 2370-2393, 2451-2481, 2635-2707, 2875-2898, 3193-3390, 3503-3615, 3813-3874

pkg/start/start.go (1)

245-289: Startup hard-fails if FeatureGate informer cache doesn’t sync within 30s

processInitialFeatureGate returns an error on WaitForCacheSync timeout, causing Options.Run and the CVO to exit if the FeatureGate informer can’t sync within 30 seconds (e.g., API server slow/unavailable on boot). Previously, CVO tended to start and let controllers block on unsynced caches.

Consider instead:

  • Logging the timeout,
  • Falling back to a safe default feature set / DefaultCvoGates, and
  • Letting the ChangeStopper controller react once FeatureGates are readable,

so transient API issues don’t prevent the operator from starting at all.

pkg/cvo/cvo_featuregates_test.go (1)

172-185: Consider adding a test with a mock lister.

The comment acknowledges this limitation. The current test only validates the nil-lister fallback path. Consider adding a test with a populated lister to verify the happy path initialization.

Would you like me to help generate a test case with a mock lister to cover the initialization path when the lister is available?

pkg/cvo/cvo.go (2)

1147-1154: Consider using Clone() for defensive copy.

The sets.Set type has a built-in Clone() method that is more idiomatic:

 func (optr *Operator) getEnabledFeatureGates() sets.Set[string] {
-	// Return a copy to prevent external modification
-	result := sets.Set[string]{}
-	for gate := range optr.enabledManifestFeatureGates {
-		result.Insert(gate)
-	}
-	return result
+	// Return a copy to prevent external modification
+	return optr.enabledManifestFeatureGates.Clone()
 }

1101-1114: Delete handler is missing.

The featureGateEventHandler handles Add and Update but not Delete. While the "cluster" FeatureGate object is typically not deleted, consider adding a DeleteFunc for completeness to clear the gates if the object is removed:

 func (optr *Operator) featureGateEventHandler() cache.ResourceEventHandler {
 	workQueueKey := optr.queueKey()
 	return cache.ResourceEventHandlerFuncs{
 		AddFunc: func(obj interface{}) {
 			optr.updateEnabledFeatureGates(obj)
 			optr.queue.Add(workQueueKey)
 		},
 		UpdateFunc: func(old, new interface{}) {
 			optr.updateEnabledFeatureGates(new)
 			optr.queue.Add(workQueueKey)
 		},
+		DeleteFunc: func(obj interface{}) {
+			optr.enabledManifestFeatureGates = sets.Set[string]{}
+			optr.queue.Add(workQueueKey)
+		},
 	}
 }
pkg/cvo/sync_worker.go (2)

140-147: Minor comment inaccuracy.

The comment says "since this is a list" but EnabledFeatureGates is a sets.Set[string], not a list. Consider updating the comment:

-	// Provide a proper deep copy for feature gates since this is a list.
+	// Provide a proper deep copy for feature gates since sets.Set requires explicit cloning.

473-484: Consider cloning enabledFeatureGates during SyncWork construction.

The enabledFeatureGates parameter is assigned directly to work.EnabledFeatureGates without cloning. Since sets.Set is a reference type, external mutations to the caller's set could unexpectedly affect the sync work. For defensive programming, consider cloning:

 	work := &SyncWork{
 		Generation:          generation,
 		Desired:             desired,
 		Overrides:           config.Spec.Overrides,
-		EnabledFeatureGates: enabledFeatureGates,
+		EnabledFeatureGates: enabledFeatureGates.Clone(),
 	}

Alternatively, if the caller guarantees immutability, document this expectation.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between d4cb3b0 and 56c59ce.

⛔ Files ignored due to path filters (4)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/library-go/pkg/crypto/crypto.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/manifest/manifest.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (14)
  • go.mod (1 hunks)
  • hack/cluster-version-util/task_graph.go (2 hunks)
  • lib/manifest/manifest.go (3 hunks)
  • pkg/cvo/cvo.go (10 hunks)
  • pkg/cvo/cvo_featuregates_test.go (1 hunks)
  • pkg/cvo/cvo_scenarios_test.go (64 hunks)
  • pkg/cvo/featuregate_integration_test.go (1 hunks)
  • pkg/cvo/sync_test.go (2 hunks)
  • pkg/cvo/sync_worker.go (18 hunks)
  • pkg/payload/payload.go (3 hunks)
  • pkg/payload/payload_test.go (3 hunks)
  • pkg/payload/render.go (1 hunks)
  • pkg/payload/task_graph_test.go (2 hunks)
  • pkg/start/start.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • go.mod
  • pkg/cvo/cvo_featuregates_test.go
  • pkg/cvo/sync_test.go
  • pkg/cvo/sync_worker.go
  • pkg/payload/payload_test.go
  • pkg/payload/task_graph_test.go
  • hack/cluster-version-util/task_graph.go
  • pkg/cvo/cvo.go
  • lib/manifest/manifest.go
  • pkg/start/start.go
  • pkg/cvo/cvo_scenarios_test.go
  • pkg/payload/payload.go
  • pkg/cvo/featuregate_integration_test.go
  • pkg/payload/render.go
🧬 Code graph analysis (7)
pkg/cvo/sync_test.go (1)
pkg/cvo/sync_worker.go (1)
  • SyncWorkerStatus (109-137)
pkg/cvo/sync_worker.go (2)
pkg/payload/payload.go (2)
  • Update (104-117)
  • LoadUpdate (138-240)
lib/capability/capability.go (2)
  • SortedList (28-33)
  • GetCapabilitiesStatus (81-86)
pkg/payload/payload_test.go (2)
pkg/payload/payload.go (3)
  • LoadUpdate (138-240)
  • DefaultClusterProfile (100-100)
  • GetImplicitlyEnabledCapabilities (245-255)
lib/manifest/manifest.go (1)
  • GetImplicitlyEnabledCapabilities (46-96)
pkg/payload/task_graph_test.go (1)
pkg/payload/payload.go (2)
  • LoadUpdate (138-240)
  • DefaultClusterProfile (100-100)
hack/cluster-version-util/task_graph.go (1)
pkg/payload/payload.go (2)
  • LoadUpdate (138-240)
  • DefaultClusterProfile (100-100)
pkg/cvo/cvo_scenarios_test.go (2)
pkg/cvo/cvo.go (1)
  • New (208-310)
lib/resourcebuilder/interface.go (1)
  • New (83-89)
pkg/payload/payload.go (2)
lib/capability/capability.go (1)
  • ClusterCapabilities (12-16)
lib/manifest/manifest.go (2)
  • GetImplicitlyEnabledCapabilities (46-96)
  • InclusionConfiguration (18-36)
🔇 Additional comments (31)
pkg/payload/task_graph_test.go (1)

16-20: LoadUpdate signature update in tests looks correct

Using sets.Set[string]{} as the enabledFeatureGates argument cleanly adapts the test to the new LoadUpdate signature without changing behavior.

Also applies to: 486-495

pkg/start/start.go (1)

553-573: FeatureGate informer and ChangeStopper wiring into CVO looks consistent

Passing configInformerFactory.Config().V1().FeatureGates() into cvo.New, storing StopOnFeatureGateChange in the Context, and starting it alongside the main operator under leader election aligns cleanly with the new feature-gate model. The lifetime is correctly tied to runContext/runCancel, and informer factories are shared appropriately.

Also applies to: 577-624, 336-389

hack/cluster-version-util/task_graph.go (1)

3-12: Task graph CLI adaptation to LoadUpdate signature is fine

The additional sets.Set[string]{} argument to payload.LoadUpdate correctly reflects the new enabledFeatureGates parameter, and importing sets here is appropriate for that use.

Also applies to: 32-35

pkg/cvo/sync_test.go (1)

18-25: fakeSyncRecorder updated to match new Update signature

Adding the enabledFeatureGates sets.Set[string] parameter to fakeSyncRecorder.Update keeps the test double aligned with the production recorder interface while preserving existing behavior (only tracks desired). No further changes needed here.

Also applies to: 474-500

pkg/payload/payload_test.go (1)

35-143: Tests correctly adapted to enabledFeatureGates parameters

Passing sets.Set[string]{} into LoadUpdate and GetImplicitlyEnabledCapabilities matches the updated APIs and keeps these tests exercising the same behavior (no feature gates in play). This is a straightforward and appropriate adaptation.

Also applies to: 145-218, 352-374

lib/manifest/manifest.go (1)

46-52: LGTM - Clean parameter addition.

The enabledFeatureGates parameter is correctly added to the function signature and consistently propagated to both IncludeAllowUnknownCapabilities calls.

pkg/cvo/cvo_featuregates_test.go (3)

10-88: LGTM - Good test coverage for extractEnabledGates.

Tests cover the key scenarios: matching version extraction, non-matching version handling, and empty enabled gates. The table-driven approach is clean.


90-109: Good verification of copy semantics.

The test correctly validates that getEnabledFeatureGates() returns a defensive copy, preventing external mutations from affecting internal state.


111-170: LGTM - Covers type safety and nil handling.

Tests validate behavior for valid FeatureGate objects, invalid types, and nil objects appropriately.

pkg/cvo/cvo.go (1)

284-302: LGTM - Feature gate informer wiring.

The event handler registration, lister assignment, cache sync setup, and initialization call are properly integrated following the existing patterns for other informers.

pkg/cvo/featuregate_integration_test.go (3)

120-160: LGTM - Good integration test for SyncWork pipeline.

Validates that EnabledFeatureGates flows correctly through SyncWork and affects manifest filtering decisions.


203-277: LGTM - Useful real-world examples.

The descriptive test cases serve as documentation for common usage patterns (TechPreview gating, alternative implementations, legacy fallbacks).


12-118: LGTM - Comprehensive feature gate filtering tests with well-documented scenarios.

Tests cover positive gates, negative (exclusion) gates, AND logic combinations, and no-annotation defaults. The expected error messages document the library-go contract clearly, though these assertions depend on an external dependency (JoelSpeed/library-go) that cannot be verified from this repository. The production code gracefully handles these errors by logging them rather than asserting on exact message format, mitigating brittleness risk.

pkg/payload/payload.go (2)

138-139: LGTM - API signature update.

The enabledFeatureGates parameter is correctly added to LoadUpdate and threaded through to the manifest filtering logic.


242-255: LGTM - Consistent parameter propagation.

GetImplicitlyEnabledCapabilities correctly accepts and forwards enabledFeatureGates to the underlying localmanifest.GetImplicitlyEnabledCapabilities call.

pkg/cvo/sync_worker.go (16)

34-43: Interface signature change looks correct.

The addition of enabledFeatureGates sets.Set[string] to the Update method signature is consistent with the feature gate propagation design. The parameter type is appropriate for a set of feature gate names.


81-85: Field addition is well-documented.

The EnabledFeatureGates field is appropriately typed and documented. The placement within SyncWork is logical alongside other configuration fields like Capabilities.


133-137: LGTM!

The EnabledFeatureGates field in SyncWorkerStatus appropriately mirrors the field in SyncWork for status visibility.


355-356: LGTM!

The EnabledFeatureGates is correctly propagated to payload.LoadUpdate for manifest filtering during payload loading.


429-432: LGTM!

The EnabledFeatureGates is correctly propagated to payload.GetImplicitlyEnabledCapabilities to ensure feature gate state is considered when computing implicitly enabled capabilities.


507-513: LGTM!

The equality check correctly considers feature gates alongside version, overrides, and capabilities. The logic properly treats work as unchanged only when all four aspects are equal.


537-538: LGTM!

Feature gates are correctly initialized in status using Clone() to ensure independent copies.


555-561: LGTM!

The error path correctly preserves override, capability, and feature gate changes even when payload loading fails. The use of Clone() for status ensures independent copies.


578-580: LGTM!

Status is correctly updated with feature gates after successful payload load.


785-786: LGTM!

The work calculation correctly considers feature gate changes alongside other change types.


841-876: LGTM!

The equalSyncWork function is correctly extended to handle feature gate comparison. The use of sets.Set.Equal is appropriate, and the logging of changes aids debugging.


1047-1050: LGTM!

Feature gates are correctly propagated to Manifest.Include during precreation to filter ClusterOperator manifests.


1067-1070: LGTM!

Feature gates are correctly propagated to Manifest.Include during main manifest processing.


1144-1148: LGTM!

The switch to DeepCopy() is necessary to properly handle the EnabledFeatureGates set in status reporting.


1158-1165: LGTM!

Consistent use of DeepCopy() in the error reporting path.


1182-1189: LGTM!

Consistent use of DeepCopy() in the completion reporting path.

k8s.io/utils v0.0.0-20250604170112-4c0f3b243397
)

replace github.com/openshift/library-go => github.com/JoelSpeed/library-go v0.0.0-20251217102247-bcd739ac7ffa
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the full go.mod file to understand context
cat go.mod

Repository: openshift/cluster-version-operator

Length of output: 4437


🏁 Script executed:

# Look for any git history or PR context about this change
git log --oneline -n 20 -- go.mod

Repository: openshift/cluster-version-operator

Length of output: 142


🏁 Script executed:

# Search for any documentation or comments about library-go dependency
rg -i "library-go" --type md --type txt -A 2 -B 2

Repository: openshift/cluster-version-operator

Length of output: 60


Replace to personal library-go fork must be reverted before merge

The replace github.com/openshift/library-go => github.com/JoelSpeed/library-go v0.0.0-20251217102247-bcd739ac7ffa directive must not land on main. Personal fork replacements create maintenance and security risks. Before merging, revert to the upstream github.com/openshift/library-go version that includes the necessary feature-gate changes (once the upstream PR is merged).

🤖 Prompt for AI Agents
In go.mod at line 36, the temporary replace directive pointing to the personal
fork (replace github.com/openshift/library-go => github.com/JoelSpeed/library-go
v0.0.0-20251217102247-bcd739ac7ffa) must be removed and the module restored to
the upstream github.com/openshift/library-go version; revert that replace line,
update the require to the appropriate upstream version (or leave the original
version constraint) once the upstream PR is merged, then run go mod tidy (and
regenerate go.sum) to ensure dependencies are consistent before merging.

@JoelSpeed JoelSpeed force-pushed the feature-gate-inclusion branch from 53b2f28 to ed06ae4 Compare December 18, 2025 10:25
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This file is not to be merged, this is just for testing purposes

func runTaskGraphCmd(cmd *cobra.Command, args []string) error {
manifestDir := args[0]
release, err := payload.LoadUpdate(manifestDir, "", "", "", payload.DefaultClusterProfile, nil)
release, err := payload.LoadUpdate(manifestDir, "", "", "", payload.DefaultClusterProfile, nil, sets.Set[string]{})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't handle feature sets today, does it need to handle feature gates?

name: "include manifest with matching feature gate",
enabledGates: sets.New[string]("TechPreviewFeatureGate"),
manifestAnnotations: map[string]string{
"release.openshift.io/feature-gates": "TechPreviewFeatureGate",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I look at this, the more I'm thinking we should remove the s, s/feature-gates/feature-gate to match what we have for feature-sets

@DavidHurta
Copy link
Contributor

/cc

@openshift-ci openshift-ci bot requested a review from DavidHurta December 19, 2025 12:37
Comment on lines -129 to -135
if strings.Contains(file.Name(), "CustomNoUpgrade") || strings.Contains(file.Name(), "TechPreviewNoUpgrade") || strings.Contains(file.Name(), "DevPreviewNoUpgrade") {
// CustomNoUpgrade, TechPreviewNoUpgrade and DevPreviewNoUpgrade may add features to manifests like the ClusterVersion CRD,
// but we do not need those features during bootstrap-render time. In those clusters, the production
// CVO will be along shortly to update the manifests and deliver the gated features.
// fixme: now that we have requiredFeatureSet, use it to do Manifest.Include() filtering here instead of making filename assumptions
continue
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should now all be handled in the manifest.Include, since I've adapted so that every manifest rendered runs through manifest.Include now

}
filteredManifests := make([]string, 0, len(manifests))
for _, manifest := range manifests {
if len(filterGroupKind) > 0 && !filterGroupKind.Has(manifest.GVK.GroupKind()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change from before here is that previously we only did any filtering when there was a filter on groups. Now we run manifest.Include on every manifest, and the group/kind exclusion only when there is a specified group/kind filter

Comment on lines +209 to +212
var featureGate configv1.FeatureGate
if err := runtime.DefaultUnstructuredConverter.FromUnstructured(featureGateManifest.Obj.Object, &featureGate); err != nil {
return nil, nil, fmt.Errorf("failed to convert FeatureGate manifest %s to structured object: %w", featureGateManifest.OriginalFilename, err)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different to before, switching to a structured resource to allow fetching fields within to be easier. We already had a dep on configv1 so this didn't seem to make much sense not to use a structured object

@JoelSpeed JoelSpeed marked this pull request as ready for review December 19, 2025 13:32
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 19, 2025
@JoelSpeed
Copy link
Contributor Author

Have been testing this on a cluster and locally applying the rendering and I'm confident this is now doing as it's supposed to. I think this is ready for review @DavidHurta

I have yet to look at how this affects HyperShift, that's next

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
pkg/cvo/cvo.go (1)

1130-1154: Critical race condition in equality check.

The equality check at line 1141 reads enabledManifestFeatureGates without holding the lock, but the lock is acquired only at line 1142. This creates a TOCTOU (time-of-check-time-of-use) race where concurrent calls to updateEnabledFeatureGates or getEnabledFeatureGates can interleave, causing stale reads and incorrect change detection.

Acquire the lock before reading enabledManifestFeatureGates:

🔎 Proposed fix
 func (optr *Operator) updateEnabledFeatureGates(obj interface{}) bool {
 	featureGate, ok := obj.(*configv1.FeatureGate)
 	if !ok {
 		klog.Warningf("Expected FeatureGate object but got %T", obj)
 		return false
 	}
 
 	newGates := optr.extractEnabledGates(featureGate)
 
+	optr.featureGatesMutex.Lock()
+	defer optr.featureGatesMutex.Unlock()
+
 	// Check if gates actually changed to avoid unnecessary work
 	if !optr.enabledManifestFeatureGates.Equal(newGates) {
-		optr.featureGatesMutex.Lock()
-		defer optr.featureGatesMutex.Unlock()
-
 		klog.V(2).Infof("Cluster feature gates changed from %v to %v",
 			sets.List(optr.enabledManifestFeatureGates), sets.List(newGates))
 
 		optr.enabledManifestFeatureGates = newGates
 
 		return true
 	}
 
 	return false
 }
🧹 Nitpick comments (5)
install/0000_90_cluster-version-operator_03_configmaps.yaml (1)

1-32: Consider adding a test case for the missing feature-gate combination.

The test fixture covers three of the four possible combinations of two feature gates:

  • Both enabled (dev-preview-created)
  • First enabled, second disabled (tech-preview-created)
  • Both disabled (default-created)

For completeness, consider adding a fourth ConfigMap testing the remaining combination: -Example,Example2 (first disabled, second enabled).

pkg/cvo/featuregate_integration_test.go (2)

13-118: Consider loosening assertions on exact error strings from manifest filtering

Several table rows compare err.Error() to a hard‑coded full string. That tightly couples these tests to the exact wording from the filtering layer, so even minor phrasing tweaks will break them without changing behavior. Using substring/strings.Contains checks or error types would keep the tests focused on behavior (include vs. exclude and which gate failed) while being more resilient to implementation detail changes.

Also applies to: 207-280


120-160: SyncWorkIntegration test doesn’t exercise the real SyncWorker plumbing

TestSyncWorkIntegration currently constructs a SyncWork by hand and then calls manifest filtering directly with work.EnabledFeatureGates, without driving the actual worker to emit a SyncWorkerStatus. That still validates the annotation semantics, but it doesn’t prove that real SyncWork instances produced by the worker carry the correct gate set. If you want coverage of the wiring, consider a follow‑up that runs the worker and inspects SyncWorkerStatus.EnabledFeatureGates instead.

pkg/payload/render.go (1)

188-239: parseFeatureGateManifest’s strict shape assumptions may be brittle over time

parseFeatureGateManifest currently (a) errors if the file contains anything other than a single manifest and (b) errors if status.featureGates has more than one entry. That matches today’s expectations for the CVO’s feature‑gate manifest, but it also means Render will fail hard if the manifest ever grows multiple version blocks or extra docs. If you foresee richer manifests, consider selecting the entry that matches the CVO’s desired version (or documenting that only a single entry is supported) instead of treating “>1” as fatal.

pkg/cvo/cvo.go (1)

258-260: Optional refactor: queue initialization modernization.

The queue initialization has been refactored to use explicit configuration. While this change is unrelated to the PR's main feature-gate objective, it appears to be a code modernization.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 56c59ce and 4becfc9.

⛔ Files ignored due to path filters (3)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/library-go/pkg/manifest/manifest.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (17)
  • go.mod (1 hunks)
  • hack/cluster-version-util/task_graph.go (2 hunks)
  • install/0000_90_cluster-version-operator_03_configmaps.yaml (1 hunks)
  • lib/manifest/manifest.go (3 hunks)
  • pkg/cvo/cvo.go (10 hunks)
  • pkg/cvo/cvo_featuregates_test.go (1 hunks)
  • pkg/cvo/cvo_scenarios_test.go (64 hunks)
  • pkg/cvo/featuregate_integration_test.go (1 hunks)
  • pkg/cvo/status_test.go (2 hunks)
  • pkg/cvo/sync_test.go (2 hunks)
  • pkg/cvo/sync_worker.go (20 hunks)
  • pkg/featuregates/featuregates.go (2 hunks)
  • pkg/payload/payload.go (3 hunks)
  • pkg/payload/payload_test.go (3 hunks)
  • pkg/payload/render.go (6 hunks)
  • pkg/payload/task_graph_test.go (2 hunks)
  • pkg/start/start.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • pkg/cvo/cvo_featuregates_test.go
  • lib/manifest/manifest.go
  • go.mod
  • pkg/payload/payload.go
  • pkg/start/start.go
  • pkg/payload/task_graph_test.go
  • pkg/cvo/sync_test.go
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • pkg/featuregates/featuregates.go
  • pkg/cvo/cvo_scenarios_test.go
  • pkg/cvo/status_test.go
  • pkg/payload/payload_test.go
  • install/0000_90_cluster-version-operator_03_configmaps.yaml
  • pkg/cvo/sync_worker.go
  • pkg/cvo/featuregate_integration_test.go
  • hack/cluster-version-util/task_graph.go
  • pkg/cvo/cvo.go
  • pkg/payload/render.go
🧬 Code graph analysis (3)
pkg/cvo/sync_worker.go (1)
pkg/payload/payload.go (2)
  • Update (104-117)
  • LoadUpdate (138-240)
pkg/cvo/featuregate_integration_test.go (2)
pkg/cvo/cvo.go (1)
  • Operator (83-207)
pkg/cvo/sync_worker.go (1)
  • SyncWork (68-85)
hack/cluster-version-util/task_graph.go (1)
pkg/payload/payload.go (2)
  • LoadUpdate (138-240)
  • DefaultClusterProfile (100-100)
🔇 Additional comments (13)
hack/cluster-version-util/task_graph.go (2)

9-9: LGTM: Import correctly added for feature gate support.

The sets package import is necessary for the updated LoadUpdate signature.


34-34: Empty feature gate set is acceptable for this utility, but consider configurability if needed.

The empty set sets.Set[string]{} aligns with this tool's pattern of using default values (empty strings for other parameters). Since this is a development utility for visualizing task graphs, the empty set is reasonable.

However, the past review comment raises a valid point: if feature gate filtering affects task graph structure or manifest inclusion during debugging, you may want to add a CLI flag to specify enabled feature gates, similar to how --parallel is configurable.

For now, the empty set is acceptable unless feature gate visualization is a requirement.

pkg/cvo/status_test.go (1)

203-224: fakeRiFlags correctly updated to satisfy the new gate-checker API

Adding desiredVersion and DesiredVersion() keeps fakeRiFlags compatible with the extended gate-checker interface while preserving existing behavior for tests that don’t care about version. Looks good.

pkg/cvo/cvo_scenarios_test.go (1)

21-38: Explicit EnabledFeatureGates in setup and status snapshots look correct

Initializing Operator.enabledFeatureGates with DefaultCvoGates("version") and populating SyncWorkerStatus.EnabledFeatureGates with sets.New[string]() in the expectations keeps the status surface in sync with the new field while avoiding nil/non‑nil map mismatches in equality checks. No functional or maintainability issues from these changes.

Also applies to: 117-149, 249-381, 436-547, 676-708, 854-908, 1089-1171, 1415-1593, 1713-1775, 1817-1855, 2114-2134, 2451-2479, 2634-2707, 3192-3295, 3311-3416, 3503-3530, 3554-3578, 3635-3667, 3815-3874

pkg/payload/payload_test.go (1)

129-143: Tests correctly adapted to the new enabledFeatureGates parameter

Supplying an empty sets.Set[string]{} to LoadUpdate and GetImplicitlyEnabledCapabilities keeps these tests’ behavior unchanged while matching the expanded function signatures. This is the right way to thread the new parameter in where no feature gates are under test.

Also applies to: 205-218, 352-372

pkg/payload/render.go (1)

38-83: New renderDir filtering path is coherent and preserves behavior

Threading requiredFeatureSet and enabledFeatureGates into renderDir and doing per‑manifest filtering (with group/kind checks first, then gate/profile checks) keeps the existing traversal and error‑aggregation semantics while making the gating behavior explicit and well‑logged. Re‑emitting only accepted manifest docs via filteredManifests is straightforward and doesn’t introduce obvious correctness or performance issues.

Also applies to: 85-166

pkg/featuregates/featuregates.go (1)

18-42: CvoGateChecker/CvoGates wiring for DesiredVersion is internally consistent

The interface extension and CvoGates.DesiredVersion() implementation are coherent; DefaultCvoGates correctly initializes desiredVersion, and all internal usages remain consistent. CvoGateChecker is exported, which technically constitutes a breaking change for external implementers requiring the new method. Repository-wide search found no out-of-tree users depending on the old interface shape.

pkg/cvo/cvo.go (2)

194-196: LGTM!

The mutex and field declaration are correct. The RWMutex allows concurrent reads while protecting writes.


1169-1192: No thread-safety issue exists.

The DesiredVersion() method only reads a string field from the CvoGates struct using a value receiver. Reading primitive types like strings is atomic in Go. Additionally, the enabledFeatureGates field is assigned once during operator initialization and never modified thereafter, making concurrent reads of DesiredVersion() completely safe.

Likely an incorrect or invalid review comment.

pkg/cvo/sync_worker.go (4)

82-84: LGTM!

The EnabledFeatureGates field addition to SyncWork is well-documented and correctly typed.


141-146: LGTM!

The DeepCopy implementation correctly uses Clone() to deep-copy the EnabledFeatureGates set, preventing external modification.


652-667: LGTM! Correct handling of feature gate changes.

When feature gates change, the payload must be reloaded because manifest filtering criteria have changed. Clearing w.payload and calling loadUpdatedPayload ensures that manifests are re-evaluated with the new set of enabled feature gates. The approach correctly handles both inclusion of newly matching manifests and exclusion of manifests that no longer match.


863-895: LGTM!

The equalSyncWork function correctly extends equality checking to include feature gates comparison using Equal(), and properly returns all four equality flags. The logging of detected changes aids in debugging.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 19, 2025

@JoelSpeed: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-techpreview f723c0d link true /test e2e-aws-ovn-techpreview
ci/prow/e2e-hypershift f723c0d link true /test e2e-hypershift
ci/prow/e2e-hypershift-conformance f723c0d link true /test e2e-hypershift-conformance

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants