[release-4.20] OCPBUGS-73785: ensure deterministic zone ordering for control plane machines #10219
Control plane machines were intermittently being created in availability zones different from the ones specified in their machine specs. This happened because the zone list returned from FilterZonesBasedOnInstanceType was built with the set's UnsortedList() function, whose ordering is non-deterministic. CAPI and MAPI manifest generation each called this function independently, so they could receive the zones in different orders. The result was mismatched zone placements for the same machine between the CAPI and MAPI manifests. This commit sorts the zone slices before any further processing.
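A minimal Go sketch of the failure mode and the fix follows. It rests on stated assumptions:

- zoneSet, filterZones, and assignZones are hypothetical stand-ins, not the installer's actual FilterZonesBasedOnInstanceType or machine-generation code.
- The round-robin placement (machine i gets zones[i % len(zones)]) is assumed for illustration.

Go map iteration order is unspecified, so a list built by ranging over a map-backed set can come out in a different order on each call. Sorting immediately after filtering gives every caller the same order, and therefore the same per-machine zone.

```go
package main

import (
	"fmt"
	"sort"
)

// zoneSet is a hypothetical map-backed string set. Like a set's
// UnsortedList(), its list() method returns elements in map iteration
// order, which Go does not guarantee to be stable between calls.
type zoneSet map[string]struct{}

func (s zoneSet) list() []string {
	out := make([]string, 0, len(s))
	for z := range s {
		out = append(out, z)
	}
	return out
}

// filterZones (hypothetical) keeps only candidate zones in which the
// instance type is offered, then sorts the result so that every caller
// sees the same order.
func filterZones(candidates []string, offered zoneSet) []string {
	matched := zoneSet{}
	for _, z := range candidates {
		if _, ok := offered[z]; ok {
			matched[z] = struct{}{}
		}
	}
	zones := matched.list()
	sort.Strings(zones) // the fix: impose a deterministic order
	return zones
}

// assignZones (hypothetical) places machine i in zones[i%len(zones)].
// If two callers received zones in different orders, the same machine
// index would be assigned different zones.
func assignZones(replicas int, zones []string) []string {
	if len(zones) == 0 {
		return nil
	}
	out := make([]string, replicas)
	for i := range out {
		out[i] = zones[i%len(zones)]
	}
	return out
}

func main() {
	offered := zoneSet{"us-east-1a": {}, "us-east-1b": {}, "us-east-1c": {}}
	candidates := []string{"us-east-1c", "us-east-1a", "us-east-1b", "us-east-1e"}

	// Two independent callers, standing in for CAPI and MAPI generation.
	capi := assignZones(3, filterZones(candidates, offered))
	mapi := assignZones(3, filterZones(candidates, offered))
	fmt.Println(capi) // prints [us-east-1a us-east-1b us-east-1c]
	fmt.Println(mapi) // prints [us-east-1a us-east-1b us-east-1c]
}
```

Without the sort.Strings call, the two Println lines could disagree from run to run. That disagreement is the CAPI/MAPI mismatch described above.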
@openshift-cherrypick-robot: Jira Issue OCPBUGS-73773 has been cloned as Jira Issue OCPBUGS-73785. Will retitle bug to link to clone.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-73785, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
tthvo left a comment:
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: tthvo. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/test e2e-aws-default-config
/jira refresh
@gpei: This pull request references Jira Issue OCPBUGS-73785, which is invalid.
@openshift-cherrypick-robot: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/label backport-risk-assessed
I'll verify this today.
Verified with the build: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/release-openshift-origin-installer-launch-aws-modern/2011655963467059200
Verification scripts: https://github.com/liweinan/my-openshift-workspace/tree/main/OCPBUGS-69923
anan@think:~/works/openshift-versions/420beta1$ ./openshift-install version
./openshift-install 4.20.0-0-2026-01-15-051152-test-ci-ln-hbwt5p2-latest
built from commit f4546b5cc38eae4d32d3314f2be0e78e4b4e4b00
release image registry.build10.ci.openshift.org/ci-ln-hbwt5p2/release@sha256:671a40f2ac49625dc85b0dfd9671ac11216b06c939bfed6a779bc66c714845a9
release architecture amd64
anan@think:~/works/openshift-versions/420beta1$ ./openshift-install create install-config
? SSH Public Key /home/anan/.ssh/id_rsa.pub
? Platform aws
INFO Credentials loaded from the AWS config using "SharedConfigCredentials: /home/anan/.aws/credentials" provider
INFO Credentials loaded from the "default" profile in file "/home/anan/.aws/credentials"
? Region us-east-1
? Base Domain qe.devcluster.openshift.com
? Cluster Name weli-test
? Pull Secret [? for help] **********************************************************************************************
INFO Install-Config created in: .
anan@think:~/works/openshift-versions/420beta1$ ls
install-config.yaml openshift-install
anan@think:~/works/openshift-versions/420beta1$ cp install-config.yaml install-config.yaml.bkup
anan@think:~/works/openshift-versions/420beta1$ ls
install-config.yaml install-config.yaml.bkup openshift-install
anan@think:~/works/openshift-versions/420beta1$ ./openshift-install create manifests
INFO Credentials loaded from the "default" profile in file "/home/anan/.aws/credentials"
INFO Credentials loaded from the AWS config using "SharedConfigCredentials: /home/anan/.aws/credentials" provider
INFO Consuming Install Config from target directory
INFO Successfully populated MCS CA cert information: root-ca 2036-01-13T08:02:41Z 2026-01-15T08:02:41Z
INFO Successfully populated MCS TLS cert information: root-ca 2036-01-13T08:02:41Z 2026-01-15T08:02:41Z
INFO Adding clusters...
INFO Manifests created in: cluster-api, manifests and openshift
anan@think:~/works/openshift-versions/420beta1$ ../../my-openshift-workspace/OCPBUGS-69923/verify-manifests.sh
==========================================
Verify CAPI and MAPI Manifest Zone Consistency
==========================================
Installation directory: .
CAPI Machine Zones:
master-0 (99_openshift-cluster-api_master-machines-0.yaml): us-east-1a
master-1 (99_openshift-cluster-api_master-machines-1.yaml): us-east-1b
master-2 (99_openshift-cluster-api_master-machines-2.yaml): us-east-1c
MAPI Machine Zones:
master-0 (from 99_openshift-machine-api_master-control-plane-machine-set.yaml): us-east-1a
master-1 (from 99_openshift-machine-api_master-control-plane-machine-set.yaml): us-east-1b
master-2 (from 99_openshift-machine-api_master-control-plane-machine-set.yaml): us-east-1c
==========================================
Consistency Check
==========================================
✓ Match: master-0 - Zone: us-east-1a
✓ Match: master-1 - Zone: us-east-1b
✓ Match: master-2 - Zone: us-east-1c
✅ Verification PASSED: All machines have consistent zone allocation!
anan@think:~/works/openshift-versions/420beta1$ ../../my-openshift-workspace/OCPBUGS-69923/verify-cluster.sh
==========================================
Verify Machine Zone Consistency in Cluster
==========================================
Kubeconfig: /home/anan/works/openshift-versions/420beta1/auth/kubeconfig
✓ Successfully connected to cluster
Found 3 master machine(s)
==========================================
Check Zone Consistency for Each Machine
==========================================
--- Machine: weli-test-q7mkr-master-0 ---
Zone Label: us-east-1a
ProviderID Zone: us-east-1a
Spec Zone: us-east-1a
Subnet Filter: weli-test-q7mkr-subnet-private-us-east-1a
✅ Zone consistent
--- Machine: weli-test-q7mkr-master-1 ---
Zone Label: us-east-1b
ProviderID Zone: us-east-1b
Spec Zone: us-east-1b
Subnet Filter: weli-test-q7mkr-subnet-private-us-east-1b
✅ Zone consistent
--- Machine: weli-test-q7mkr-master-2 ---
Zone Label: us-east-1c
ProviderID Zone: us-east-1c
Spec Zone: us-east-1c
Subnet Filter: weli-test-q7mkr-subnet-private-us-east-1c
✅ Zone consistent
==========================================
Verification Summary
==========================================
Checked 3 master machine(s)
✅ Verification PASSED: All machines have consistent zones!
Cluster verification: PASS ✓
Fix verification successful:
- Zone label, ProviderID zone, and Spec zone are all consistent
- Machines are created in the correct availability zones
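The verification scripts are linked above but their source is not reproduced here. Below is a hypothetical Go sketch of the two checks their output describes. It is not the scripts' actual logic.

- compareZones matches per-machine zones taken from the CAPI and MAPI manifests.
- checkMachine confirms that a machine's zone label, provider ID zone, and spec zone agree.

All type and function names are illustrative. The AWS provider ID format aws:///<zone>/<instance-id> is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// compareZones lists every machine whose zone differs between the CAPI and
// MAPI manifests, or that appears in only one of them. Inputs map a machine
// name (e.g. "master-0") to its zone.
func compareZones(capi, mapi map[string]string) []string {
	var diffs []string
	for name, cz := range capi {
		if mz, ok := mapi[name]; !ok || mz != cz {
			diffs = append(diffs, fmt.Sprintf("%s: capi=%q mapi=%q", name, cz, mz))
		}
	}
	for name, mz := range mapi {
		if _, ok := capi[name]; !ok {
			diffs = append(diffs, fmt.Sprintf("%s: capi=%q mapi=%q", name, "", mz))
		}
	}
	sort.Strings(diffs) // map iteration order is random; sort for stable output
	return diffs
}

// machineZones holds the three zone sources the cluster check compares.
// The struct and its field names are hypothetical.
type machineZones struct {
	Name       string
	ZoneLabel  string // zone label on the Machine
	ProviderID string // assumed AWS form: aws:///<zone>/<instance-id>
	SpecZone   string // placement zone from the Machine spec
}

// zoneFromProviderID extracts the zone from an AWS provider ID of the
// assumed form "aws:///us-east-1a/i-0123456789abcdef0".
func zoneFromProviderID(id string) (string, error) {
	const prefix = "aws:///"
	if !strings.HasPrefix(id, prefix) {
		return "", fmt.Errorf("unexpected provider ID %q", id)
	}
	parts := strings.SplitN(strings.TrimPrefix(id, prefix), "/", 2)
	if len(parts) != 2 || parts[0] == "" {
		return "", fmt.Errorf("no zone in provider ID %q", id)
	}
	return parts[0], nil
}

// checkMachine returns an error unless label, provider ID, and spec zones agree.
func checkMachine(m machineZones) error {
	pz, err := zoneFromProviderID(m.ProviderID)
	if err != nil {
		return err
	}
	if m.ZoneLabel != pz || m.SpecZone != pz {
		return fmt.Errorf("%s: label=%s providerID=%s spec=%s", m.Name, m.ZoneLabel, pz, m.SpecZone)
	}
	return nil
}

func main() {
	capi := map[string]string{"master-0": "us-east-1a", "master-1": "us-east-1b"}
	mapi := map[string]string{"master-0": "us-east-1a", "master-1": "us-east-1b"}
	fmt.Println("manifest mismatches:", len(compareZones(capi, mapi)))

	m := machineZones{
		Name:       "example-master-0",
		ZoneLabel:  "us-east-1a",
		ProviderID: "aws:///us-east-1a/i-0123456789abcdef0",
		SpecZone:   "us-east-1a",
	}
	if err := checkMachine(m); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("zone consistent")
}
```

Checking both directions in compareZones catches a machine that exists in only one set of manifests, not just one with a different zone.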
/verified by liweinan
@liweinan: This PR has been marked as verified by
/jira refresh
@tthvo: This pull request references Jira Issue OCPBUGS-73785, which is valid. 7 validation(s) were run on this bug.
Requesting review from QA contact:
/tide refresh
/cherry-pick release-4.19
@tthvo: once the present PR merges, I will cherry-pick it on top of
289d016 into openshift:release-4.20
@openshift-cherrypick-robot: Jira Issue Verification Checks: Jira Issue OCPBUGS-73785 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓
@tthvo: new pull request created: #10230
Fix included in accepted release 4.20.0-0.nightly-2026-01-17-203204
This is an automated cherry-pick of #10214
/assign tthvo