OCPBUGS-69923: Add static zone consistency validation test for AWS #73935
Open
liweinan wants to merge 4 commits into openshift:master from liweinan:OCPBUGS-69923-remake
+371
−0
10 changes: 10 additions & 0 deletions
ci-operator/step-registry/cucushift/installer/rehearse/aws/cases/zone-consistency/OWNERS
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| approvers: | ||
| - jianlinliu | ||
| - gpei | ||
| - yunjiang29 | ||
| - liweinan | ||
| reviewers: | ||
| - jianlinliu | ||
| - gpei | ||
| - yunjiang29 | ||
| - liweinan |
174 changes: 174 additions & 0 deletions
...ases/zone-consistency/cucushift-installer-rehearse-aws-cases-zone-consistency-commands.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,174 @@ | ||
| #!/bin/bash | ||
|
|
||
| # OCPBUGS-69923 - Verify control plane machine zone allocation consistency in manifests | ||
| # Run 10 iterations to verify CAPI and MAPI zone allocation is deterministic | ||
| # | ||
| # IMPORTANT: Must compare CORRECT files: | ||
| # - CAPI zones: cluster-api/machines/10_inframachine_*-master-*.yaml (from subnet filter) | ||
| # - MAPI zones: openshift/99_openshift-machine-api_master-control-plane-machine-set.yaml (failureDomains) | ||
| # | ||
| # NOTE: openshift/99_openshift-cluster-api_master-machines-*.yaml is MAPI (despite the name)! | ||
|
|
||
| set -o errexit | ||
| set -o pipefail | ||
| set -o nounset | ||
|
|
||
| export AWS_SHARED_CREDENTIALS_FILE="${CLUSTER_PROFILE_DIR}/.awscred" | ||
|
|
||
| REGION="${LEASED_RESOURCE}" | ||
| CLUSTER_NAME="${NAMESPACE}-${UNIQUE_HASH}" | ||
|
|
||
| SSH_PUB_KEY=$(<"${CLUSTER_PROFILE_DIR}/ssh-publickey") | ||
| PULL_SECRET=$(<"${CLUSTER_PROFILE_DIR}/pull-secret") | ||
|
|
||
| WORK_DIR="/tmp/test-zone-consistency" | ||
|
|
||
| echo "openshift-install version:" | ||
| openshift-install version | ||
| echo "" | ||
|
|
||
| TOTAL_FAILURES=0 | ||
|
|
||
| for iteration in $(seq 1 10); do | ||
| echo "==========================================" | ||
| echo "Iteration $iteration/10" | ||
| echo "==========================================" | ||
|
|
||
| # Clean up everything from previous iteration | ||
| rm -rf "${WORK_DIR}" | ||
| mkdir -p "${WORK_DIR}" | ||
|
|
||
| # Create install-config.yaml (without specifying zones - triggers the bug path) | ||
| cat > "${WORK_DIR}/install-config.yaml" << EOF | ||
| apiVersion: v1 | ||
| baseDomain: ${BASE_DOMAIN} | ||
| metadata: | ||
| name: ${CLUSTER_NAME} | ||
| controlPlane: | ||
| architecture: amd64 | ||
| hyperthreading: Enabled | ||
| name: master | ||
| replicas: 3 | ||
| compute: | ||
| - architecture: amd64 | ||
| hyperthreading: Enabled | ||
| name: worker | ||
| replicas: 3 | ||
| platform: | ||
| aws: | ||
| region: ${REGION} | ||
| pullSecret: > | ||
| ${PULL_SECRET} | ||
| sshKey: | | ||
| ${SSH_PUB_KEY} | ||
| EOF | ||
|
|
||
| # Backup install-config.yaml before it gets consumed by create manifests | ||
| cp "${WORK_DIR}/install-config.yaml" "${WORK_DIR}/install-config.yaml.backup" | ||
|
|
||
| # Generate manifests (this consumes install-config.yaml) | ||
| openshift-install create manifests --dir "${WORK_DIR}" | ||
|
|
||
| # Extract CAPI zones from cluster-api/machines/10_inframachine_*-master-*.yaml | ||
| # These are the REAL CAPI AWSMachine objects (not the misleadingly named openshift/99_openshift-cluster-api_* files) | ||
| capi_zones="" | ||
| for file in $(find "$WORK_DIR"/cluster-api/machines -name "10_inframachine_*-master-*.yaml" -type f 2>/dev/null | sort); do | ||
| # Extract zone from subnet filter name (e.g., "cluster-subnet-private-us-east-1a" -> "us-east-1a") | ||
| subnet_name=$(yq-go r "$file" 'spec.subnet.filters[0].values[0]' 2>/dev/null || echo "") | ||
| # Extract zone from subnet name using region pattern | ||
| zone=$(echo "$subnet_name" | grep -oE "${REGION}[a-z]$" || echo "") | ||
| if [ -n "$zone" ] && [ "$zone" != "null" ]; then | ||
| capi_zones="${capi_zones} ${zone}" | ||
| fi | ||
| done | ||
| capi_zones=$(echo "$capi_zones" | xargs) | ||
| capi_count=$(echo "$capi_zones" | wc -w | xargs) | ||
|
|
||
| # Extract MAPI zones from ControlPlaneMachineSet failureDomains | ||
| # File: openshift/99_openshift-machine-api_master-control-plane-machine-set.yaml | ||
| mapi_zones="" | ||
| mapi_count=0 | ||
| cpms_file="$WORK_DIR/openshift/99_openshift-machine-api_master-control-plane-machine-set.yaml" | ||
| if [ -f "$cpms_file" ]; then | ||
| idx=0 | ||
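| # Read at most as many failureDomains as there are CAPI machines; any extra entries are ignored | ||
| while [ "$mapi_count" -lt "$capi_count" ]; do | ||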
| zone=$(yq-go r "$cpms_file" "spec.template.machines_v1beta1_machine_openshift_io.failureDomains.aws[$idx].placement.availabilityZone" 2>/dev/null || echo "") | ||
| if [ -z "$zone" ] || [ "$zone" = "null" ]; then | ||
| break | ||
| fi | ||
| mapi_zones="${mapi_zones} ${zone}" | ||
| mapi_count=$((mapi_count + 1)) | ||
| idx=$((idx + 1)) | ||
| done | ||
| else | ||
| echo " ERROR: ControlPlaneMachineSet file not found!" | ||
| fi | ||
| mapi_zones=$(echo "$mapi_zones" | xargs) | ||
|
|
||
| # Save manifests to ARTIFACT_DIR for verification (regardless of test result) | ||
| iteration_artifact_dir="${ARTIFACT_DIR}/iteration-${iteration}" | ||
| mkdir -p "${iteration_artifact_dir}" | ||
|
|
||
| echo " Saving manifests to ${iteration_artifact_dir}..." | ||
|
|
||
| # Copy CAPI machine manifests | ||
| if [ -d "$WORK_DIR/cluster-api/machines" ]; then | ||
| cp -r "$WORK_DIR/cluster-api/machines" "${iteration_artifact_dir}/capi-machines" | ||
| fi | ||
|
|
||
| # Copy ControlPlaneMachineSet | ||
| if [ -f "$cpms_file" ]; then | ||
| cp "$cpms_file" "${iteration_artifact_dir}/control-plane-machine-set.yaml" | ||
| fi | ||
|
|
||
| # Copy install-config for reference (from backup since create manifests consumed the original) | ||
| if [ -f "$WORK_DIR/install-config.yaml.backup" ]; then | ||
| cp "$WORK_DIR/install-config.yaml.backup" "${iteration_artifact_dir}/install-config.yaml" | ||
| fi | ||
|
|
||
| # Compare | ||
| echo " CAPI zones (from cluster-api/machines/10_inframachine_*): $capi_zones" | ||
| echo " MAPI zones (from ControlPlaneMachineSet failureDomains): $mapi_zones" | ||
|
|
||
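| # Require a non-empty CAPI zone list so that two failed extractions do not pass vacuously | ||
| if [ -n "$capi_zones" ] && [ "$capi_zones" = "$mapi_zones" ]; then | ||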
| echo " PASS" | ||
| else | ||
| echo " FAIL: zones mismatch - CAPI and MAPI have different zone assignments" | ||
|
|
||
| # Print detailed zone differences | ||
| echo " ERROR DETAILS:" | ||
| IFS=' ' read -ra capi_array <<< "$capi_zones" | ||
| IFS=' ' read -ra mapi_array <<< "$mapi_zones" | ||
|
|
||
| for i in "${!capi_array[@]}"; do | ||
| capi_zone="${capi_array[$i]:-}" | ||
| mapi_zone="${mapi_array[$i]:-}" | ||
| if [ "$capi_zone" != "$mapi_zone" ]; then | ||
| echo " Position $((i+1)): CAPI has '$capi_zone' but MAPI has '$mapi_zone'" | ||
| fi | ||
| done | ||
|
|
||
| # Handle case where arrays have different lengths | ||
| if [ ${#capi_array[@]} -ne ${#mapi_array[@]} ]; then | ||
| echo " Zone count mismatch: CAPI has ${#capi_array[@]} zones, MAPI has ${#mapi_array[@]} zones" | ||
| fi | ||
|
|
||
| TOTAL_FAILURES=$((TOTAL_FAILURES + 1)) | ||
| fi | ||
|
|
||
| # Delete all generated files for next iteration | ||
| rm -rf "${WORK_DIR}" | ||
| done | ||
|
|
||
| echo "" | ||
| echo "==========================================" | ||
| echo "Final Result: 10 iterations completed" | ||
| echo "All manifests saved to ${ARTIFACT_DIR}/ for verification" | ||
| if [ $TOTAL_FAILURES -eq 0 ]; then | ||
| echo "PASS: All iterations have consistent zone allocation between CAPI and MAPI" | ||
| else | ||
| echo "FAIL: $TOTAL_FAILURES iterations had zone mismatches (OCPBUGS-69923)" | ||
| fi | ||
| echo "==========================================" | ||
|
|
||
| exit $TOTAL_FAILURES | ||
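The subnet-name parsing used for the CAPI side can be exercised in isolation. The following is a minimal sketch, not part of the PR: `zone_from_subnet_name` is a hypothetical helper name and the sample subnet names are made up for illustration. It assumes, as the script does, that the subnet filter value ends with the region followed by a single lowercase zone letter.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): extract the availability zone from a
# subnet name such as "mycluster-subnet-private-us-east-1a".
# Assumption: the name ends with "<region><single lowercase letter>".
zone_from_subnet_name() {
  local subnet_name="$1" region="$2"
  # grep exits non-zero when nothing matches; "|| true" keeps the function
  # from failing and leaves its output empty in that case.
  printf '%s\n' "$subnet_name" | grep -oE "${region}[a-z]$" || true
}

# Sample inputs (illustrative only).
zone_from_subnet_name "mycluster-subnet-private-us-east-1a" "us-east-1"   # prints us-east-1a
zone_from_subnet_name "mycluster-subnet-private-us-west-2b" "us-east-1"   # prints nothing
```

Because the pattern is anchored to the end of the name, a subnet whose name does not end in the zone produces an empty result, which the script then skips rather than counting as a zone.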
17 changes: 17 additions & 0 deletions
...one-consistency/cucushift-installer-rehearse-aws-cases-zone-consistency-ref.metadata.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| { | ||
| "path": "cucushift/installer/rehearse/aws/cases/zone-consistency/cucushift-installer-rehearse-aws-cases-zone-consistency-ref.yaml", | ||
| "owners": { | ||
| "approvers": [ | ||
| "jianlinliu", | ||
| "gpei", | ||
| "yunjiang29", | ||
| "liweinan" | ||
| ], | ||
| "reviewers": [ | ||
| "jianlinliu", | ||
| "gpei", | ||
| "yunjiang29", | ||
| "liweinan" | ||
| ] | ||
| } | ||
| } |
17 changes: 17 additions & 0 deletions
...s/cases/zone-consistency/cucushift-installer-rehearse-aws-cases-zone-consistency-ref.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| ref: | ||
| as: cucushift-installer-rehearse-aws-cases-zone-consistency | ||
| from: upi-installer | ||
| grace_period: 10m | ||
| commands: cucushift-installer-rehearse-aws-cases-zone-consistency-commands.sh | ||
| resources: | ||
| requests: | ||
| cpu: 10m | ||
| memory: 100Mi | ||
| env: | ||
| - name: BASE_DOMAIN | ||
| default: "qe.devcluster.openshift.com" | ||
| documentation: >- | ||
| Verify control plane machine zone allocation consistency in manifests (OCPBUGS-69923). | ||
| This step generates manifests using openshift-install and verifies that CAPI and MAPI | ||
| manifests have consistent zone allocation for control plane machines. | ||
| This is a static validation test that does not require actual cluster installation. |
17 changes: 17 additions & 0 deletions
...onsistency/cucushift-installer-rehearse-aws-cases-zone-consistency-workflow.metadata.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| { | ||
| "path": "cucushift/installer/rehearse/aws/cases/zone-consistency/cucushift-installer-rehearse-aws-cases-zone-consistency-workflow.yaml", | ||
| "owners": { | ||
| "approvers": [ | ||
| "jianlinliu", | ||
| "gpei", | ||
| "yunjiang29", | ||
| "liweinan" | ||
| ], | ||
| "reviewers": [ | ||
| "jianlinliu", | ||
| "gpei", | ||
| "yunjiang29", | ||
| "liweinan" | ||
| ] | ||
| } | ||
| } |
11 changes: 11 additions & 0 deletions
...es/zone-consistency/cucushift-installer-rehearse-aws-cases-zone-consistency-workflow.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| workflow: | ||
| as: cucushift-installer-rehearse-aws-cases-zone-consistency | ||
| steps: | ||
| pre: | ||
| - chain: cucushift-installer-rehearse-aws-cases-zone-consistency-provision | ||
| - ref: cucushift-installer-reportportal-marker | ||
| documentation: |- | ||
| This workflow runs static validation tests for OCPBUGS-69923: Control plane machine | ||
| zone allocation consistency. It verifies that CAPI and MAPI manifests have consistent | ||
| zone allocation for control plane machines. No cluster provisioning or deprovisioning | ||
| is needed as these are configuration validation tests only. |
10 changes: 10 additions & 0 deletions
...or/step-registry/cucushift/installer/rehearse/aws/cases/zone-consistency/provision/OWNERS
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| approvers: | ||
| - jianlinliu | ||
| - gpei | ||
| - yunjiang29 | ||
| - liweinan | ||
| reviewers: | ||
| - jianlinliu | ||
| - gpei | ||
| - yunjiang29 | ||
| - liweinan |
17 changes: 17 additions & 0 deletions
...ion/cucushift-installer-rehearse-aws-cases-zone-consistency-provision-chain.metadata.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| { | ||
| "path": "cucushift/installer/rehearse/aws/cases/zone-consistency/provision/cucushift-installer-rehearse-aws-cases-zone-consistency-provision-chain.yaml", | ||
| "owners": { | ||
| "approvers": [ | ||
| "jianlinliu", | ||
| "gpei", | ||
| "yunjiang29", | ||
| "liweinan" | ||
| ], | ||
| "reviewers": [ | ||
| "jianlinliu", | ||
| "gpei", | ||
| "yunjiang29", | ||
| "liweinan" | ||
| ] | ||
| } | ||
| } |
9 changes: 9 additions & 0 deletions
...cy/provision/cucushift-installer-rehearse-aws-cases-zone-consistency-provision-chain.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| chain: | ||
| as: cucushift-installer-rehearse-aws-cases-zone-consistency-provision | ||
| steps: | ||
| - ref: cucushift-installer-rehearse-aws-cases-zone-consistency | ||
| documentation: |- | ||
| Run static validation tests for OCPBUGS-69923: Control plane machine zone allocation | ||
| consistency. This generates manifests using openshift-install and verifies that CAPI | ||
| and MAPI manifests have consistent zone allocation. No cluster provisioning or | ||
| deprovisioning is needed as these are configuration validation tests only. |
Did you hit the issue described in the bug using this script?
Can we print error details? For example, the zone in MAPI is us-east-1a, but in CAPI it's us-east-1b. We could also consider saving the relevant manifests in the ARTIFACT dir; that would be helpful for debugging.
Okay let me improve this part.
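A minimal sketch of what such per-position error output might look like, assuming both zone lists are space-separated strings as in the script. `report_zone_diff` is a hypothetical name used only for illustration; unlike the loop in the PR, it walks the longer of the two lists so a missing entry on either side is reported.

```shell
#!/bin/bash
# Hypothetical helper (not in the PR): print per-position differences between
# two space-separated zone lists and return non-zero if they differ.
report_zone_diff() {
  local -a capi mapi
  read -ra capi <<< "$1"
  read -ra mapi <<< "$2"
  local n=${#capi[@]} i rc=0
  # Walk the longer list so that a missing entry on either side is reported.
  (( ${#mapi[@]} > n )) && n=${#mapi[@]}
  for (( i = 0; i < n; i++ )); do
    local c="${capi[i]:-<none>}" m="${mapi[i]:-<none>}"
    if [ "$c" != "$m" ]; then
      echo "Position $((i + 1)): CAPI has '$c' but MAPI has '$m'"
      rc=1
    fi
  done
  return "$rc"
}

# Illustrative input: the second and third zones are swapped.
report_zone_diff "us-east-1a us-east-1b us-east-1c" \
                 "us-east-1a us-east-1c us-east-1b" || echo "mismatch detected"
```

With the sample input above, positions 2 and 3 are reported and the function returns non-zero, so the caller can increment a failure counter.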
This commit can be used to reproduce the problem:
I have a similar local script that can see the failure output: https://gist.github.com/liweinan/9d65abf9759370f141d4eec93fa7de17