[main] Fix metadata-webhook cleanup race condition #3974
Add namespaceSelector to the MutatingWebhookConfiguration to limit the webhook's scope to namespaces with the samples.knative.dev/release label. This prevents the webhook from blocking resource deletions in other namespaces when the serving-tests namespace is torn down.

The issue occurred during upgrade test cleanup, where the Route resource for deployment-upgrade-failure could not be deleted because the webhook service was unavailable after namespace cleanup started.

Assisted-by: 🤖 Claude Opus/Sonnet 4.5
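As a rough illustration of the scoping described above, the sketch below prints what such a `namespaceSelector` fragment might look like. The webhook name and the function name are hypothetical, and using the `Exists` operator (rather than matching a specific label value) is an assumption; the actual manifest in this PR may differ.

```shell
# Sketch only: prints a namespaceSelector fragment for one webhook entry.
# The webhook name is hypothetical; matching on label existence is an assumption.
render_namespace_selector() {
  cat <<'EOF'
webhooks:
  - name: metadata-webhook.example.com   # hypothetical name
    namespaceSelector:
      matchExpressions:
        - key: samples.knative.dev/release
          operator: Exists
EOF
}
```

With a selector like this, the API server calls the webhook only for objects in namespaces that carry the label, so deletions in unlabeled namespaces do not depend on the webhook service being reachable.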
Delete webhook resources before namespace deletion to prevent blocking Route finalizer removal when webhook service is unavailable.

The issue occurs when:

1. Tests complete and cleanup starts
2. 'kubectl delete ns serving-tests' begins namespace deletion
3. Routes have finalizers that need removal
4. Finalizer removal triggers the MutatingWebhookConfiguration
5. Webhook service (in serving-tests) is already being deleted
6. Webhook call times out, blocking namespace deletion

Solution: Delete the webhook resources (including the cluster-scoped MutatingWebhookConfiguration) before deleting the serving-tests namespace. This mirrors the installation order and prevents the race condition.
The webhook config directory includes 100-namespace.yaml, so deleting the resources in that directory also deletes the serving-tests namespace. Adding --ignore-not-found prevents a "not found" error when the namespace has already been deleted.
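The cleanup order described in these commits could be sketched roughly as follows. The function name, the `KUBECTL` override variable, and the directory argument are hypothetical, and placing `--ignore-not-found` on both commands is an assumption; the PR's actual script is not shown here.

```shell
# Sketch only: names and the directory argument are hypothetical.
# KUBECTL can be overridden (for example with a stub) for dry runs.
KUBECTL="${KUBECTL:-kubectl}"

cleanup_serving_tests() {
  # $1: directory holding the webhook manifests (supplied by the caller).
  # Step 1: delete the webhook resources first, including the cluster-scoped
  # MutatingWebhookConfiguration, so finalizer removal is not routed to a
  # webhook service that is already going away.
  "$KUBECTL" delete -f "$1" --ignore-not-found || return 1
  # Step 2: delete the namespace. It may already be gone if the directory
  # contained a namespace manifest, so tolerate "not found".
  "$KUBECTL" delete namespace serving-tests --ignore-not-found
}
```

A caller would invoke it as `cleanup_serving_tests <webhook-config-dir>` once the tests have finished.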
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cardil, openshift-cherrypick-robot
@openshift-cherrypick-robot: The following tests failed, say
/test 420-kitchensink-e2e

Infra issues
This is an automated cherry-pick of #3973
/assign cardil