Add PersistentVolume retention, security mount options, and PVC recovery support #207
base: main
Conversation
Force-pushed from f1375f0 to 20aff96
Force-pushed from 78c3443 to 3107300
Pull request overview
This PR adds support for retaining PersistentVolumes (PVs) after DocumentDB cluster deletion and enables restoration from retained PVCs. It introduces a new PersistentVolumeReconciler controller that manages PV reclaim policies and security mount options based on DocumentDB configuration.
Changes:
- Added `PersistentVolumeReclaimPolicy` field to the DocumentDB storage spec (Retain/Delete options)
- Added `PVC` field to the recovery configuration for bootstrapping from existing PVCs
- Implemented PersistentVolumeReconciler to automatically update PV reclaim policies and apply security mount options (nodev, nosuid, noexec); a rough sketch of that flow follows this list
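As a minimal sketch of what such a reconciler does, the snippet below aligns a PV's reclaim policy with the DocumentDB spec and merges in the hardening mount options. The function and variable names (for example `ensurePVSettings`) are illustrative assumptions and do not mirror the PR's actual `pv_controller.go`.

```go
// Illustrative sketch only; names here are hypothetical, not the PR's code.
package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// securityMountOptions are the hardening flags described in the PR summary.
var securityMountOptions = []string{"nodev", "nosuid", "noexec"}

func ensurePVSettings(ctx context.Context, c client.Client, pv *corev1.PersistentVolume, policy corev1.PersistentVolumeReclaimPolicy) error {
	changed := false

	// Align the reclaim policy (Retain or Delete) with the requested value.
	if pv.Spec.PersistentVolumeReclaimPolicy != policy {
		pv.Spec.PersistentVolumeReclaimPolicy = policy
		changed = true
	}

	// Merge the security mount options, skipping any that are already present.
	present := make(map[string]bool, len(pv.Spec.MountOptions))
	for _, opt := range pv.Spec.MountOptions {
		present[opt] = true
	}
	for _, opt := range securityMountOptions {
		if !present[opt] {
			pv.Spec.MountOptions = append(pv.Spec.MountOptions, opt)
			changed = true
		}
	}

	if !changed {
		return nil
	}
	return c.Update(ctx, pv)
}
```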
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| operator/src/internal/controller/pv_controller.go | New controller that reconciles PVs associated with DocumentDB clusters, updating reclaim policies and mount options |
| operator/src/internal/controller/pv_controller_test.go | Comprehensive unit tests for the PV controller including predicates, ownership chain traversal, and reconciliation logic |
| operator/src/internal/cnpg/cnpg_cluster.go | Updated bootstrap logic to support PVC recovery in addition to backup recovery |
| operator/src/internal/cnpg/cnpg_cluster_test.go | Tests for PVC recovery bootstrap configuration |
| operator/src/internal/cnpg/suite_test.go | Test suite setup for CNPG package |
| operator/src/api/preview/documentdb_types.go | API additions for PVC recovery and PersistentVolumeReclaimPolicy field with CEL validation |
| operator/src/api/preview/zz_generated.deepcopy.go | Auto-generated deepcopy code for new PVC field |
| operator/src/config/crd/bases/documentdb.io_dbs.yaml | CRD updates with PVC recovery and reclaim policy fields |
| operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml | Corresponding CRD updates for Helm chart |
| operator/src/config/rbac/role.yaml | Added RBAC permissions for PV and PVC resources |
| operator/documentdb-helm-chart/templates/05_clusterrole.yaml | Added cluster-level PV permissions |
| operator/src/cmd/main.go | Integrated PersistentVolumeReconciler into the manager |
| docs/operator-public-documentation/preview/backup-and-restore.md | Documentation for PV retention and recovery workflow |
| docs/operator-public-documentation/preview/advanced-configuration/README.md | Added sections on PV security mount options and disk encryption best practices |
| .github/workflows/test-backup-and-restore.yml | Integration tests for PV retention and PVC-based recovery |
docs/operator-public-documentation/preview/advanced-configuration/README.md (resolved review thread)
- Default to Retain for safer database workloads (data preservation)
- Add comprehensive documentation for when to use each option
- Add WARNING about data loss when using Delete
- Update controller default fallback to Retain
- Update tests to reflect new default behavior

Signed-off-by: wenting <wentingwu@microsoft.com>

- Move 'Verify mount options are set by PV controller' test to E2E
- Update 'Test PV reclaim policy default and explicit Delete' to:
  - Verify default Retain policy on existing cluster
  - Patch cluster to Delete policy and verify PV update
  - Verify PV cleanup after cluster deletion

Signed-off-by: wenting <wentingwu@microsoft.com>

- Optimize findPVsForDocumentDB with CNPG cluster label selector
- Use GitHub Actions outputs instead of temp files in workflow
- Add error scenario tests for PV update and API failures
- Improve CEL validation for RecoveryConfiguration
- Clarify PV deletion chain in documentation
- Add compatibility notes for security mount options

Signed-off-by: wenting <wentingwu@microsoft.com>
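For orientation, here is one way the label-selector lookup mentioned in the last commit could look: list the PVCs carrying the CNPG cluster label, then resolve their bound PVs via `Spec.VolumeName`. The function name follows the commit message, but the body, package layout, and the `cnpg.io/cluster` key are assumptions rather than the PR's actual implementation.

```go
// Hedged sketch of a label-selector based PV lookup; not the PR's code.
package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func findPVsForDocumentDB(ctx context.Context, c client.Client, namespace, clusterName string) ([]corev1.PersistentVolume, error) {
	pvcList := &corev1.PersistentVolumeClaimList{}
	if err := c.List(ctx, pvcList,
		client.InNamespace(namespace),
		// "cnpg.io/cluster" follows the usual CNPG labelling convention.
		client.MatchingLabels{"cnpg.io/cluster": clusterName},
	); err != nil {
		return nil, err
	}

	var pvs []corev1.PersistentVolume
	for _, pvc := range pvcList.Items {
		if pvc.Spec.VolumeName == "" {
			continue // PVC not bound to a PV yet
		}
		pv := &corev1.PersistentVolume{}
		if err := c.Get(ctx, client.ObjectKey{Name: pvc.Spec.VolumeName}, pv); err != nil {
			return nil, err
		}
		pvs = append(pvs, *pv)
	}
	return pvs, nil
}
```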
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
operator/src/internal/controller/pv_controller_test.go:1
- These predicate functions are defined inline and always return false. Consider extracting them into a named constant or helper function for better reusability and clarity, especially since the pattern is used in multiple predicates.
// Copyright (c) Microsoft Corporation.
pvcList := &corev1.PersistentVolumeClaimList{}
if err := r.List(ctx, pvcList,
    client.InNamespace(documentdb.Namespace),
    client.MatchingLabels{cnpgClusterLabel: documentdb.Name},
Copilot AI · Feb 2, 2026
The constant cnpgClusterLabel is used here but is defined in documentdb_controller.go (line 49). Consider defining this constant in a shared location or in this file to improve maintainability and reduce coupling between controllers.
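As a minimal illustration of that suggestion, the shared constant could live in its own file in the controller package. The file name and the `cnpg.io/cluster` value shown here follow the common CNPG labelling convention and are assumptions, not lines from this PR.

```go
// Hypothetical shared constants file (e.g. internal/controller/constants.go):
// defining the label key once lets documentdb_controller.go and
// pv_controller.go reference it without cross-file coupling.
package controller

// cnpgClusterLabel is the label CNPG attaches to objects (such as PVCs) to
// associate them with their Cluster; both controllers select on it.
const cnpgClusterLabel = "cnpg.io/cluster"
```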
Scheme    *runtime.Scheme
Config    *rest.Config
Clientset kubernetes.Interface
Recorder  record.EventRecorder
Copilot AI · Feb 2, 2026
The Recorder field should have a documentation comment explaining its purpose for emitting Kubernetes events, especially since it's used for PV retention warnings during deletion.
Suggested change:
- Recorder record.EventRecorder
+ // Recorder emits Kubernetes events for this controller, including PV retention warnings during deletion.
+ Recorder record.EventRecorder
Name: pvcName,
Kind: "PersistentVolumeClaim",
Copilot AI · Feb 2, 2026
Add a comment explaining why APIGroup is set to an empty string for PersistentVolumeClaim. This is because PVC is a core Kubernetes resource (in the "" API group), and this clarification would help future maintainers.
Suggested change:
- Name: pvcName,
- Kind: "PersistentVolumeClaim",
+ Name: pvcName,
+ Kind: "PersistentVolumeClaim",
+ // PersistentVolumeClaim is a core Kubernetes resource and therefore uses the empty ("") API group.
// Create a new recorder to ensure it's empty
localRecorder := record.NewFakeRecorder(10)
Copilot AI · Feb 2, 2026
The comment states the recorder is created 'to ensure it's empty', but the actual validation on line 572 uses Consistently which doesn't verify emptiness but rather that no events are received during a time window. Consider clarifying the comment to match the actual test behavior: 'Create a dedicated recorder to verify no events are emitted'.
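For reference, a small Gomega sketch of the behaviour the comment describes: the assertion checks that nothing arrives on the fake recorder's `Events` channel during a polling window, rather than that the recorder is "empty". The durations, test name, and reconciler wiring below are illustrative, not the PR's actual test.

```go
// Gomega sketch only; scaffolding and timings are assumptions.
package controller_test

import (
	"testing"
	"time"

	. "github.com/onsi/gomega"
	"k8s.io/client-go/tools/record"
)

func TestReconcileEmitsNoEvents(t *testing.T) {
	g := NewWithT(t)

	// Dedicated recorder so this test only observes events from the
	// reconcile invocation under test (reconciler wiring omitted here).
	localRecorder := record.NewFakeRecorder(10)

	// ... invoke the reconciler with localRecorder injected ...

	// Consistently does not prove the recorder is "empty"; it asserts that
	// nothing is received on the Events channel during the polling window.
	g.Consistently(localRecorder.Events, 2*time.Second, 100*time.Millisecond).ShouldNot(Receive())
}
```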
- ReadWriteOnce
resources:
  requests:
    storage: 10Gi # Must match the PV capacity
Copilot AI · Feb 2, 2026
The comment says storage must match PV capacity, but this requirement could be more flexible in practice - Kubernetes allows requesting less storage than the PV provides. Consider clarifying whether this is a hard requirement for this recovery scenario or if it's a best practice recommendation.
# Store PV name for later steps using GitHub Actions output (more robust than temp files)
echo "pv_name=$pv_name" >> $GITHUB_OUTPUT
Copilot AI · Feb 2, 2026
Good use of GitHub Actions outputs instead of temporary files. However, consider adding error handling to verify that pv_name is not empty before writing to GITHUB_OUTPUT to prevent silent failures in subsequent steps.
# Wait a bit for PV to be deleted (the storage class handles actual deletion)
echo "Waiting for PV to be deleted..."
sleep 30

# Verify PV was deleted (because reclaim policy is Delete)
pv_exists=$(kubectl get pv $pv_name --ignore-not-found 2>/dev/null)
if [ -z "$pv_exists" ]; then
  echo "✓ PV $pv_name was deleted as expected (Delete policy)"
else
  pv_status=$(kubectl get pv $pv_name -o jsonpath='{.status.phase}')
  echo "⚠️ PV $pv_name still exists with status: $pv_status"
  echo "Note: PV deletion depends on the storage provisioner. The reclaim policy was correctly set to Delete."
Copilot AI · Feb 2, 2026
The test treats PV persistence after deletion as a warning rather than a failure. This could mask real issues with the Delete policy implementation. Consider adding a timeout with eventual consistency check or making this a failure if the PV still exists after a reasonable wait period (e.g., 60 seconds).
Suggested change:
- # Wait a bit for PV to be deleted (the storage class handles actual deletion)
- echo "Waiting for PV to be deleted..."
- sleep 30
- # Verify PV was deleted (because reclaim policy is Delete)
- pv_exists=$(kubectl get pv $pv_name --ignore-not-found 2>/dev/null)
- if [ -z "$pv_exists" ]; then
-   echo "✓ PV $pv_name was deleted as expected (Delete policy)"
- else
-   pv_status=$(kubectl get pv $pv_name -o jsonpath='{.status.phase}')
-   echo "⚠️ PV $pv_name still exists with status: $pv_status"
-   echo "Note: PV deletion depends on the storage provisioner. The reclaim policy was correctly set to Delete."
+ # Wait (with timeout) for PV to be deleted (the storage class handles actual deletion)
+ echo "Waiting for PV $pv_name to be deleted (up to 60 seconds)..."
+ MAX_PV_WAIT_SECONDS=60
+ PV_POLL_INTERVAL=5
+ ELAPSED=0
+ while [ $ELAPSED -lt $MAX_PV_WAIT_SECONDS ]; do
+   pv_exists=$(kubectl get pv $pv_name --ignore-not-found 2>/dev/null)
+   if [ -z "$pv_exists" ]; then
+     echo "✓ PV $pv_name was deleted as expected (Delete policy)"
+     break
+   fi
+   echo "PV $pv_name still exists. Waiting..."
+   sleep $PV_POLL_INTERVAL
+   ELAPSED=$((ELAPSED + PV_POLL_INTERVAL))
+ done
+ # After waiting, fail the test if the PV still exists
+ pv_exists=$(kubectl get pv $pv_name --ignore-not-found 2>/dev/null)
+ if [ -n "$pv_exists" ]; then
+   pv_status=$(kubectl get pv $pv_name -o jsonpath='{.status.phase}')
+   echo "❌ PV $pv_name still exists after ${MAX_PV_WAIT_SECONDS}s with status: $pv_status"
+   echo "PV should be deleted when reclaim policy is Delete. Investigate storage provisioner or policy configuration."
+   exit 1
This PR adds support for retaining PersistentVolumes after DocumentDB cluster deletion, enables recovery from retained PVs, and applies security-hardening mount options to all DocumentDB-associated PVs.
Key Features:
- `persistentVolumeReclaimPolicy: Retain` to preserve PersistentVolumes after cluster deletion. This enables disaster recovery scenarios where data can be recovered even after accidental cluster deletion.