-
Notifications
You must be signed in to change notification settings - Fork 14
Add DocumentDB upgrade support with configurable PostgresImage and ImageVolume extensions #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add DocumentDB upgrade support with configurable PostgresImage and ImageVolume extensions #208
Conversation
433e80d to
1e6c870
Compare
1e6c870 to
ed68947
Compare
| nodeCount: ${{ inputs.node-count }} | ||
| instancesPerNode: ${{ inputs.instances-per-node }} | ||
| documentDBImage: ghcr.io/microsoft/documentdb/documentdb-local:16 | ||
| documentDBImage: ghcr.io/guanzhousongmicrosoft/documentdb-pg18-amd64:0.110.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See it's a draft. Just in case, we should not use any personal repo link.
Signed-off-by: wenting <wentingwu@microsoft.com>
Signed-off-by: wenting <wentingwu@microsoft.com>
Signed-off-by: wenting <wentingwu@microsoft.com>
Signed-off-by: wenting <wentingwu@microsoft.com>
Signed-off-by: wenting <wentingwu@microsoft.com>
Signed-off-by: wenting <wentingwu@microsoft.com>
Signed-off-by: wenting <wentingwu@microsoft.com>
Signed-off-by: wenting <wentingwu@microsoft.com>
Signed-off-by: wenting <wentingwu@microsoft.com>
ab04d57 to
1cc45ff
Compare
Signed-off-by: wenting <wentingwu@microsoft.com>
…stgres images Signed-off-by: wenting <wentingwu@microsoft.com>
1ae514c to
35b47ca
Compare
Signed-off-by: wenting <wentingwu@microsoft.com>
006724a to
a61d520
Compare
Signed-off-by: wenting <wentingwu@microsoft.com>
Signed-off-by: wenting <wentingwu@microsoft.com>
34f29f2 to
b2836ec
Compare
b9d97be to
5144126
Compare
Signed-off-by: wenting <wentingwu@microsoft.com>
5144126 to
d6db610
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces DocumentDB upgrade capabilities by adding support for configurable PostgreSQL and DocumentDB extension images, along with automated extension upgrade logic. The changes enable rolling upgrades of DocumentDB clusters with minimal downtime.
Changes:
- Adds
postgresImagefield to DocumentDB CRD spec for configurable Postgres base image selection - Adds
documentDBVersionfield to DocumentDB status to track the currently installed extension version - Implements automated extension upgrade logic that updates the extension image and runs
ALTER EXTENSIONwhen needed - Upgrades CNPG dependency from 0.26.1 to 0.27.0 to support ImageVolume-based extension loading
- Adds E2E upgrade tests that verify version transitions
- Removes DocumentDB image build from CI pipeline (now using external registry)
- Migrates certificate tests from testify to Ginkgo/Gomega for consistency
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| operator/src/api/preview/documentdb_types.go | Adds PostgresImage spec field and DocumentDBVersion status field to DocumentDB CRD |
| operator/src/config/crd/bases/documentdb.io_dbs.yaml | Updates CRD schema with new postgresImage and documentDBVersion fields |
| operator/documentdb-helm-chart/crds/documentdb.io_dbs.yaml | Mirrors CRD schema changes in Helm chart |
| operator/src/internal/cnpg/cnpg_cluster.go | Configures CNPG cluster with ImageVolume extensions and uses postgresImage from spec |
| operator/src/internal/controller/documentdb_controller.go | Adds upgrade logic with upgradeDocumentDBExtensionIfNeeded and updateExtensionImageIfNeeded functions, plus version parsing |
| operator/src/internal/controller/documentdb_controller_test.go | Comprehensive unit tests for upgrade logic using Ginkgo/Gomega |
| operator/src/internal/controller/certificate_controller_test.go | Extracts certificate tests into separate file, maintaining testify framework |
| operator/documentdb-helm-chart/Chart.yaml | Bumps CNPG dependency from 0.26.1 to 0.27.0 |
| .github/workflows/test-E2E.yml | Adds upgrade test scenario that patches image and verifies version transition |
| .github/workflows/test-integration.yml | Updates to use hardcoded DocumentDB image instead of version parameter |
| .github/workflows/test-backup-and-restore.yml | Removes unused documentdb_version input parameter |
| .github/workflows/test-build-and-package.yml | Removes DocumentDB image build step (now using external images) |
| .github/actions/setup-test-environment/action.yml | Enables ImageVolume feature gate in kind cluster config, upgrades to kind v0.31.0 with K8s v1.35.0 |
| // Check if documentdb extension needs to be upgraded (image update + ALTER EXTENSION) | ||
| if err := r.upgradeDocumentDBExtensionIfNeeded(ctx, currentCnpgCluster, desiredCnpgCluster, documentdb); err != nil { | ||
| logger.Error(err, "Failed to upgrade DocumentDB extension") | ||
| return ctrl.Result{RequeueAfter: RequeueAfterShort}, nil | ||
| } |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a potential race condition here. Between lines 234-259, the documentdb status is updated with cluster phase and connection string. Then at line 262, upgradeDocumentDBExtensionIfNeeded is called which also updates the documentdb status at lines 550-555 and 578-581. If the status was modified by another reconcile loop between these calls, the second update could overwrite changes. Consider refetching the documentdb object before the upgrade check, or better yet, consolidate all status updates to happen once at the end of the reconcile loop.
| return ctrl.Result{RequeueAfter: RequeueAfterShort}, nil | ||
| } | ||
|
|
||
| // Don't reque again unless there is a change |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in comment: "reque" should be "requeue".
| // Don't reque again unless there is a change | |
| // Don't requeue again unless there is a change |
| // Step 6: Update DocumentDB version in status after upgrade | ||
| documentdb.Status.DocumentDBVersion = defaultVersion | ||
| if err := r.Status().Update(ctx, documentdb); err != nil { | ||
| logger.Error(err, "Failed to update DocumentDB status after extension upgrade") | ||
| } |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The status update failure should cause the function to return an error rather than just logging it. If the status update fails after successfully upgrading the extension, the operator will lose track of the current version and may attempt to upgrade again unnecessarily, or the status will be inconsistent with reality.
| spec := cnpgv1.ClusterSpec{ | ||
| Instances: documentdb.Spec.InstancesPerNode, | ||
| ImageName: documentdb_image, | ||
| ImageName: documentdb.Spec.PostgresImage, |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a defensive default value check for PostgresImage to match the pattern used elsewhere in the codebase. While the kubebuilder default should handle this, adding a runtime check like cmp.Or(documentdb.Spec.PostgresImage, "ghcr.io/cloudnative-pg/postgresql:18-standard-bookworm") on line 59 would provide additional safety against potential edge cases where the default might not be applied (e.g., when migrating from older CRD versions).
| ImageName: documentdb.Spec.PostgresImage, | |
| ImageName: cmp.Or(documentdb.Spec.PostgresImage, "ghcr.io/cloudnative-pg/postgresql:18-standard-bookworm"), |
| # Extract expected version from image tag (e.g., "0.110.0" -> "0.110-0") | ||
| NEW_VERSION_TAG="${NEW_IMAGE##*:}" | ||
| EXPECTED_VERSION=$(echo "$NEW_VERSION_TAG" | sed 's/\.\([^.]*\)$/-\1/') |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The version parsing logic using sed 's/\.\([^.]*\)$/-\1/' assumes a specific format where the last dot-separated component should be converted to a hyphen-separated component (e.g., "0.110.0" -> "0.110-0"). This is fragile and may not work correctly for all version formats. For example, if the image tag includes a pre-release version like "0.110.0-beta", the parsing would produce "0.110.0-beta" (unchanged), not matching the expected format. Consider documenting the expected version format or making the parsing more robust.
| # Extract expected version from image tag (e.g., "0.110.0" -> "0.110-0") | |
| NEW_VERSION_TAG="${NEW_IMAGE##*:}" | |
| EXPECTED_VERSION=$(echo "$NEW_VERSION_TAG" | sed 's/\.\([^.]*\)$/-\1/') | |
| # Extract expected version from image tag: | |
| # - For semver-like tags: "0.110.0" -> "0.110-0" | |
| # - With pre-release: "0.110.0-beta" -> "0.110-0-beta" | |
| NEW_VERSION_TAG="${NEW_IMAGE##*:}" | |
| if [[ "$NEW_VERSION_TAG" =~ ^([0-9]+\.[0-9]+)\.([0-9]+)(-.+)?$ ]]; then | |
| EXPECTED_VERSION="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}${BASH_REMATCH[3]}" | |
| else | |
| echo "⚠️ Unable to parse version from image tag '$NEW_VERSION_TAG'; using tag as-is for comparison" | |
| EXPECTED_VERSION="$NEW_VERSION_TAG" | |
| fi |
| if err := r.Status().Update(ctx, documentdb); err != nil { | ||
| logger.Error(err, "Failed to update DocumentDB status with extension version") |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The status update at line 552 should not be silently ignored if it fails. When the status update fails, the operator should requeue to ensure the version is eventually recorded. Currently, the error is only logged but the function continues, which could lead to inconsistent state where the extension version in the database differs from what's recorded in the status.
| if err := r.Status().Update(ctx, documentdb); err != nil { | |
| logger.Error(err, "Failed to update DocumentDB status with extension version") | |
| if err := r.Status().Update(ctx, documentdb); err != nil { | |
| err = fmt.Errorf("failed to update DocumentDB status with extension version: %w", err) | |
| logger.Error(err, "Failed to update DocumentDB status with extension version") | |
| return err |
13fad47 to
d6db610
Compare
Summary
This PR introduces DocumentDB upgrade capabilities by adding a configurable
PostgresImagefield to the DocumentDB CRD and enabling ImageVolume-based extension loading through CNPG.Key Changes
CRD Updates
postgresImagefield to DocumentDB spec for configurable postgres image selectiondocumentDBVersionstatus field to track the current versionCNPG Integration
LD_LIBRARY_PATHfor extension loadinginitdb: could not look up effective user ID 105: user does not existerrorsCI/CD Updates
ImageVolumefeature gate in kind cluster configurationTesting
documentdb_controller_test.go