-
Notifications
You must be signed in to change notification settings - Fork 2
feat(etcd): expose etcd_mode in host API and add cluster tests #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(etcd): expose etcd_mode in host API and add cluster tests #245
Conversation
📝 WalkthroughWalkthroughAdds an EtcdMode/etcd_mode field to Host models and API, persists it to storage, and introduces cluster test utilities and end-to-end tests to exercise promoting/demoting hosts between etcd server and client modes. Changes
Sequence Diagram(s)sequenceDiagram
participant Test as Test Suite
participant Cluster
participant Host
participant Container as Container Runtime
participant API as Control-Plane API
Test->>Cluster: Create cluster (servers/clients with etcd_mode)
Cluster->>Container: Start host containers with etcd_mode env
Container-->>Cluster: Containers running
Test->>Cluster: List hosts / check health
Cluster->>API: List hosts
API-->>Cluster: Hosts + EtcdMode + health
Test->>Host: RecreateWithMode(newMode)
Host->>Container: Stop container
Container-->>Host: Stopped
Host->>Container: Start container with new etcd_mode
Container-->>Host: Running
Test->>Cluster: RefreshClient()
Cluster->>API: Recreate client connection
API-->>Cluster: New client ready
loop until healthy
Cluster->>API: List hosts
API-->>Cluster: Health status
end
Cluster-->>Test: Verify host modes and cluster health
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)clustertest/**/*.go📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (2)📓 Common learnings📚 Learning: 2026-01-14T16:43:14.333ZApplied to files:
🧬 Code graph analysis (1)clustertest/host_test.go (3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (4)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@clustertest/host_test.go`:
- Around line 286-305: The cleanup registered in NewHost can redundantly call
Terminate on a container already terminated by RecreateWithMode; update
RecreateWithMode to mark the original container as terminated (e.g., set a
boolean on the container or host like container.terminated or
h.containerTerminated) and change the cleanup closure to check that flag before
calling newContainer.Terminate(cleanupCtx); alternatively add and use an
IsTerminated() method on the container type and only call Terminate if
IsTerminated() returns false, ensuring a single deterministic termination and
avoiding duplicate log noise.
🧹 Nitpick comments (1)
clustertest/host_test.go (1)
205-268: Consider extracting shared container setup logic.The container setup code (environment configuration, container request construction) is largely duplicated between
NewHostandRecreateWithMode. This could be extracted into a helper function to reduce duplication and ensure consistency.Additionally, note that
ExtraEnvfrom the originalHostConfigis not preserved during recreation. If this is intentional (since mode changes shouldn't carry over custom env vars), consider documenting this behavior. If not, you may need to storeExtraEnvin theHoststruct.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (9)
api/apiv1/gen/control_plane/service.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/client/encode_decode.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/client/types.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/server/encode_decode.gois excluded by!**/gen/**api/apiv1/gen/http/control_plane/server/types.gois excluded by!**/gen/**api/apiv1/gen/http/openapi.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi.yamlis excluded by!**/gen/**api/apiv1/gen/http/openapi3.jsonis excluded by!**/gen/**api/apiv1/gen/http/openapi3.yamlis excluded by!**/gen/**
📒 Files selected for processing (9)
api/apiv1/design/host.gochanges/unreleased/Added-20260115-010344.yamlclustertest/cluster_test.goclustertest/etcd_mode_change_test.goclustertest/host_test.goserver/internal/api/apiv1/convert.goserver/internal/host/host.goserver/internal/host/host_store.goserver/internal/host/service.go
🧰 Additional context used
📓 Path-based instructions (4)
server/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
server/**/*.go: Usesamber/doinjector for dependency injection; each package should have aProvide()function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa
Files:
server/internal/host/host_store.goserver/internal/host/service.goserver/internal/api/apiv1/convert.goserver/internal/host/host.go
api/apiv1/design/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
API endpoints should be defined using Goa's DSL in
api/apiv1/design/with separate files for domain-specific types (database.go, host.go, cluster.go), then regenerated withmake -C api generate
Files:
api/apiv1/design/host.go
server/internal/api/apiv1/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Implement generated Goa service interface methods in
server/internal/api/apiv1/after regenerating code
Files:
server/internal/api/apiv1/convert.go
clustertest/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
Cluster integration tests should use build tag
//go:build cluster_testand use testcontainers for Docker-based test environments
Files:
clustertest/cluster_test.goclustertest/etcd_mode_change_test.goclustertest/host_test.go
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to api/apiv1/design/**/*.go : API endpoints should be defined using Goa's DSL in `api/apiv1/design/` with separate files for domain-specific types (database.go, host.go, cluster.go), then regenerated with `make -C api generate`
Applied to files:
api/apiv1/design/host.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
Applied to files:
api/apiv1/design/host.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to clustertest/**/*.go : Cluster integration tests should use build tag `//go:build cluster_test` and use testcontainers for Docker-based test environments
Applied to files:
clustertest/etcd_mode_change_test.goclustertest/host_test.go
🧬 Code graph analysis (4)
server/internal/api/apiv1/convert.go (2)
clustertest/host_test.go (1)
EtcdMode(27-27)server/internal/utils/utils.go (1)
NillablePointerTo(76-82)
clustertest/cluster_test.go (1)
api/apiv1/gen/control_plane/service.go (1)
Cluster(205-212)
clustertest/etcd_mode_change_test.go (2)
clustertest/cluster_test.go (3)
NewCluster(28-47)ClusterConfig(17-20)Cluster(22-26)clustertest/host_test.go (5)
HostConfig(34-38)EtcdMode(27-27)EtcdModeClient(31-31)Host(40-45)EtcdModeServer(30-30)
clustertest/host_test.go (3)
api/apiv1/design/host.go (1)
Host(79-137)server/internal/host/host.go (1)
Host(38-51)api/apiv1/gen/control_plane/service.go (1)
Host(511-536)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (16)
changes/unreleased/Added-20260115-010344.yaml (1)
1-3: LGTM!The changelog entry accurately describes the feature addition and follows the expected format.
server/internal/host/host.go (3)
47-47: LGTM!The
EtcdModefield is correctly added to theHoststruct using theconfig.EtcdModetype, maintaining consistency with the storage and API layers.
99-99: LGTM!The
EtcdModeis correctly propagated from storage to the domain model, following the established pattern for other fields.
148-148: LGTM!The
EtcdModeis correctly propagated to the storage layer, completing the bidirectional mapping.server/internal/host/service.go (1)
41-57: LGTM!The
EtcdModeis correctly initialized froms.cfg.EtcdMode, following the same pattern used for other configuration-sourced fields likeHostID,Orchestrator, andDataDir. This ensures hosts report their actual etcd mode when updating their registration.server/internal/host/host_store.go (1)
26-26: LGTM!The
EtcdModefield is correctly added toStoredHostwith the appropriate type and JSON tag, enabling persistence of the etcd mode in the storage layer.api/apiv1/design/host.go (2)
158-158: LGTM!The example data appropriately demonstrates a realistic cluster configuration with a mix of etcd server and client modes.
Also applies to: 197-197, 236-236
123-127: LGTM!The
etcd_modeattribute is well-defined with clear description, proper enum constraint ("server","client"), and appropriate example value. The field is correctly optional (not inRequiredblock), providing backward compatibility with existing hosts. API code has been regenerated per the Goa DSL process.server/internal/api/apiv1/convert.go (1)
55-55: LGTM!The
EtcdModefield is correctly added usingutils.NillablePointerTo, consistent with how other optional fields (likeCpus,Memory) are handled in this function. This will correctly returnnilfor the zero value or the mode string otherwise.clustertest/cluster_test.go (1)
111-118: LGTM!The
RefreshClientmethod follows the established pattern used inAdd()andRemove()methods. The documentation clearly explains its purpose, and the implementation is consistent with existing code.clustertest/host_test.go (3)
43-43: LGTM!Adding
dataDiras a field enables theRecreateWithModemethod to preserve cluster state by reusing the same data directory when recreating the container.
73-77: LGTM!Explicitly setting
PGEDGE_ETCD_MODEto"server"makes the behavior consistent with theEtcdModeClientcase and ensures predictable configuration regardless of container defaults.
176-203: LGTM!The
GetEtcdModemethod is well-documented, correctly handles the optional client parameter, and appropriately checks for nilEtcdModesince it's an optional field in the API response.clustertest/etcd_mode_change_test.go (3)
13-55: LGTM!The test is well-structured with clear setup, action, and verification phases. Using
requirefor preconditions andassertfor final verifications is the correct pattern. The logging at each step will aid in debugging test failures.
57-98: LGTM!The demotion test mirrors the promotion test structure, providing good coverage for both transition directions. The test correctly validates that demoting a server to client maintains cluster health.
118-150: LGTM with minor suggestion.The polling helper is well-implemented with appropriate logging and error handling.
Consider checking
t.Context().Done()in the loop to allow early exit if the test context is canceled, though this is a minor optimization sincet.Fatalfwill eventually terminate the test anyway.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Summary
This PR exposes the
etcd_modein the Host API response and adds cluster-level test coverage to validate etcd server–client reconfigurationChanges
etcd_modein the Host API responseTesting
Verified etcd mode transitions by validating API responses before and after the mode change using the following commands:
restish control-plane-local-1 get-clusterrestish control-plane-local-1 list-hostsrestish control-plane-local-1 get-hostExample:
Current etcd_mode:
After etcd mode change:
Checklist
PLAT-394
Summary by CodeRabbit
New Features
etcd_modefield to Host API responses, supporting "server" and "client" modes.Tests
✏️ Tip: You can customize this high-level summary in your review settings.