@tsivaprasad tsivaprasad commented Jan 14, 2026

Summary

This PR exposes etcd_mode in the Host API response and adds cluster-level test coverage to validate etcd server–client reconfiguration.

Changes

  • Expose the current etcd_mode in the Host API response
  • Add test cases covering etcd server ↔ client mode transitions

Testing

Verified etcd mode transitions by validating API responses before and after the mode change using the following commands:

  • restish control-plane-local-1 get-cluster
  • restish control-plane-local-1 list-hosts
  • restish control-plane-local-1 get-host

Example:

Current etcd_mode:

restish control-plane-local-1 get-host "host-4"
HTTP/1.1 200 OK
Content-Length: 1083
Content-Type: application/json
Date: Wed, 14 Jan 2026 19:26:28 GMT

{
  cohort: {
    control_available: true
    member_id: "waz93ar2kgev2tgqmdituc3i0"
    type: "swarm"
  }
  cpus: 14
  data_dir: "/Users/sivat/projects/control-plane/control-plane/docker/control-plane-dev/data/host-4"
  default_pgedge_version: {
    postgres_version: "18.1"
    spock_version: "5"
  }
  etcd_mode: "client"
  hostname: "docker-desktop"
  id: "host-4"
  ipv4_address: "192.168.64.2"
  memory: "8.2GB"
  orchestrator: "swarm"
  status: {
    components: {
      docker: {
        details: {
          containers: 10
          containers_paused: 0
          containers_running: 9
          containers_stopped: 1
          swarm.control_available: true
          swarm.error: ""
          swarm.local_node_state: "active"
        }
        error: ""
        healthy: true
      }
      etcd: {
        details: {
          alarms: []
        }
        error: ""
        healthy: true
      }
    }
    state: "healthy"
    updated_at: "2026-01-14T19:26:14Z"
  }
  supported_pgedge_versions: [
    {
      postgres_version: "16.10"
      spock_version: "5"
    }
    {
      postgres_version: "16.11"
      spock_version: "5"
    }
    {
      postgres_version: "17.6"
      spock_version: "5"
    }
    {
      postgres_version: "17.7"
      spock_version: "5"
    }
    {
      postgres_version: "18.0"
      spock_version: "5"
    }
    {
      postgres_version: "18.1"
      spock_version: "5"
    }
  ]
}

After etcd mode change:

restish control-plane-local-1 get-host "host-4"
HTTP/1.1 200 OK
Content-Length: 1097
Content-Type: application/json
Date: Wed, 14 Jan 2026 19:27:13 GMT

{
  cohort: {
    control_available: true
    member_id: "waz93ar2kgev2tgqmdituc3i0"
    type: "swarm"
  }
  cpus: 14
  data_dir: "/Users/sivat/projects/control-plane/control-plane/docker/control-plane-dev/data/host-4"
  default_pgedge_version: {
    postgres_version: "18.1"
    spock_version: "5"
  }
  etcd_mode: "server"
  hostname: "docker-desktop"
  id: "host-4"
  ipv4_address: "192.168.64.2"
  memory: "8.2GB"
  orchestrator: "swarm"
  status: {
    components: {
      docker: {
        details: {
          containers: 10
          containers_paused: 0
          containers_running: 9
          containers_stopped: 1
          swarm.control_available: true
          swarm.error: ""
          swarm.local_node_state: "active"
        }
        error: ""
        healthy: true
      }
      etcd: {
        details: {
          alarms: []
          errors: null
        }
        error: ""
        healthy: true
      }
    }
    state: "healthy"
    updated_at: "2026-01-14T19:27:00Z"
  }
  supported_pgedge_versions: [
    {
      postgres_version: "16.10"
      spock_version: "5"
    }
    {
      postgres_version: "16.11"
      spock_version: "5"
    }
    {
      postgres_version: "17.6"
      spock_version: "5"
    }
    {
      postgres_version: "17.7"
      spock_version: "5"
    }
    {
      postgres_version: "18.0"
      spock_version: "5"
    }
    {
      postgres_version: "18.1"
      spock_version: "5"
    }
  ]
}
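
The same before/after check can be scripted. The following is a minimal sketch, assuming the Host endpoint returns standard JSON (the restish display format shown above is not strict JSON) with an optional `etcd_mode` string. The `hostResponse` type and `etcdModeOf` helper are hypothetical names for illustration and are not part of this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hostResponse models only the fields this check needs (hypothetical type).
type hostResponse struct {
	ID       string  `json:"id"`
	EtcdMode *string `json:"etcd_mode,omitempty"`
}

// etcdModeOf returns the etcd_mode in a JSON host payload, or "" if it is unset.
func etcdModeOf(payload []byte) (string, error) {
	var h hostResponse
	if err := json.Unmarshal(payload, &h); err != nil {
		return "", err
	}
	if h.EtcdMode == nil {
		return "", nil
	}
	return *h.EtcdMode, nil
}

func main() {
	// Trimmed payloads standing in for the two responses above.
	before := []byte(`{"id":"host-4","etcd_mode":"client"}`)
	after := []byte(`{"id":"host-4","etcd_mode":"server"}`)

	b, _ := etcdModeOf(before)
	a, _ := etcdModeOf(after)
	fmt.Printf("etcd_mode: %s -> %s\n", b, a)
}
```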


Checklist

  • Tests added or updated
  • Changelog added

PLAT-394

Summary by CodeRabbit

  • New Features

    • Added etcd_mode field to Host API responses, supporting "server" and "client" modes.
  • Tests

    • Added cluster-level end-to-end tests covering etcd mode transitions (client→server and server→client) with health checks and validation of mode changes.


coderabbitai bot commented Jan 14, 2026

Walkthrough

Adds an EtcdMode/etcd_mode field to Host models and API, persists it to storage, and introduces cluster test utilities and end-to-end tests to exercise promoting/demoting hosts between etcd server and client modes.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| API Design & Examples | api/apiv1/design/host.go | Added public etcd_mode enum attribute to Host design and updated HostsArrayExample entries to include values (`"server" |
| API Serialization | server/internal/api/apiv1/convert.go | Populate new EtcdMode field on API Host objects during host->API conversion. |
| Internal Host Model | server/internal/host/host.go, server/internal/host/host_store.go | Added EtcdMode field to Host and StoredHost structs and wired through to/from storage (JSON etcd_mode). |
| Host Service Initialization | server/internal/host/service.go | Initialize Host.EtcdMode from service config when constructing/updating hosts. |
| Cluster Test infra | clustertest/cluster_test.go, clustertest/host_test.go | Added Cluster.RefreshClient(t); extended Host with dataDir, GetEtcdMode, and RecreateWithMode to support container restart and mode transitions. |
| Integration Tests | clustertest/etcd_mode_change_test.go | New tests: TestPromoteClientToServer and TestDemoteServerToClient, plus helpers countHealthyHosts and waitForHostsHealthy to validate cluster health during mode changes. |
| Changelog | changes/unreleased/Added-20260115-010344.yaml | Unreleased entry documenting etcd_mode addition and related cluster tests. |

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Suite
    participant Cluster
    participant Host
    participant Container as Container Runtime
    participant API as Control-Plane API

    Test->>Cluster: Create cluster (servers/clients with etcd_mode)
    Cluster->>Container: Start host containers with etcd_mode env
    Container-->>Cluster: Containers running

    Test->>Cluster: List hosts / check health
    Cluster->>API: List hosts
    API-->>Cluster: Hosts + EtcdMode + health

    Test->>Host: RecreateWithMode(newMode)
    Host->>Container: Stop container
    Container-->>Host: Stopped
    Host->>Container: Start container with new etcd_mode
    Container-->>Host: Running

    Test->>Cluster: RefreshClient()
    Cluster->>API: Recreate client connection
    API-->>Cluster: New client ready

    loop until healthy
        Cluster->>API: List hosts
        API-->>Cluster: Health status
    end

    Cluster-->>Test: Verify host modes and cluster health

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • jason-lynch

Poem

🐰 A hop to change a node's tune,
From client dusk to server moon,
Containers stop, then start anew,
Tests count hearts to prove it's true,
Etcd hops along — hooray, boop! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main changes: exposing etcd_mode in the Host API and adding cluster-level tests for etcd reconfiguration. |
| Description check | ✅ Passed | The description includes all key required sections from the template: Summary, Changes, Testing with verified commands/outputs, Checklist with completed items, and issue link. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96f7073 and 324ee0c.

📒 Files selected for processing (1)
  • clustertest/host_test.go
🧰 Additional context used
📓 Path-based instructions (1)
clustertest/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Cluster integration tests should use build tag //go:build cluster_test and use testcontainers for Docker-based test environments

Files:

  • clustertest/host_test.go
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to clustertest/**/*.go : Cluster integration tests should use build tag `//go:build cluster_test` and use testcontainers for Docker-based test environments

Applied to files:

  • clustertest/host_test.go
🧬 Code graph analysis (1)
clustertest/host_test.go (3)
api/apiv1/design/host.go (1)
  • Host (79-137)
api/apiv1/gen/control_plane/service.go (1)
  • Host (511-536)
client/multi.go (1)
  • NewMultiServerClient (22-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (4)
clustertest/host_test.go (4)

69-80: LGTM!

Explicitly setting PGEDGE_ETCD_MODE for both server and client modes ensures consistent configuration and avoids relying on defaults.


114-152: LGTM!

The refactoring to use ctr avoids shadowing the container package import, and capturing h by reference in the cleanup closure enables proper cleanup of recreated containers. The design correctly ensures the cleanup terminates whichever container is currently assigned to h.container.
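
To illustrate why capturing h by reference matters, here is a minimal sketch. It uses a hypothetical `fakeContainer` in place of a real testcontainers container, and `newHost` is a simplified stand-in for the PR's NewHost, not its actual code. Because the closure reads `h.container` when it runs, it terminates whichever container is assigned at cleanup time.

```go
package main

import "fmt"

// fakeContainer is a hypothetical stand-in for a container handle.
type fakeContainer struct {
	name       string
	terminated bool
}

func (c *fakeContainer) Terminate() { c.terminated = true }

// Host holds the container currently backing the host.
type Host struct {
	container *fakeContainer
}

// newHost returns a host and a cleanup func. The closure captures the
// *Host pointer, so it dereferences h.container only when invoked.
func newHost(name string) (*Host, func()) {
	h := &Host{container: &fakeContainer{name: name}}
	cleanup := func() { h.container.Terminate() }
	return h, cleanup
}

func main() {
	h, cleanup := newHost("original")

	// Simulate a recreate: stop the old container, then swap in a new one.
	h.container.Terminate()
	h.container = &fakeContainer{name: "recreated"}

	cleanup()
	fmt.Println(h.container.name, "terminated:", h.container.terminated)
}
```

Had the closure captured the container value directly instead of the host pointer, cleanup would have targeted the original container and left the recreated one running.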


178-205: LGTM!

The optional client parameter pattern is well-documented and handles the case where a freshly recreated host may not be queryable directly. The implementation correctly searches for the host by ID and returns the etcd mode.
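
The lookup described here can be sketched as follows. This is an illustration under stated assumptions: `hostLister`, `apiHost`, and `staticLister` are hypothetical names, the optional client is modeled as a nil-able parameter, and the real GetEtcdMode signature may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// apiHost mirrors only the fields needed here; EtcdMode is optional in the API.
type apiHost struct {
	ID       string
	EtcdMode *string
}

// hostLister is a hypothetical stand-in for the control-plane client.
type hostLister interface {
	ListHosts() ([]apiHost, error)
}

// getEtcdMode looks up hostID via override when it is non-nil, otherwise via own.
func getEtcdMode(own, override hostLister, hostID string) (string, error) {
	c := own
	if override != nil {
		c = override
	}
	if c == nil {
		return "", errors.New("no client available")
	}
	hosts, err := c.ListHosts()
	if err != nil {
		return "", fmt.Errorf("list hosts: %w", err)
	}
	for _, h := range hosts {
		if h.ID != hostID {
			continue
		}
		if h.EtcdMode == nil {
			return "", fmt.Errorf("host %s has no etcd_mode", hostID)
		}
		return *h.EtcdMode, nil
	}
	return "", fmt.Errorf("host %s not found", hostID)
}

// staticLister returns a fixed host list, for demonstration only.
type staticLister []apiHost

func (s staticLister) ListHosts() ([]apiHost, error) { return s, nil }

func main() {
	mode := "server"
	peer := staticLister{{ID: "host-4", EtcdMode: &mode}}

	// Query through another host's client, as a test might right after a recreate.
	got, err := getEtcdMode(nil, peer, "host-4")
	fmt.Println(got, err)
}
```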


207-289: Well-designed container lifecycle management.

The implementation correctly handles the container reference update (lines 287-288), ensuring the cleanup registered in NewHost will terminate the current container (not the already-terminated original). The 60-second startup timeout appropriately accounts for the potentially longer initialization time during etcd mode transitions.

Note: The past review concern about double-cleanup has been addressed by updating h.container to point to the new container before the cleanup runs.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@clustertest/host_test.go`:
- Around line 286-305: The cleanup registered in NewHost can redundantly call
Terminate on a container already terminated by RecreateWithMode; update
RecreateWithMode to mark the original container as terminated (e.g., set a
boolean on the container or host like container.terminated or
h.containerTerminated) and change the cleanup closure to check that flag before
calling newContainer.Terminate(cleanupCtx); alternatively add and use an
IsTerminated() method on the container type and only call Terminate if
IsTerminated() returns false, ensuring a single deterministic termination and
avoiding duplicate log noise.
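
One way to read the suggestion above is as a guard around termination. The sketch below assumes a hypothetical `trackedContainer` wrapper; the names and the call counter are illustrative and not taken from the PR or from testcontainers.

```go
package main

import "fmt"

// trackedContainer wraps a container handle and remembers whether it has
// already been terminated (hypothetical type for illustration).
type trackedContainer struct {
	terminated bool
	stopCalls  int // counts real stop operations, to show the guard works
}

// IsTerminated reports whether Terminate has already run.
func (c *trackedContainer) IsTerminated() bool { return c.terminated }

// Terminate stops the container once; later calls are no-ops.
func (c *trackedContainer) Terminate() error {
	if c.terminated {
		return nil
	}
	c.terminated = true
	c.stopCalls++ // a real implementation would stop the container here
	return nil
}

func main() {
	c := &trackedContainer{}
	_ = c.Terminate() // called during recreation
	_ = c.Terminate() // called again by test cleanup; skipped by the guard
	fmt.Println("terminated:", c.IsTerminated(), "stop calls:", c.stopCalls)
}
```

This version is not safe for concurrent callers; if cleanup and recreation could race, a sync.Once or a mutex would be the usual addition.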
🧹 Nitpick comments (1)
clustertest/host_test.go (1)

205-268: Consider extracting shared container setup logic.

The container setup code (environment configuration, container request construction) is largely duplicated between NewHost and RecreateWithMode. This could be extracted into a helper function to reduce duplication and ensure consistency.

Additionally, note that ExtraEnv from the original HostConfig is not preserved during recreation. If this is intentional (since mode changes shouldn't carry over custom env vars), consider documenting this behavior. If not, you may need to store ExtraEnv in the Host struct.
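
A shared helper along these lines could serve both code paths. This is a sketch only: `hostEnv` is a hypothetical name, the constant values are assumed from the "server" and "client" modes in this PR, and the only environment variable used is PGEDGE_ETCD_MODE as mentioned in the review. Whether ExtraEnv should survive recreation is still the author's call.

```go
package main

import "fmt"

// EtcdMode mirrors the test-side mode type; values are assumed.
type EtcdMode string

const (
	EtcdModeServer EtcdMode = "server"
	EtcdModeClient EtcdMode = "client"
)

// hostEnv builds the container environment for a host. Extra variables are
// copied first so the explicit etcd mode always wins on a key collision.
func hostEnv(mode EtcdMode, extra map[string]string) map[string]string {
	env := make(map[string]string, len(extra)+1)
	for k, v := range extra {
		env[k] = v
	}
	env["PGEDGE_ETCD_MODE"] = string(mode)
	return env
}

func main() {
	extra := map[string]string{"EXAMPLE_FLAG": "1"} // hypothetical extra var

	// Initial creation and later recreation share the same builder.
	created := hostEnv(EtcdModeClient, extra)
	recreated := hostEnv(EtcdModeServer, extra)
	fmt.Println(created["PGEDGE_ETCD_MODE"], recreated["PGEDGE_ETCD_MODE"], recreated["EXAMPLE_FLAG"])
}
```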

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5726ad9 and 96f7073.

⛔ Files ignored due to path filters (9)
  • api/apiv1/gen/control_plane/service.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/client/encode_decode.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/client/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/server/encode_decode.go is excluded by !**/gen/**
  • api/apiv1/gen/http/control_plane/server/types.go is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi.yaml is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.json is excluded by !**/gen/**
  • api/apiv1/gen/http/openapi3.yaml is excluded by !**/gen/**
📒 Files selected for processing (9)
  • api/apiv1/design/host.go
  • changes/unreleased/Added-20260115-010344.yaml
  • clustertest/cluster_test.go
  • clustertest/etcd_mode_change_test.go
  • clustertest/host_test.go
  • server/internal/api/apiv1/convert.go
  • server/internal/host/host.go
  • server/internal/host/host_store.go
  • server/internal/host/service.go
🧰 Additional context used
📓 Path-based instructions (4)
server/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

server/**/*.go: Use samber/do injector for dependency injection; each package should have a Provide() function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa

Files:

  • server/internal/host/host_store.go
  • server/internal/host/service.go
  • server/internal/api/apiv1/convert.go
  • server/internal/host/host.go
api/apiv1/design/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

API endpoints should be defined using Goa's DSL in api/apiv1/design/ with separate files for domain-specific types (database.go, host.go, cluster.go), then regenerated with make -C api generate

Files:

  • api/apiv1/design/host.go
server/internal/api/apiv1/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Implement generated Goa service interface methods in server/internal/api/apiv1/ after regenerating code

Files:

  • server/internal/api/apiv1/convert.go
clustertest/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

Cluster integration tests should use build tag //go:build cluster_test and use testcontainers for Docker-based test environments

Files:

  • clustertest/cluster_test.go
  • clustertest/etcd_mode_change_test.go
  • clustertest/host_test.go
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to api/apiv1/design/**/*.go : API endpoints should be defined using Goa's DSL in `api/apiv1/design/` with separate files for domain-specific types (database.go, host.go, cluster.go), then regenerated with `make -C api generate`

Applied to files:

  • api/apiv1/design/host.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking

Applied to files:

  • api/apiv1/design/host.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to clustertest/**/*.go : Cluster integration tests should use build tag `//go:build cluster_test` and use testcontainers for Docker-based test environments

Applied to files:

  • clustertest/etcd_mode_change_test.go
  • clustertest/host_test.go
🧬 Code graph analysis (4)
server/internal/api/apiv1/convert.go (2)
clustertest/host_test.go (1)
  • EtcdMode (27-27)
server/internal/utils/utils.go (1)
  • NillablePointerTo (76-82)
clustertest/cluster_test.go (1)
api/apiv1/gen/control_plane/service.go (1)
  • Cluster (205-212)
clustertest/etcd_mode_change_test.go (2)
clustertest/cluster_test.go (3)
  • NewCluster (28-47)
  • ClusterConfig (17-20)
  • Cluster (22-26)
clustertest/host_test.go (5)
  • HostConfig (34-38)
  • EtcdMode (27-27)
  • EtcdModeClient (31-31)
  • Host (40-45)
  • EtcdModeServer (30-30)
clustertest/host_test.go (3)
api/apiv1/design/host.go (1)
  • Host (79-137)
server/internal/host/host.go (1)
  • Host (38-51)
api/apiv1/gen/control_plane/service.go (1)
  • Host (511-536)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (16)
changes/unreleased/Added-20260115-010344.yaml (1)

1-3: LGTM!

The changelog entry accurately describes the feature addition and follows the expected format.

server/internal/host/host.go (3)

47-47: LGTM!

The EtcdMode field is correctly added to the Host struct using the config.EtcdMode type, maintaining consistency with the storage and API layers.


99-99: LGTM!

The EtcdMode is correctly propagated from storage to the domain model, following the established pattern for other fields.


148-148: LGTM!

The EtcdMode is correctly propagated to the storage layer, completing the bidirectional mapping.
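
As a rough picture of the bidirectional mapping, the sketch below round-trips a host through a storage struct and JSON. The struct shapes are heavily simplified assumptions: only the `etcd_mode` JSON key comes from the walkthrough, while the `id` tag and the `toStorage`/`fromStorage` names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EtcdMode stands in for config.EtcdMode.
type EtcdMode string

// Host is a trimmed-down domain model.
type Host struct {
	ID       string
	EtcdMode EtcdMode
}

// StoredHost is a trimmed-down storage model.
type StoredHost struct {
	ID       string   `json:"id"`
	EtcdMode EtcdMode `json:"etcd_mode"`
}

// toStorage maps the domain model to its stored form.
func toStorage(h Host) StoredHost {
	return StoredHost{ID: h.ID, EtcdMode: h.EtcdMode}
}

// fromStorage maps the stored form back to the domain model.
func fromStorage(s StoredHost) Host {
	return Host{ID: s.ID, EtcdMode: s.EtcdMode}
}

func main() {
	in := Host{ID: "host-4", EtcdMode: "server"}

	raw, err := json.Marshal(toStorage(in))
	if err != nil {
		panic(err)
	}

	var stored StoredHost
	if err := json.Unmarshal(raw, &stored); err != nil {
		panic(err)
	}
	fmt.Println(string(raw), fromStorage(stored) == in)
}
```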

server/internal/host/service.go (1)

41-57: LGTM!

The EtcdMode is correctly initialized from s.cfg.EtcdMode, following the same pattern used for other configuration-sourced fields like HostID, Orchestrator, and DataDir. This ensures hosts report their actual etcd mode when updating their registration.

server/internal/host/host_store.go (1)

26-26: LGTM!

The EtcdMode field is correctly added to StoredHost with the appropriate type and JSON tag, enabling persistence of the etcd mode in the storage layer.

api/apiv1/design/host.go (2)

158-158: LGTM!

The example data appropriately demonstrates a realistic cluster configuration with a mix of etcd server and client modes.

Also applies to: 197-197, 236-236


123-127: LGTM!

The etcd_mode attribute is well-defined with clear description, proper enum constraint ("server", "client"), and appropriate example value. The field is correctly optional (not in Required block), providing backward compatibility with existing hosts. API code has been regenerated per the Goa DSL process.

server/internal/api/apiv1/convert.go (1)

55-55: LGTM!

The EtcdMode field is correctly added using utils.NillablePointerTo, consistent with how other optional fields (like Cpus, Memory) are handled in this function. This will correctly return nil for the zero value or the mode string otherwise.
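
For readers unfamiliar with the helper, the behavior described (nil for the zero value, a pointer otherwise) can be sketched with generics. This is an assumed shape, not a copy of utils.NillablePointerTo from the repository.

```go
package main

import "fmt"

// NillablePointerTo returns nil when v is the zero value of T, and a pointer
// to a copy of v otherwise. Sketch only; the real helper may differ.
func NillablePointerTo[T comparable](v T) *T {
	var zero T
	if v == zero {
		return nil
	}
	return &v
}

func main() {
	unset := NillablePointerTo("")
	set := NillablePointerTo("server")
	fmt.Println(unset == nil, *set)
}
```

With this shape, a host whose mode was never recorded simply has no etcd_mode in the API response, which matches the field being optional.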

clustertest/cluster_test.go (1)

111-118: LGTM!

The RefreshClient method follows the established pattern used in Add() and Remove() methods. The documentation clearly explains its purpose, and the implementation is consistent with existing code.

clustertest/host_test.go (3)

43-43: LGTM!

Adding dataDir as a field enables the RecreateWithMode method to preserve cluster state by reusing the same data directory when recreating the container.


73-77: LGTM!

Explicitly setting PGEDGE_ETCD_MODE to "server" makes the behavior consistent with the EtcdModeClient case and ensures predictable configuration regardless of container defaults.


176-203: LGTM!

The GetEtcdMode method is well-documented, correctly handles the optional client parameter, and appropriately checks for nil EtcdMode since it's an optional field in the API response.

clustertest/etcd_mode_change_test.go (3)

13-55: LGTM!

The test is well-structured with clear setup, action, and verification phases. Using require for preconditions and assert for final verifications is the correct pattern. The logging at each step will aid in debugging test failures.


57-98: LGTM!

The demotion test mirrors the promotion test structure, providing good coverage for both transition directions. The test correctly validates that demoting a server to client maintains cluster health.


118-150: LGTM with minor suggestion.

The polling helper is well-implemented with appropriate logging and error handling.

Consider checking t.Context().Done() in the loop to allow early exit if the test context is canceled, though this is a minor optimization since t.Fatalf will eventually terminate the test anyway.
