@jason-lynch jason-lynch commented Jan 14, 2026

Summary

Adds an election package that expands on the core logic from our custom scheduler elector implementation. Callers can use the election service to create a candidate, which has functions to participate in a named election.

Changes

Refactors the election implementation from the scheduler package into a standalone election package and adds some additional edge case checks and tests.

Testing

This PR shouldn't cause any noticeable change in functionality, aside from improved handling of certain types of outages. You can inspect the election and scheduler output by enabling debug logging and watching for messages with component=election_candidate:

LOG_LEVEL=debug make dev-watch

The scheduler is currently only used for scheduled backup jobs. Assuming you're using the dev-env.zsh helpers, you can test this with a database like:

# Make sure your cluster is initialized
cp-init

# Make a directory for backups
mkdir -p .scratch/local-backups

# Make a create database request body
# Note that this is using a shell substitution to fill in the `host_path`. Make sure you do this yourself
# if you decide to copy this JSON to a file rather than running this command as written.
cat <<EOF > ./.scratch/db-with-backups.json
{
  "id": "storefront",
  "spec": {
    "database_name": "storefront",
    "database_users": [
      {
        "username": "admin",
        "password": "password",
        "db_owner": true,
        "attributes": ["SUPERUSER", "LOGIN"]
      }
    ],
    "orchestrator_opts": {
      "swarm": {
        "extra_volumes": [
          {
            "host_path": "$(pwd)/.scratch/local-backups",
            "destination_path": "/backups"
          }
        ]
      }
    },
    "backup_config": {
      "repositories": [
        {
          "id": "scheduler-test",
          "type": "posix",
          "base_path": "/backups"
        }
      ],
      "schedules": [
        {
          "id": "every-five-minutes",
          "type": "full",
          "cron_expression": "*/5 * * * *"
        }
      ]
    },
    "nodes": [
      { "name": "n1", "host_ids": ["host-1"] },
      { "name": "n2", "host_ids": ["host-2"] }
    ]
  }
}
EOF

# Submit the create database request and follow its progress.
cp1-req create-database < ./.scratch/db-with-backups.json | cp-follow-task

You should then see the backup run every 5 minutes in the control-plane server logs, and only on the host that won the election. Note that you will still see registered scheduled job messages from every host, because each host keeps its internal state up to date even when it is not running the jobs. To test failover, stop the leader host with:

cp-docker-compose stop <host ID>

You should then see another host take over leadership.

PLAT-347

Summary by CodeRabbit

  • New Features

    • Added a distributed election system with candidates, TTL-based renewal, leadership watching, and claim callbacks.
  • Refactor

    • Scheduler leadership now delegates to the new election candidate abstraction, removing legacy leader-management logic.
  • Tests

    • Added tests for candidate lifecycle, leadership transfer, and start/stop idempotence.
  • Chores

    • Integrated election components into application initialization and dependency wiring.

coderabbitai bot commented Jan 14, 2026

Walkthrough

Adds an etcd-backed election subsystem (store, service, Candidate) with TTL-based leadership claims, DI wiring and tests, and refactors the scheduler Elector to delegate leadership responsibilities to election.Candidate. Also wires election.Provide into server initialization.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Election Store & Types: server/internal/election/store.go | Adds Name, StoredElection, and ElectionStore plus key construction and storage ops: GetByKey, Create, Update, Delete, Watch (etcd-backed). |
| Election Candidate & Logic: server/internal/election/candidate.go | New Candidate type and ClaimHandler; implements Start/Stop, IsLeader, Error, AddHandlers, TTL renewal (ticker at ttl/3), watch loop, claim/attempt/release logic, concurrency controls, and error channel. |
| Election Service & DI: server/internal/election/service.go, server/internal/election/provide.go, server/cmd/root.go | Adds Service with NewCandidate, DI Provide(i *do.Injector) wiring for store and service, and registers election.Provide in root initialization (imported and invoked after docker.Provide(i)). |
| Scheduler Integration: server/internal/scheduler/elector.go, server/internal/scheduler/provide.go | Replaces elector internal leadership logic with delegation to election.Candidate; Provide now creates a Candidate via election.Service (electionName="scheduler", TTL=30s) and constructs Elector with that Candidate. |
| Tests: server/internal/election/candidate_test.go | Adds tests for Candidate: Start/Stop idempotence, IsLeader transitions, multi-candidate leadership transfer and callback signaling using an etcd test server. |

Sequence Diagram

sequenceDiagram
    participant Scheduler as Scheduler/Elector
    participant Service as election.Service
    participant Candidate as election.Candidate
    participant Store as ElectionStore
    participant Etcd as etcd

    Scheduler->>Service: NewCandidate(name, id, ttl, onClaim)
    Service-->>Scheduler: Candidate

    Scheduler->>Candidate: Start(ctx)
    Candidate->>Store: GetByKey(name)
    Store->>Etcd: GET key
    Etcd-->>Store: record / nil
    alt no leader
        Candidate->>Store: Create(StoredElection) (with lease/TTL)
        Store->>Etcd: PUT key
        Etcd-->>Candidate: success
        Candidate->>Candidate: set isLeader=true
        Candidate->>Scheduler: invoke onClaim (async)
    else leader exists
        Candidate->>Candidate: remain non-leader
    end

    par Renewal Loop
        loop every ttl/3
            Candidate->>Store: Update(StoredElection) (if leader)
            Store->>Etcd: refresh TTL
        end
    and Watch Loop
        Candidate->>Store: Watch(name)
        Store->>Etcd: start watch
        loop on events
            Etcd-->>Candidate: event(delete/update)
            Candidate->>Candidate: checkClaim()
        end
    end

    Scheduler->>Candidate: Stop(ctx)
    Candidate->>Store: Delete(StoredElection) (if leader)
    Store->>Etcd: DELETE key
    Candidate->>Candidate: stop loops, clear state

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped to guard a tiny key,
With TTL heartbeats, soft and free.
I watched the watch, I claimed the crown,
When leadership fell, I twitched my gown.
Hooray — new seeds of order found! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description includes all required sections: Summary, Changes, Testing with detailed instructions, and issue reference (PLAT-347). The Checklist section is notably absent, which is a required template component. | Add the Checklist section from the template with appropriate checkmarks for tests, documentation, issue linking, changelog, and breaking changes. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: add election package' is concise, follows Conventional Commits style, and accurately summarizes the main change of introducing a new election package. |


🧹 Recent nitpick comments
server/internal/scheduler/provide.go (1)

16-17: Consider making the election TTL configurable.

The electionTTL is hardcoded to 30 seconds which is a reasonable default. For production flexibility, this could be sourced from config.Config to allow tuning based on cluster characteristics without code changes.


📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9238c9f and 295b6d9.

📒 Files selected for processing (8)
  • server/cmd/root.go
  • server/internal/election/candidate.go
  • server/internal/election/candidate_test.go
  • server/internal/election/provide.go
  • server/internal/election/service.go
  • server/internal/election/store.go
  • server/internal/scheduler/elector.go
  • server/internal/scheduler/provide.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • server/internal/election/provide.go
  • server/internal/election/candidate_test.go
  • server/internal/election/service.go
  • server/internal/election/candidate.go
🧰 Additional context used
📓 Path-based instructions (1)
server/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

server/**/*.go: Use samber/do injector for dependency injection; each package should have a Provide() function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa

Files:

  • server/cmd/root.go
  • server/internal/scheduler/provide.go
  • server/internal/scheduler/elector.go
  • server/internal/election/store.go
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use `samber/do` injector for dependency injection; each package should have a `Provide()` function that registers dependencies

Applied to files:

  • server/cmd/root.go
  • server/internal/scheduler/provide.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/config/**/*.go : Multi-source configuration should follow precedence: CLI flags > environment variables > config file, with environment variables prefixed with `CONTROL_PLANE_`

Applied to files:

  • server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode

Applied to files:

  • server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/workflows/**/*.go : Workflows represent long-running operations and should persist state to etcd for durability and resumability using the `cschleiden/go-workflows` framework

Applied to files:

  • server/internal/scheduler/provide.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking

Applied to files:

  • server/internal/election/store.go
🧬 Code graph analysis (2)
server/cmd/root.go (1)
server/internal/election/provide.go (1)
  • Provide (10-13)
server/internal/scheduler/provide.go (4)
server/internal/election/store.go (1)
  • Name (10-10)
server/internal/scheduler/elector.go (2)
  • Elector (18-20)
  • NewElector (22-26)
server/internal/scheduler/service.go (1)
  • Service (17-28)
server/internal/election/candidate.go (1)
  • NewCandidate (43-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (6)
server/cmd/root.go (1)

7-7: LGTM! Election package integration follows established DI patterns.

The election package is correctly wired into the initialization sequence using samber/do. Since the injector uses lazy instantiation, the placement before etcd.Provide is fine - the actual dependencies will resolve correctly when invoked. Based on coding guidelines, this follows the Provide() function pattern for registering dependencies.

Also applies to: 66-66

server/internal/scheduler/provide.go (1)

26-39: LGTM! Clean refactoring to use centralized election service.

The Elector provider now correctly obtains the election service via DI and creates a candidate with the scheduler-specific election name and host ID. This aligns well with the PR's goal of extracting election logic into a reusable package. Based on coding guidelines, this follows the samber/do injector pattern correctly.

server/internal/scheduler/elector.go (2)

18-26: LGTM! Clean delegation to election.Candidate.

The Elector is now a thin adapter that delegates all leadership logic to the centralized election.Candidate. This simplifies the scheduler package while maintaining the same interface contracts (gocron.Elector, do.Shutdownable).


40-45: Verify the timeout ratio for Shutdown is appropriate.

The shutdown timeout is set to electionTTL/3 (10 seconds with the current 30s TTL). This ensures the candidate has time to release leadership gracefully before the lease expires. If the election package's Stop method performs network calls to etcd, confirm this timeout is sufficient under degraded network conditions.

server/internal/election/store.go (2)

1-62: LGTM! Well-structured etcd-backed election store.

The ElectionStore implementation follows the etcd storage patterns from the codebase learnings, providing versioned values and watch support. The CRUD operations consistently use the Key method for path computation, and the Watch operation enables reactive leadership observation. The StoredElection type appropriately captures election state with leader identification and timestamps.


35-37: No action required. The Key method correctly constructs etcd keys using path.Join, which normalizes path separators and handles leading/trailing slashes appropriately. The implementation is consistent with other stores in the codebase and produces well-formed keys without double slashes.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@server/internal/election/candidate.go`:
- Around line 96-99: The goroutine sending errors to c.errCh (e.g., inside the
ticker handling where c.lockAndCheckClaim is called) can block or drop errors
because errCh has capacity 1; change the send to a non-blocking send using
select { case c.errCh <- err: default: } (or increase the channel buffer if you
prefer) so the ticker goroutine never blocks; apply the same non-blocking send
pattern to the other error sends referenced in this file (the sends around the
other calls that currently write to c.errCh).
🧹 Nitpick comments (2)
server/internal/election/candidate_test.go (1)

89-91: Consider adding a timeout to prevent test hangs.

The channel read <-bElected blocks indefinitely. If the callback never fires due to a bug, the test will hang instead of failing with a clear error.

♻️ Suggested improvement
 		// Block until B has claimed leadership
-		<-bElected
+		select {
+		case <-bElected:
+			// B claimed leadership
+		case <-time.After(10 * time.Second):
+			t.Fatal("timed out waiting for candidate B to claim leadership")
+		}
 		assert.True(t, candidateB.IsLeader())
server/internal/scheduler/elector.go (1)

41-46: Consider making the shutdown timeout configurable or deriving it from TTL.

The hardcoded 10-second timeout works but could be fragile if the election TTL changes. Since the AI summary mentions electionTTL = 30s in the scheduler's provide.go, this 10s is reasonable (TTL/3), but coupling it to the actual TTL would be more robust.

♻️ Optional: Accept timeout as a parameter or derive from candidate
+const shutdownTimeout = 10 * time.Second
+
 func (e *Elector) Shutdown() error {
-	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
 	defer cancel()

 	return e.candidate.Stop(ctx)
 }

Alternatively, if the Candidate exposes its TTL, derive the timeout from it.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b32c835 and 448017e.

📒 Files selected for processing (8)
  • server/cmd/root.go
  • server/internal/election/candidate.go
  • server/internal/election/candidate_test.go
  • server/internal/election/provide.go
  • server/internal/election/service.go
  • server/internal/election/store.go
  • server/internal/scheduler/elector.go
  • server/internal/scheduler/provide.go
🧰 Additional context used
📓 Path-based instructions (1)
server/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

server/**/*.go: Use samber/do injector for dependency injection; each package should have a Provide() function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa

Files:

  • server/internal/election/provide.go
  • server/internal/scheduler/provide.go
  • server/internal/election/candidate_test.go
  • server/cmd/root.go
  • server/internal/scheduler/elector.go
  • server/internal/election/service.go
  • server/internal/election/store.go
  • server/internal/election/candidate.go
🧠 Learnings (7)
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use `samber/do` injector for dependency injection; each package should have a `Provide()` function that registers dependencies

Applied to files:

  • server/internal/election/provide.go
  • server/internal/scheduler/provide.go
  • server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/workflows/**/*.go : Workflows represent long-running operations and should persist state to etcd for durability and resumability using the `cschleiden/go-workflows` framework

Applied to files:

  • server/internal/scheduler/provide.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to e2e/**/*.go : E2E tests should use build tag `//go:build e2e_test` and place test fixtures in `e2e/fixtures/`

Applied to files:

  • server/internal/election/candidate_test.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to clustertest/**/*.go : Cluster integration tests should use build tag `//go:build cluster_test` and use testcontainers for Docker-based test environments

Applied to files:

  • server/internal/election/candidate_test.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/config/**/*.go : Multi-source configuration should follow precedence: CLI flags > environment variables > config file, with environment variables prefixed with `CONTROL_PLANE_`

Applied to files:

  • server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode

Applied to files:

  • server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking

Applied to files:

  • server/internal/election/store.go
🧬 Code graph analysis (7)
server/internal/election/provide.go (2)
server/internal/election/store.go (2)
  • ElectionStore (23-26)
  • NewElectionStore (28-33)
server/internal/election/service.go (2)
  • Service (10-13)
  • NewService (16-24)
server/internal/scheduler/provide.go (4)
server/internal/election/store.go (1)
  • Name (10-10)
server/internal/scheduler/elector.go (2)
  • Elector (19-21)
  • NewElector (23-27)
server/internal/scheduler/service.go (1)
  • Service (17-28)
server/internal/election/candidate.go (1)
  • NewCandidate (43-64)
server/internal/election/candidate_test.go (4)
server/internal/testutils/logger.go (1)
  • Logger (9-17)
server/internal/election/store.go (2)
  • NewElectionStore (28-33)
  • Name (10-10)
server/internal/election/service.go (1)
  • NewService (16-24)
server/internal/election/candidate.go (1)
  • NewCandidate (43-64)
server/cmd/root.go (1)
server/internal/election/provide.go (1)
  • Provide (10-13)
server/internal/scheduler/elector.go (1)
server/internal/election/candidate.go (1)
  • Candidate (22-36)
server/internal/election/service.go (3)
server/internal/election/store.go (2)
  • ElectionStore (23-26)
  • Name (10-10)
server/internal/testutils/logger.go (1)
  • Logger (9-17)
server/internal/election/candidate.go (3)
  • NewCandidate (43-64)
  • ClaimHandler (17-17)
  • Candidate (22-36)
server/internal/election/candidate.go (3)
server/internal/election/store.go (2)
  • Name (10-10)
  • StoredElection (16-21)
server/internal/storage/interface.go (1)
  • WatchOp (111-122)
server/internal/storage/errors.go (3)
  • ErrValueVersionMismatch (14-14)
  • ErrAlreadyExists (10-10)
  • ErrWatchClosed (34-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (14)
server/cmd/root.go (1)

66-67: LGTM!

The election package is properly wired into the dependency injection chain. Since samber/do uses lazy evaluation, the placement before etcd.Provide(i) is fine—the etcd client will be resolved when election.Service is first invoked. Based on learnings, this follows the expected pattern where each package has a Provide() function for DI registration.

server/internal/election/candidate_test.go (2)

16-62: LGTM!

The basic functionality test thoroughly covers the candidate lifecycle: idempotent Start/Stop, leadership retention across TTL refresh, and restart capability. Good use of require for operations that must succeed and assert for state verification.


64-95: Good coverage of multi-candidate leadership transitions.

The test properly verifies that leadership transfers when the current leader stops, using the ClaimHandler callback to synchronize the assertion.

server/internal/election/provide.go (1)

10-41: LGTM!

The DI wiring follows the established pattern with a public Provide() function and properly separates the ElectionStore and Service providers. Error propagation from dependency resolution is handled correctly. Based on learnings, this aligns with the samber/do usage pattern expected in the codebase.

server/internal/election/service.go (1)

9-29: LGTM!

Clean service facade that encapsulates the store and logger dependencies, providing a convenient factory method for creating candidates. The variadic onClaim parameter provides flexibility for callers while the underlying NewCandidate receives the slice correctly.

server/internal/scheduler/provide.go (2)

16-17: LGTM!

The constants appropriately encapsulate the scheduler's election configuration. The 30-second TTL provides a reasonable balance between leader detection latency and etcd load from lease renewals.


26-40: Clean integration with the new election package.

The provideElector function properly delegates to election.Service for candidate creation, using cfg.HostID as the unique candidate identifier. This simplifies the Elector to a thin wrapper around election.Candidate.

server/internal/election/store.go (1)

1-62: LGTM! Clean storage abstraction following established patterns.

The ElectionStore implementation correctly follows the repository's storage patterns with:

  • Proper key generation using storage.Prefix
  • Consistent CRUD operations returning typed operation objects
  • Watch support for event-driven leadership changes
  • Embedded storage.StoredValue for optimistic locking support

This aligns well with the learning to use etcd for cluster state with versioned values and watch support.

server/internal/scheduler/elector.go (1)

19-27: LGTM! Clean refactor delegating to election.Candidate.

The simplified Elector struct with single-field delegation is a good application of composition. This removes duplicated leadership logic while maintaining the same public interface.

server/internal/election/candidate.go (5)

22-36: Good struct design with appropriate synchronization primitives.

The use of atomic.Bool for isLeader and sync.Mutex for state protection is appropriate. The buffered error channel (size 1) is noted - see related comment below about potential error loss.


43-64: LGTM! Proper initialization with structured logging.

The constructor follows zerolog best practices by enriching the logger with component context (election_candidate, election_name, candidate_id). This aligns with the coding guidelines for structured JSON logging.


206-210: Context passed to onClaim handlers may already be cancelled.

When claim handlers are invoked with ctx, this is the same context from Start(). If the parent context has a deadline or is cancelled after attemptClaim succeeds but before handlers complete, they'll receive a cancelled context. Consider whether handlers should receive a fresh context or a context derived from a different parent.

This may be intentional if handlers should respect the parent's lifecycle, but worth confirming the expected behavior.


249-286: LGTM! Robust release logic with proper edge-case handling.

The release method correctly handles:

  • Early return if not leader
  • Claim expired and unclaimed (ErrNotFound)
  • Claim taken by another candidate (leader ID mismatch)
  • Version mismatch during delete (optimistic locking)

The timeout derived from TTL is a reasonable choice.


124-141: No blocking issue exists.

The done channel is properly buffered with size 1 (line 84), and Stop() is idempotent due to the running guard (line 128-130) that prevents multiple sends. The single buffered send at line 137 will not block regardless of goroutine state.

Likely an incorrect or invalid review comment.
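
The reasoning above, a buffered channel plus a running guard, can be illustrated with a small sketch. The `worker` type is hypothetical and much simpler than Candidate; it only shows why a second Stop call cannot block.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical type illustrating the guard pattern.
type worker struct {
	mu      sync.Mutex
	running bool
	done    chan struct{}
}

// Start is idempotent: a second call while running does nothing.
func (w *worker) Start() {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.running {
		return
	}
	w.running = true
	w.done = make(chan struct{}, 1) // buffered so Stop never blocks
}

// Stop reports whether this call actually stopped the worker.
func (w *worker) Stop() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if !w.running {
		return false // guard: prevents a second send on done
	}
	w.running = false
	w.done <- struct{}{} // at most one send per Start, capacity 1
	return true
}

func main() {
	w := &worker{}
	w.Start()
	fmt.Println(w.Stop()) // first Stop does the work
	fmt.Println(w.Stop()) // second Stop is a no-op
}
```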


@jason-lynch jason-lynch force-pushed the feat/PLAT-347/election-pkg branch from 448017e to 847d42a Compare January 15, 2026 13:50

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@server/internal/election/candidate.go`:
- Around line 163-195: In the checkClaim function, correct the typo in the
ErrValueVersionMismatch comment inside the retry switch: replace "The continue
wil operation." with a clear phrase such as "The continue will re-attempt the
operation; when we re-fetch the claim it will either not exist or belong to
someone else." Update the comment near the ErrValueVersionMismatch case in
checkClaim to reflect this exact wording.

In `@server/internal/election/store.go`:
- Around line 35-37: The ElectionStore.Key method currently uses storage.Prefix
which appends a trailing slash and produces keys like "/root/elections/foo/",
causing etcd ops to target wrong paths; change ElectionStore.Key to use
storage.Key (e.g., call storage.Key("/", s.root, "elections", name.String())) so
it returns a canonical key without the trailing slash; update any callers
assuming the old format if necessary and run unit tests that exercise
get/put/delete/watch on ElectionStore to verify correct behavior.
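
The difference between a canonical key and a prefix with a trailing slash can be shown with the standard library alone. `key` and `prefix` below are hypothetical stand-ins; the real storage.Key and storage.Prefix signatures are assumed from the comment above, not confirmed.

```go
package main

import (
	"fmt"
	"path"
)

// key joins the elements into a canonical path with no trailing slash.
// Hypothetical stand-in for storage.Key.
func key(elem ...string) string {
	return path.Join(elem...)
}

// prefix is the same path with a trailing slash, suitable for range
// queries over children. Hypothetical stand-in for storage.Prefix.
func prefix(elem ...string) string {
	return path.Join(elem...) + "/"
}

func main() {
	fmt.Println(key("/", "root", "elections", "foo"))    // single-record key
	fmt.Println(prefix("/", "root", "elections", "foo")) // differs by the trailing slash
}
```

Because etcd treats keys as opaque byte strings, the two results name different keys, which is why a get on one does not see a put on the other.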
🧹 Nitpick comments (1)
server/internal/election/candidate_test.go (1)

39-42: Consider using a shorter TTL or polling approach to reduce test duration.

The 2-second sleep waiting for TTL refresh works but adds latency to test runs. Since the TTL is set to 1 second on line 26, this is appropriately >1x the TTL, but consider whether a polling loop with a shorter interval and timeout would be more resilient in slow CI environments.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 448017e and 847d42a.

📒 Files selected for processing (8)
  • server/cmd/root.go
  • server/internal/election/candidate.go
  • server/internal/election/candidate_test.go
  • server/internal/election/provide.go
  • server/internal/election/service.go
  • server/internal/election/store.go
  • server/internal/scheduler/elector.go
  • server/internal/scheduler/provide.go
🧰 Additional context used
📓 Path-based instructions (1)
server/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

server/**/*.go: Use samber/do injector for dependency injection; each package should have a Provide() function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa

Files:

  • server/internal/election/service.go
  • server/internal/scheduler/elector.go
  • server/internal/election/candidate_test.go
  • server/internal/election/provide.go
  • server/internal/scheduler/provide.go
  • server/internal/election/store.go
  • server/cmd/root.go
  • server/internal/election/candidate.go
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to e2e/**/*.go : E2E tests should use build tag `//go:build e2e_test` and place test fixtures in `e2e/fixtures/`

Applied to files:

  • server/internal/election/candidate_test.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use `samber/do` injector for dependency injection; each package should have a `Provide()` function that registers dependencies

Applied to files:

  • server/internal/election/provide.go
  • server/internal/scheduler/provide.go
  • server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/workflows/**/*.go : Workflows represent long-running operations and should persist state to etcd for durability and resumability using the `cschleiden/go-workflows` framework

Applied to files:

  • server/internal/scheduler/provide.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking

Applied to files:

  • server/internal/election/store.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/config/**/*.go : Multi-source configuration should follow precedence: CLI flags > environment variables > config file, with environment variables prefixed with `CONTROL_PLANE_`

Applied to files:

  • server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode

Applied to files:

  • server/cmd/root.go
📚 Learning: 2026-01-14T22:26:37.628Z
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 246
File: server/internal/election/candidate.go:220-247
Timestamp: 2026-01-14T22:26:37.628Z
Learning: In the control-plane codebase, the `storage.WatchOp.Watch()` method runs the provided handler function in a separate goroutine, so recursive calls within the handler do not create deadlock risks with locks held by the caller.

Applied to files:

  • server/internal/election/candidate.go
🧬 Code graph analysis (6)
server/internal/election/service.go (3)
server/internal/election/store.go (2)
  • ElectionStore (23-26)
  • Name (10-10)
server/internal/testutils/logger.go (1)
  • Logger (9-17)
server/internal/election/candidate.go (3)
  • NewCandidate (43-64)
  • ClaimHandler (17-17)
  • Candidate (22-36)
server/internal/scheduler/elector.go (3)
server/internal/election/candidate.go (1)
  • Candidate (22-36)
server/internal/resource/resource.go (1)
  • Context (84-88)
server/internal/utils/utils.go (1)
  • WithTimeout (21-36)
server/internal/election/provide.go (3)
server/internal/election/store.go (2)
  • ElectionStore (23-26)
  • NewElectionStore (28-33)
server/internal/election/service.go (2)
  • Service (10-13)
  • NewService (16-24)
server/internal/testutils/logger.go (1)
  • Logger (9-17)
server/internal/scheduler/provide.go (4)
server/internal/election/store.go (1)
  • Name (10-10)
server/internal/scheduler/elector.go (2)
  • Elector (18-20)
  • NewElector (22-26)
server/internal/scheduler/service.go (1)
  • Service (17-28)
server/internal/election/candidate.go (1)
  • NewCandidate (43-64)
server/internal/election/store.go (6)
server/internal/storage/interface.go (5)
  • StoredValue (21-23)
  • GetOp (50-52)
  • PutOp (65-71)
  • DeleteValueOp (83-86)
  • WatchOp (111-122)
server/internal/storage/key.go (2)
  • Key (19-25)
  • Prefix (10-16)
server/internal/storage/get.go (1)
  • NewGetOp (23-29)
server/internal/storage/put.go (2)
  • NewCreateOp (69-76)
  • NewUpdateOp (121-128)
server/internal/storage/delete.go (1)
  • NewDeleteValueOp (91-98)
server/internal/storage/watch.go (1)
  • NewWatchOp (23-29)
server/cmd/root.go (7)
server/internal/scheduler/provide.go (1)
  • Provide (19-24)
server/internal/election/provide.go (1)
  • Provide (10-13)
server/internal/monitor/provide.go (1)
  • Provide (14-17)
server/internal/cluster/provide.go (1)
  • Provide (10-13)
server/internal/config/provide.go (1)
  • Provide (9-12)
server/internal/api/provide.go (1)
  • Provide (13-16)
server/internal/task/provide.go (1)
  • Provide (9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (19)
server/cmd/root.go (1)

7-7: LGTM!

The election package import and election.Provide(i) call follow the established DI wiring pattern used by other packages in this file. The alphabetical ordering is maintained correctly.

Also applies to: 66-66

server/internal/election/provide.go (1)

1-41: LGTM!

The DI wiring follows the established pattern in the codebase with proper error propagation. The Provide() function correctly registers both the ElectionStore and Service providers, matching the signatures defined in store.go and service.go. As per coding guidelines, this uses samber/do for dependency injection.

server/internal/election/candidate_test.go (1)

64-100: Well-structured leadership transfer test.

The multiple candidates test properly validates:

  • Initial leadership acquisition by candidate A
  • Leadership transfer to candidate B after A stops
  • Callback notification via buffered channel
  • Timeout protection prevents test hanging

The use of context.Background() in cleanup is correct since the test context may already be cancelled.

server/internal/election/service.go (1)

1-30: LGTM!

The Service provides a clean abstraction for candidate creation with proper DI wiring. The variadic onClaim ...ClaimHandler parameter provides a convenient API while correctly passing the slice to the underlying NewCandidate constructor. The documentation noting that candidateID must be unique is helpful.

server/internal/scheduler/provide.go (2)

16-17: LGTM - well-defined election constants.

The electionName and electionTTL constants are appropriately scoped to the scheduler package. A 30-second TTL is reasonable for distributed leader election, providing a balance between responsiveness to failures and reduced etcd churn.


26-40: LGTM - clean delegation to election package.

The refactored provideElector properly leverages the new election.Service for candidate creation, cleanly separating the scheduler's election concerns from the underlying election mechanics. This aligns with the PR's goal of extracting election logic into a standalone package.

server/internal/election/store.go (2)

10-33: LGTM!

The Name type, StoredElection struct with embedded StoredValue for versioned storage, and ElectionStore constructor follow the established etcd storage patterns with proper type-safe generics.


39-62: LGTM!

The CRUD and Watch operations correctly delegate to the storage package's operation builders, matching the interfaces shown in the relevant snippets.

server/internal/scheduler/elector.go (3)

18-26: LGTM!

Clean delegation pattern. The Elector is now a lightweight adapter that wraps election.Candidate, reducing complexity while maintaining the gocron.Elector interface contract.


28-38: LGTM!

Start and IsLeader correctly delegate to the candidate with appropriate error mapping via ErrNonLeader.


40-49: electionTTL constant is properly defined in the scheduler package.

The constant is defined in server/internal/scheduler/provide.go:17 as const electionTTL time.Duration = 30 * time.Second, and the shutdown timeout of electionTTL/3 (10 seconds) is a reasonable approach for graceful election cleanup.

server/internal/election/candidate.go (8)

15-36: LGTM!

The ClaimHandler type and Candidate struct are well-designed with appropriate concurrency primitives (sync.Mutex for thread safety, atomic.Bool for lock-free leadership checks). The structured logging with zerolog follows the coding guidelines.
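
A rough sketch of how these two primitives typically divide the work, assuming a simplified struct (the field names and the handlerCount helper are illustrative, not taken from candidate.go): the mutex guards the handler slice, while leadership is read through atomic.Bool without taking the lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// candidate is a hypothetical reduction of the Candidate struct.
type candidate struct {
	mu       sync.Mutex  // guards onClaim
	onClaim  []func()    // handlers run when leadership is claimed
	isLeader atomic.Bool // read on the hot path without locking
}

// AddHandlers appends handlers under the mutex.
func (c *candidate) AddHandlers(h ...func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.onClaim = append(c.onClaim, h...)
}

// IsLeader is a lock-free read of the leadership flag.
func (c *candidate) IsLeader() bool { return c.isLeader.Load() }

// handlerCount reports how many handlers are registered.
func (c *candidate) handlerCount() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.onClaim)
}

func main() {
	c := &candidate{}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.AddHandlers(func() {})
		}()
	}
	wg.Wait()
	fmt.Println(c.handlerCount()) // 10
	c.isLeader.Store(true)
	fmt.Println(c.IsLeader()) // true
}
```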


38-64: LGTM!

Constructor properly enriches the logger with component, election_name, and candidate_id fields for observability. Deferring done and ticker initialization to Start() avoids race conditions as noted in the code comments.


66-115: LGTM!

The Start method is idempotent and properly initializes the ticker at ttl/3 for leadership renewal. The design captures done and ticker in the closure to avoid data races, as documented in the comment.


143-154: LGTM!

Error() returns a receive-only channel, and AddHandlers is properly mutex-protected for thread-safe handler registration.


197-222: LGTM!

attemptClaim properly handles the optimistic concurrency pattern: on success it invokes handlers asynchronously, on ErrAlreadyExists it gracefully yields to the existing leader, and other errors are propagated.
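
As an illustration of the create-if-absent pattern, here is a minimal sketch using an in-memory map in place of etcd. memStore, errAlreadyExists, and this attemptClaim signature are hypothetical; the real method uses the storage package's create operation and invokes claim handlers on success.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errAlreadyExists = errors.New("already exists")

// memStore is a hypothetical in-memory stand-in for the election store.
type memStore struct {
	mu     sync.Mutex
	leader map[string]string // election name -> candidate ID
}

func newMemStore() *memStore {
	return &memStore{leader: map[string]string{}}
}

// create succeeds only when no claim exists for the election.
func (s *memStore) create(election, candidateID string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.leader[election]; ok {
		return errAlreadyExists
	}
	s.leader[election] = candidateID
	return nil
}

// attemptClaim reports whether candidateID became leader. Losing the
// race is not an error; any other failure is propagated.
func attemptClaim(s *memStore, election, candidateID string) (bool, error) {
	err := s.create(election, candidateID)
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, errAlreadyExists):
		return false, nil
	default:
		return false, err
	}
}

func main() {
	s := newMemStore()
	a, _ := attemptClaim(s, "scheduler", "host-a")
	b, _ := attemptClaim(s, "scheduler", "host-b")
	fmt.Println(a, b) // true false
}
```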


224-251: LGTM!

The watch implementation correctly handles leadership transitions on delete events and restarts on ErrWatchClosed. Based on learnings, the WatchOp.Watch() handler runs in a separate goroutine, so the recursive c.watch(ctx) call is safe.


253-290: LGTM!

The release method comprehensively handles edge cases: expired claims, leadership takeover by another candidate, and version mismatches during delete. The optimistic concurrency pattern is correctly applied.


117-141: Minor: c.done <- struct{}{} may block if the ticker goroutine has already exited.

If the context passed to Start() is cancelled, the ticker goroutine exits via case <-ctx.Done(). A subsequent Stop() call will then attempt c.done <- struct{}{}, but with no receiver this send blocks indefinitely (the channel buffer is 1, but the goroutine is gone).

Consider using a non-blocking send or checking c.running state more carefully to handle this edge case.

♻️ Suggested non-blocking send
-	c.done <- struct{}{}
+	select {
+	case c.done <- struct{}{}:
+	default:
+		// Goroutine already exited (e.g., context cancelled)
+	}

Likely an incorrect or invalid review comment.
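
The channel semantics in question can be checked in isolation. This sketch assumes only what the comment states, namely a channel with a buffer of 1; trySend is a hypothetical helper, and nothing here is taken from candidate.go. A send into an empty buffered channel completes without a receiver, and only a send into a full buffer would block, which the select/default form avoids.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the value
// was accepted (either by a receiver or by free buffer space).
func trySend(done chan struct{}) bool {
	select {
	case done <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	done := make(chan struct{}, 1) // buffer of 1, no receiver running

	fmt.Println(trySend(done)) // true: the buffer had room
	fmt.Println(trySend(done)) // false: the buffer is full, so default runs
}
```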


@jason-lynch jason-lynch force-pushed the feat/PLAT-347/election-pkg branch from 847d42a to 9238c9f Compare January 15, 2026 13:57
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@server/internal/election/candidate.go`:
- Around line 253-290: Update the typo in the comment inside the
Candidate.release method: change "noone" to "no one" in the comment that
explains the claim expired and no one else has claimed it (located in the switch
case handling errors.Is(err, storage.ErrNotFound) in release).
🧹 Nitpick comments (1)
server/internal/election/candidate_test.go (1)

39-42: Consider using a shorter TTL or event-based synchronization.

The 2-second sleep works but makes tests slower. Since the candidate is using a 1-second TTL, this is technically correct, but if the implementation exposes any refresh event or leadership confirmation callback, that would be preferable to time-based waiting.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 847d42a and 9238c9f.

📒 Files selected for processing (8)
  • server/cmd/root.go
  • server/internal/election/candidate.go
  • server/internal/election/candidate_test.go
  • server/internal/election/provide.go
  • server/internal/election/service.go
  • server/internal/election/store.go
  • server/internal/scheduler/elector.go
  • server/internal/scheduler/provide.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/cmd/root.go
🧰 Additional context used
📓 Path-based instructions (1)
server/**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

server/**/*.go: Use samber/do injector for dependency injection; each package should have a Provide() function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa

Files:

  • server/internal/election/provide.go
  • server/internal/election/candidate_test.go
  • server/internal/election/store.go
  • server/internal/scheduler/elector.go
  • server/internal/scheduler/provide.go
  • server/internal/election/candidate.go
  • server/internal/election/service.go
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use `samber/do` injector for dependency injection; each package should have a `Provide()` function that registers dependencies

Applied to files:

  • server/internal/election/provide.go
  • server/internal/scheduler/provide.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to e2e/**/*.go : E2E tests should use build tag `//go:build e2e_test` and place test fixtures in `e2e/fixtures/`

Applied to files:

  • server/internal/election/candidate_test.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking

Applied to files:

  • server/internal/election/store.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/workflows/**/*.go : Workflows represent long-running operations and should persist state to etcd for durability and resumability using the `cschleiden/go-workflows` framework

Applied to files:

  • server/internal/scheduler/provide.go
📚 Learning: 2026-01-14T22:26:37.628Z
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 246
File: server/internal/election/candidate.go:220-247
Timestamp: 2026-01-14T22:26:37.628Z
Learning: In the control-plane codebase, the `storage.WatchOp.Watch()` method runs the provided handler function in a separate goroutine, so recursive calls within the handler do not create deadlock risks with locks held by the caller.

Applied to files:

  • server/internal/election/candidate.go
🧬 Code graph analysis (7)
server/internal/election/provide.go (3)
server/internal/election/store.go (2)
  • ElectionStore (23-26)
  • NewElectionStore (28-33)
server/internal/election/service.go (2)
  • Service (10-13)
  • NewService (16-24)
server/internal/testutils/logger.go (1)
  • Logger (9-17)
server/internal/election/candidate_test.go (4)
server/internal/testutils/logger.go (1)
  • Logger (9-17)
server/internal/election/store.go (2)
  • NewElectionStore (28-33)
  • Name (10-10)
server/internal/election/service.go (1)
  • NewService (16-24)
server/internal/election/candidate.go (1)
  • NewCandidate (43-64)
server/internal/election/store.go (6)
server/internal/storage/interface.go (5)
  • StoredValue (21-23)
  • GetOp (50-52)
  • PutOp (65-71)
  • DeleteValueOp (83-86)
  • WatchOp (111-122)
server/internal/storage/key.go (1)
  • Key (19-25)
server/internal/storage/get.go (1)
  • NewGetOp (23-29)
server/internal/storage/put.go (2)
  • NewCreateOp (69-76)
  • NewUpdateOp (121-128)
server/internal/storage/delete.go (1)
  • NewDeleteValueOp (91-98)
server/internal/storage/watch.go (1)
  • NewWatchOp (23-29)
server/internal/scheduler/elector.go (1)
server/internal/election/candidate.go (1)
  • Candidate (22-36)
server/internal/scheduler/provide.go (4)
server/internal/election/store.go (1)
  • Name (10-10)
server/internal/scheduler/elector.go (2)
  • Elector (18-20)
  • NewElector (22-26)
server/internal/scheduler/service.go (1)
  • Service (17-28)
server/internal/election/candidate.go (1)
  • NewCandidate (43-64)
server/internal/election/candidate.go (3)
server/internal/election/store.go (3)
  • ElectionStore (23-26)
  • Name (10-10)
  • StoredElection (16-21)
server/internal/storage/interface.go (2)
  • WatchOp (111-122)
  • EventTypeError (93-93)
server/internal/storage/errors.go (3)
  • ErrValueVersionMismatch (14-14)
  • ErrAlreadyExists (10-10)
  • ErrWatchClosed (34-34)
server/internal/election/service.go (2)
server/internal/election/store.go (2)
  • ElectionStore (23-26)
  • Name (10-10)
server/internal/election/candidate.go (3)
  • NewCandidate (43-64)
  • ClaimHandler (17-17)
  • Candidate (22-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (19)
server/internal/election/provide.go (1)

10-41: LGTM! Clean DI wiring following project conventions.

The Provide() function correctly registers dependencies using samber/do, with proper error propagation from do.Invoke calls. The provider functions align with the coding guidelines requiring each package to have a Provide() function for dependency registration.

server/internal/election/service.go (1)

9-30: LGTM! Clean service abstraction for candidate creation.

The Service struct provides a clean factory pattern for creating Candidate instances. The delegation to the package-level NewCandidate maintains separation of concerns while allowing the service to inject the store and logger dependencies.

server/internal/scheduler/provide.go (2)

16-17: Constants are well-defined for the scheduler's election participation.

The electionName and electionTTL constants are appropriately scoped to the package. The 30-second TTL provides a reasonable balance between leader health detection and avoiding unnecessary leadership churn.


26-40: LGTM! Clean refactor to use election service.

The elector wiring now correctly delegates to the election.Service for candidate creation, which aligns with the PR's goal of centralizing election logic in the election package. The use of cfg.HostID as the candidate identifier ensures uniqueness across hosts.

server/internal/election/candidate_test.go (2)

16-22: Good test setup with isolated etcd state.

Using uuid.NewString() for the store root ensures test isolation. The test correctly sets up the service layer and uses the external test package pattern for black-box testing.


64-100: Well-structured multi-candidate test with proper timeout handling.

The test correctly validates leadership transfer using a callback channel with timeout. The buffered channel prevents goroutine leaks, and the 10-second timeout provides adequate margin for CI environments while still catching stuck tests.

server/internal/election/store.go (4)

10-14: LGTM! Clean type definition for election names.

The Name type provides type safety for election names and implements String() for compatibility with logging and key construction.


16-21: Well-structured election state entity.

The StoredElection struct correctly embeds storage.StoredValue for version tracking (enabling optimistic locking per the learnings) and includes the essential fields for leader election state.


35-37: Fix from previous review applied correctly.

The Key method now uses storage.Key() instead of storage.Prefix(), which produces the correct key path without a trailing slash. Based on the relevant snippet, storage.Key joins path elements and ensures a leading slash, which is the expected behavior for etcd key operations.
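
To make the difference concrete, here is a sketch of the two behaviors as described in this comment. The key and prefix functions are hypothetical stand-ins, not the actual storage.Key and storage.Prefix implementations, and the "elections" path segment is illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// key joins path elements with a leading slash and no trailing slash.
func key(elems ...string) string {
	return "/" + strings.Join(elems, "/")
}

// prefix is the same path with a trailing slash, suited to range
// queries over children rather than to addressing a single value.
func prefix(elems ...string) string {
	return key(elems...) + "/"
}

func main() {
	fmt.Println(key("elections", "scheduler"))    // /elections/scheduler
	fmt.Println(prefix("elections", "scheduler")) // /elections/scheduler/
}
```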


39-62: LGTM! Complete CRUD and watch operations.

The store provides a clean abstraction over the storage layer operations. Each method correctly constructs the key and delegates to the appropriate storage operation builder. This aligns with the learnings about using etcd with watch support.

server/internal/scheduler/elector.go (2)

18-26: Clean refactor to delegate election logic.

The Elector now serves as a thin wrapper around election.Candidate, properly implementing both gocron.Elector and do.Shutdownable interfaces through delegation. This separation of concerns improves testability and keeps election logic centralized.


40-45: Shutdown timeout aligns with ticker interval.

Using electionTTL/3 for the shutdown context timeout is consistent with the ticker interval in Candidate.Start(), giving adequate time to release leadership cleanly.

server/internal/election/candidate.go (7)

38-64: Well-structured constructor with contextual logging.

The constructor properly initializes the Candidate with component-specific logging context (election_candidate, election_name, candidate_id), following the coding guidelines for structured logging with zerolog. The buffered error channel (size 1) is appropriate since any error triggers shutdown.


69-115: Robust lifecycle management with defensive local variable capture.

The Start method correctly:

  • Uses an idempotent guard to prevent double-starts
  • Captures done and ticker in local variables (lines 84-85) to avoid data races if Start is called again
  • Uses a TTL/3 ticker interval, which is a well-established pattern for leader election refresh

122-141: LGTM!

Stop correctly implements idempotent shutdown with a proper cleanup sequence: close watch, signal ticker goroutine, release leadership.


163-195: Well-designed retry logic for claim management.

The checkClaim method properly handles race conditions:

  • ErrNotFound triggers new claim attempt
  • ErrValueVersionMismatch retries (claim expired between fetch and update)
  • Bounded retries prevent infinite loops

The typo from the previous review has been fixed.


197-222: Claim handlers inherit the caller's context.

The handlers at lines 212-214 receive the same ctx passed to attemptClaim. If this context gets cancelled (e.g., the Start context), handlers may terminate prematurely. Consider whether handlers should receive a separate context derived from context.Background() with appropriate timeout, or if this is intentional behavior.

If handlers should complete regardless of the election context lifecycle, consider:

♻️ Suggested change for independent handler context
 	case err == nil:
 		c.isLeader.Store(true)
 		c.logger.Debug().Msg("successfully claimed leadership")
 		for _, handler := range c.onClaim {
-			go handler(ctx)
+			go handler(context.Background())
 		}

224-251: Watch restart logic is safe per storage.WatchOp design.

The recursive call to c.watch(ctx) on line 244 is safe because, as per the learnings, storage.WatchOp.Watch() runs the handler in a separate goroutine, so the mutex is released before the recursive call attempts to acquire it.


149-154: LGTM!

Thread-safe handler registration with proper mutex protection.


Adds an election package that expands on the core logic from our custom
scheduler elector implementation. Callers can use the election service
to create a candidate, which has functions to participate in a named
election.

PLAT-347

Refactors the `scheduler` package's `Elector` to use the new `election`
package.

PLAT-347
@jason-lynch jason-lynch force-pushed the feat/PLAT-347/election-pkg branch from 9238c9f to 295b6d9 Compare January 15, 2026 14:02