-
Notifications
You must be signed in to change notification settings - Fork 2
feat: add election package #246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds an etcd-backed election subsystem (store, service, Candidate) with TTL-based leadership claims, DI wiring and tests, and refactors the scheduler Elector to delegate leadership responsibilities to Changes
Sequence DiagramsequenceDiagram
participant Scheduler as Scheduler/Elector
participant Service as election.Service
participant Candidate as election.Candidate
participant Store as ElectionStore
participant Etcd as etcd
Scheduler->>Service: NewCandidate(name, id, ttl, onClaim)
Service-->>Scheduler: Candidate
Scheduler->>Candidate: Start(ctx)
Candidate->>Store: GetByKey(name)
Store->>Etcd: GET key
Etcd-->>Store: record / nil
alt no leader
Candidate->>Store: Create(StoredElection) (with lease/TTL)
Store->>Etcd: PUT key
Etcd-->>Candidate: success
Candidate->>Candidate: set isLeader=true
Candidate->>Scheduler: invoke onClaim (async)
else leader exists
Candidate->>Candidate: remain non-leader
end
par Renewal Loop
loop every ttl/3
Candidate->>Store: Update(StoredElection) (if leader)
Store->>Etcd: refresh TTL
end
and Watch Loop
Candidate->>Store: Watch(name)
Store->>Etcd: start watch
loop on events
Etcd-->>Candidate: event(delete/update)
Candidate->>Candidate: checkClaim()
end
end
Scheduler->>Candidate: Stop(ctx)
Candidate->>Store: Delete(StoredElection) (if leader)
Store->>Etcd: DELETE key
Candidate->>Candidate: stop loops, clear state
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧹 Recent nitpick comments
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (8)
🚧 Files skipped from review as they are similar to previous changes (4)
🧰 Additional context used📓 Path-based instructions (1)server/**/*.go📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (6)📓 Common learnings📚 Learning: 2026-01-14T16:43:14.333ZApplied to files:
📚 Learning: 2026-01-14T16:43:14.333ZApplied to files:
📚 Learning: 2026-01-14T16:43:14.333ZApplied to files:
📚 Learning: 2026-01-14T16:43:14.333ZApplied to files:
📚 Learning: 2026-01-14T16:43:14.333ZApplied to files:
🧬 Code graph analysis (2)server/cmd/root.go (1)
server/internal/scheduler/provide.go (4)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
🔇 Additional comments (6)
✏️ Tip: You can disable this entire section by setting Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@server/internal/election/candidate.go`:
- Around line 96-99: The goroutine sending errors to c.errCh (e.g., inside the
ticker handling where c.lockAndCheckClaim is called) can block or drop errors
because errCh has capacity 1; change the send to a non-blocking send using
select { case c.errCh <- err: default: } (or increase the channel buffer if you
prefer) so the ticker goroutine never blocks; apply the same non-blocking send
pattern to the other error sends referenced in this file (the sends around the
other calls that currently write to c.errCh).
🧹 Nitpick comments (2)
server/internal/election/candidate_test.go (1)
89-91: Consider adding a timeout to prevent test hangs.The channel read
<-bElectedblocks indefinitely. If the callback never fires due to a bug, the test will hang instead of failing with a clear error.♻️ Suggested improvement
// Block until B has claimed leadership - <-bElected + select { + case <-bElected: + // B claimed leadership + case <-time.After(10 * time.Second): + t.Fatal("timed out waiting for candidate B to claim leadership") + } assert.True(t, candidateB.IsLeader())server/internal/scheduler/elector.go (1)
41-46: Consider making the shutdown timeout configurable or deriving it from TTL.The hardcoded 10-second timeout works but could be fragile if the election TTL changes. Since the AI summary mentions
electionTTL = 30sin the scheduler's provide.go, this 10s is reasonable (TTL/3), but coupling it to the actual TTL would be more robust.♻️ Optional: Accept timeout as a parameter or derive from candidate
+const shutdownTimeout = 10 * time.Second + func (e *Elector) Shutdown() error { - ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout) defer cancel() return e.candidate.Stop(ctx) }Alternatively, if the
Candidateexposes its TTL, derive the timeout from it.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
server/cmd/root.goserver/internal/election/candidate.goserver/internal/election/candidate_test.goserver/internal/election/provide.goserver/internal/election/service.goserver/internal/election/store.goserver/internal/scheduler/elector.goserver/internal/scheduler/provide.go
🧰 Additional context used
📓 Path-based instructions (1)
server/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
server/**/*.go: Usesamber/doinjector for dependency injection; each package should have aProvide()function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa
Files:
server/internal/election/provide.goserver/internal/scheduler/provide.goserver/internal/election/candidate_test.goserver/cmd/root.goserver/internal/scheduler/elector.goserver/internal/election/service.goserver/internal/election/store.goserver/internal/election/candidate.go
🧠 Learnings (7)
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use `samber/do` injector for dependency injection; each package should have a `Provide()` function that registers dependencies
Applied to files:
server/internal/election/provide.goserver/internal/scheduler/provide.goserver/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/workflows/**/*.go : Workflows represent long-running operations and should persist state to etcd for durability and resumability using the `cschleiden/go-workflows` framework
Applied to files:
server/internal/scheduler/provide.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to e2e/**/*.go : E2E tests should use build tag `//go:build e2e_test` and place test fixtures in `e2e/fixtures/`
Applied to files:
server/internal/election/candidate_test.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to clustertest/**/*.go : Cluster integration tests should use build tag `//go:build cluster_test` and use testcontainers for Docker-based test environments
Applied to files:
server/internal/election/candidate_test.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/config/**/*.go : Multi-source configuration should follow precedence: CLI flags > environment variables > config file, with environment variables prefixed with `CONTROL_PLANE_`
Applied to files:
server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Applied to files:
server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
Applied to files:
server/internal/election/store.go
🧬 Code graph analysis (7)
server/internal/election/provide.go (2)
server/internal/election/store.go (2)
ElectionStore(23-26)NewElectionStore(28-33)server/internal/election/service.go (2)
Service(10-13)NewService(16-24)
server/internal/scheduler/provide.go (4)
server/internal/election/store.go (1)
Name(10-10)server/internal/scheduler/elector.go (2)
Elector(19-21)NewElector(23-27)server/internal/scheduler/service.go (1)
Service(17-28)server/internal/election/candidate.go (1)
NewCandidate(43-64)
server/internal/election/candidate_test.go (4)
server/internal/testutils/logger.go (1)
Logger(9-17)server/internal/election/store.go (2)
NewElectionStore(28-33)Name(10-10)server/internal/election/service.go (1)
NewService(16-24)server/internal/election/candidate.go (1)
NewCandidate(43-64)
server/cmd/root.go (1)
server/internal/election/provide.go (1)
Provide(10-13)
server/internal/scheduler/elector.go (1)
server/internal/election/candidate.go (1)
Candidate(22-36)
server/internal/election/service.go (3)
server/internal/election/store.go (2)
ElectionStore(23-26)Name(10-10)server/internal/testutils/logger.go (1)
Logger(9-17)server/internal/election/candidate.go (3)
NewCandidate(43-64)ClaimHandler(17-17)Candidate(22-36)
server/internal/election/candidate.go (3)
server/internal/election/store.go (2)
Name(10-10)StoredElection(16-21)server/internal/storage/interface.go (1)
WatchOp(111-122)server/internal/storage/errors.go (3)
ErrValueVersionMismatch(14-14)ErrAlreadyExists(10-10)ErrWatchClosed(34-34)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (14)
server/cmd/root.go (1)
66-67: LGTM!The election package is properly wired into the dependency injection chain. Since
samber/douses lazy evaluation, the placement beforeetcd.Provide(i)is fine—the etcd client will be resolved whenelection.Serviceis first invoked. Based on learnings, this follows the expected pattern where each package has aProvide()function for DI registration.server/internal/election/candidate_test.go (2)
16-62: LGTM!The basic functionality test thoroughly covers the candidate lifecycle: idempotent Start/Stop, leadership retention across TTL refresh, and restart capability. Good use of
requirefor operations that must succeed andassertfor state verification.
64-95: Good coverage of multi-candidate leadership transitions.The test properly verifies that leadership transfers when the current leader stops, using the
ClaimHandlercallback to synchronize the assertion.server/internal/election/provide.go (1)
10-41: LGTM!The DI wiring follows the established pattern with a public
Provide()function and properly separates theElectionStoreandServiceproviders. Error propagation from dependency resolution is handled correctly. Based on learnings, this aligns with thesamber/dousage pattern expected in the codebase.server/internal/election/service.go (1)
9-29: LGTM!Clean service facade that encapsulates the store and logger dependencies, providing a convenient factory method for creating candidates. The variadic
onClaimparameter provides flexibility for callers while the underlyingNewCandidatereceives the slice correctly.server/internal/scheduler/provide.go (2)
16-17: LGTM!The constants appropriately encapsulate the scheduler's election configuration. The 30-second TTL provides a reasonable balance between leader detection latency and etcd load from lease renewals.
26-40: Clean integration with the new election package.The
provideElectorfunction properly delegates toelection.Servicefor candidate creation, usingcfg.HostIDas the unique candidate identifier. This simplifies theElectorto a thin wrapper aroundelection.Candidate.server/internal/election/store.go (1)
1-62: LGTM! Clean storage abstraction following established patterns.The
ElectionStoreimplementation correctly follows the repository's storage patterns with:
- Proper key generation using
storage.Prefix- Consistent CRUD operations returning typed operation objects
- Watch support for event-driven leadership changes
- Embedded
storage.StoredValuefor optimistic locking supportThis aligns well with the learning to use etcd for cluster state with versioned values and watch support.
server/internal/scheduler/elector.go (1)
19-27: LGTM! Clean refactor delegating to election.Candidate.The simplified
Electorstruct with single-field delegation is a good application of composition. This removes duplicated leadership logic while maintaining the same public interface.server/internal/election/candidate.go (5)
22-36: Good struct design with appropriate synchronization primitives.The use of
atomic.BoolforisLeaderandsync.Mutexfor state protection is appropriate. The buffered error channel (size 1) is noted - see related comment below about potential error loss.
43-64: LGTM! Proper initialization with structured logging.The constructor follows zerolog best practices by enriching the logger with component context (
election_candidate,election_name,candidate_id). This aligns with the coding guidelines for structured JSON logging.
206-210: Context passed to onClaim handlers may already be cancelled.When claim handlers are invoked with
ctx, this is the same context fromStart(). If the parent context has a deadline or is cancelled afterattemptClaimsucceeds but before handlers complete, they'll receive a cancelled context. Consider whether handlers should receive a fresh context or a context derived from a different parent.This may be intentional if handlers should respect the parent's lifecycle, but worth confirming the expected behavior.
249-286: LGTM! Robust release logic with proper edge-case handling.The
releasemethod correctly handles:
- Early return if not leader
- Claim expired and unclaimed (
ErrNotFound)- Claim taken by another candidate (leader ID mismatch)
- Version mismatch during delete (optimistic locking)
The timeout derived from TTL is a reasonable choice.
124-141: No blocking issue exists.The done channel is properly buffered with size 1 (line 84), and Stop() is idempotent due to the
runningguard (line 128-130) that prevents multiple sends. The single buffered send at line 137 will not block regardless of goroutine state.Likely an incorrect or invalid review comment.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
448017e to
847d42a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@server/internal/election/candidate.go`:
- Around line 163-195: In the checkClaim function, correct the typo in the
ErrValueVersionMismatch comment inside the retry switch: replace "The continue
wil operation." with a clear phrase such as "The continue will re-attempt the
operation; when we re-fetch the claim it will either not exist or belong to
someone else." Update the comment near the ErrValueVersionMismatch case in
checkClaim to reflect this exact wording.
In `@server/internal/election/store.go`:
- Around line 35-37: The ElectionStore.Key method currently uses storage.Prefix
which appends a trailing slash and produces keys like "/root/elections/foo/",
causing etcd ops to target wrong paths; change ElectionStore.Key to use
storage.Key (e.g., call storage.Key("/", s.root, "elections", name.String())) so
it returns a canonical key without the trailing slash; update any callers
assuming the old format if necessary and run unit tests that exercise
get/put/delete/watch on ElectionStore to verify correct behavior.
🧹 Nitpick comments (1)
server/internal/election/candidate_test.go (1)
39-42: Consider using a shorter TTL or polling approach to reduce test duration.The 2-second sleep waiting for TTL refresh works but adds latency to test runs. Since the TTL is set to 1 second on line 26, this is appropriately >1x the TTL, but consider whether a polling loop with a shorter interval and timeout would be more resilient in slow CI environments.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
server/cmd/root.goserver/internal/election/candidate.goserver/internal/election/candidate_test.goserver/internal/election/provide.goserver/internal/election/service.goserver/internal/election/store.goserver/internal/scheduler/elector.goserver/internal/scheduler/provide.go
🧰 Additional context used
📓 Path-based instructions (1)
server/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
server/**/*.go: Usesamber/doinjector for dependency injection; each package should have aProvide()function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa
Files:
server/internal/election/service.goserver/internal/scheduler/elector.goserver/internal/election/candidate_test.goserver/internal/election/provide.goserver/internal/scheduler/provide.goserver/internal/election/store.goserver/cmd/root.goserver/internal/election/candidate.go
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to e2e/**/*.go : E2E tests should use build tag `//go:build e2e_test` and place test fixtures in `e2e/fixtures/`
Applied to files:
server/internal/election/candidate_test.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use `samber/do` injector for dependency injection; each package should have a `Provide()` function that registers dependencies
Applied to files:
server/internal/election/provide.goserver/internal/scheduler/provide.goserver/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/workflows/**/*.go : Workflows represent long-running operations and should persist state to etcd for durability and resumability using the `cschleiden/go-workflows` framework
Applied to files:
server/internal/scheduler/provide.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
Applied to files:
server/internal/election/store.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/config/**/*.go : Multi-source configuration should follow precedence: CLI flags > environment variables > config file, with environment variables prefixed with `CONTROL_PLANE_`
Applied to files:
server/cmd/root.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Applied to files:
server/cmd/root.go
📚 Learning: 2026-01-14T22:26:37.628Z
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 246
File: server/internal/election/candidate.go:220-247
Timestamp: 2026-01-14T22:26:37.628Z
Learning: In the control-plane codebase, the `storage.WatchOp.Watch()` method runs the provided handler function in a separate goroutine, so recursive calls within the handler do not create deadlock risks with locks held by the caller.
Applied to files:
server/internal/election/candidate.go
🧬 Code graph analysis (6)
server/internal/election/service.go (3)
server/internal/election/store.go (2)
ElectionStore(23-26)Name(10-10)server/internal/testutils/logger.go (1)
Logger(9-17)server/internal/election/candidate.go (3)
NewCandidate(43-64)ClaimHandler(17-17)Candidate(22-36)
server/internal/scheduler/elector.go (3)
server/internal/election/candidate.go (1)
Candidate(22-36)server/internal/resource/resource.go (1)
Context(84-88)server/internal/utils/utils.go (1)
WithTimeout(21-36)
server/internal/election/provide.go (3)
server/internal/election/store.go (2)
ElectionStore(23-26)NewElectionStore(28-33)server/internal/election/service.go (2)
Service(10-13)NewService(16-24)server/internal/testutils/logger.go (1)
Logger(9-17)
server/internal/scheduler/provide.go (4)
server/internal/election/store.go (1)
Name(10-10)server/internal/scheduler/elector.go (2)
Elector(18-20)NewElector(22-26)server/internal/scheduler/service.go (1)
Service(17-28)server/internal/election/candidate.go (1)
NewCandidate(43-64)
server/internal/election/store.go (6)
server/internal/storage/interface.go (5)
StoredValue(21-23)GetOp(50-52)PutOp(65-71)DeleteValueOp(83-86)WatchOp(111-122)server/internal/storage/key.go (2)
Key(19-25)Prefix(10-16)server/internal/storage/get.go (1)
NewGetOp(23-29)server/internal/storage/put.go (2)
NewCreateOp(69-76)NewUpdateOp(121-128)server/internal/storage/delete.go (1)
NewDeleteValueOp(91-98)server/internal/storage/watch.go (1)
NewWatchOp(23-29)
server/cmd/root.go (7)
server/internal/scheduler/provide.go (1)
Provide(19-24)server/internal/election/provide.go (1)
Provide(10-13)server/internal/monitor/provide.go (1)
Provide(14-17)server/internal/cluster/provide.go (1)
Provide(10-13)server/internal/config/provide.go (1)
Provide(9-12)server/internal/api/provide.go (1)
Provide(13-16)server/internal/task/provide.go (1)
Provide(9-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (19)
server/cmd/root.go (1)
7-7: LGTM!The election package import and
election.Provide(i)call follow the established DI wiring pattern used by other packages in this file. The alphabetical ordering is maintained correctly.Also applies to: 66-66
server/internal/election/provide.go (1)
1-41: LGTM!The DI wiring follows the established pattern in the codebase with proper error propagation. The
Provide()function correctly registers both theElectionStoreandServiceproviders, matching the signatures defined instore.goandservice.go. As per coding guidelines, this usessamber/dofor dependency injection.server/internal/election/candidate_test.go (1)
64-100: Well-structured leadership transfer test.The multiple candidates test properly validates:
- Initial leadership acquisition by candidate A
- Leadership transfer to candidate B after A stops
- Callback notification via buffered channel
- Timeout protection prevents test hanging
The use of
context.Background()in cleanup is correct since the test context may already be cancelled.server/internal/election/service.go (1)
1-30: LGTM!The Service provides a clean abstraction for candidate creation with proper DI wiring. The variadic
onClaim ...ClaimHandlerparameter provides a convenient API while correctly passing the slice to the underlyingNewCandidateconstructor. The documentation noting thatcandidateIDmust be unique is helpful.server/internal/scheduler/provide.go (2)
16-17: LGTM - well-defined election constants.The
electionNameandelectionTTLconstants are appropriately scoped to the scheduler package. A 30-second TTL is reasonable for distributed leader election, providing a balance between responsiveness to failures and reduced etcd churn.
26-40: LGTM - clean delegation to election package.The refactored
provideElectorproperly leverages the newelection.Servicefor candidate creation, cleanly separating the scheduler's election concerns from the underlying election mechanics. This aligns with the PR's goal of extracting election logic into a standalone package.server/internal/election/store.go (2)
10-33: LGTM!The
Nametype,StoredElectionstruct with embeddedStoredValuefor versioned storage, andElectionStoreconstructor follow the established etcd storage patterns with proper type-safe generics.
39-62: LGTM!The CRUD and Watch operations correctly delegate to the storage package's operation builders, matching the interfaces shown in the relevant snippets.
server/internal/scheduler/elector.go (3)
18-26: LGTM!Clean delegation pattern. The Elector is now a lightweight adapter that wraps
election.Candidate, reducing complexity while maintaining thegocron.Electorinterface contract.
28-38: LGTM!
StartandIsLeadercorrectly delegate to the candidate with appropriate error mapping viaErrNonLeader.
40-49:electionTTLconstant is properly defined in the scheduler package.The constant is defined in
server/internal/scheduler/provide.go:17asconst electionTTL time.Duration = 30 * time.Second, and the shutdown timeout ofelectionTTL/3(10 seconds) is a reasonable approach for graceful election cleanup.server/internal/election/candidate.go (8)
15-36: LGTM!The
ClaimHandlertype andCandidatestruct are well-designed with appropriate concurrency primitives (sync.Mutexfor thread safety,atomic.Boolfor lock-free leadership checks). The structured logging with zerolog follows the coding guidelines.
38-64: LGTM!Constructor properly enriches the logger with
component,election_name, andcandidate_idfields for observability. Deferringdoneandtickerinitialization toStart()avoids race conditions as noted in the code comments.
66-115: LGTM!The
Startmethod is idempotent and properly initializes the ticker atttl/3for leadership renewal. The design capturesdoneandtickerin the closure to avoid data races, as documented in the comment.
143-154: LGTM!
Error()returns a receive-only channel, andAddHandlersis properly mutex-protected for thread-safe handler registration.
197-222: LGTM!
attemptClaimproperly handles the optimistic concurrency pattern: on success it invokes handlers asynchronously, onErrAlreadyExistsit gracefully yields to the existing leader, and other errors are propagated.
224-251: LGTM!The watch implementation correctly handles leadership transitions on delete events and restarts on
ErrWatchClosed. Based on learnings, theWatchOp.Watch()handler runs in a separate goroutine, so the recursivec.watch(ctx)call is safe.
253-290: LGTM!The
releasemethod comprehensively handles edge cases: expired claims, leadership takeover by another candidate, and version mismatches during delete. The optimistic concurrency pattern is correctly applied.
117-141: Minor:c.done <- struct{}{}may block if the ticker goroutine has already exited.If the context passed to
Start()is cancelled, the ticker goroutine exits viacase <-ctx.Done(). A subsequentStop()call will attemptc.done <- struct{}{}but with no receiver, this send blocks indefinitely (channel buffer is 1, but the goroutine is gone).Consider using a non-blocking send or checking
c.runningstate more carefully to handle this edge case.♻️ Suggested non-blocking send
- c.done <- struct{}{} + select { + case c.done <- struct{}{}: + default: + // Goroutine already exited (e.g., context cancelled) + }Likely an incorrect or invalid review comment.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
847d42a to
9238c9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@server/internal/election/candidate.go`:
- Around line 253-290: Update the typo in the comment inside the
Candidate.release method: change "noone" to "no one" in the comment that
explains the claim expired and no one else has claimed it (located in the switch
case handling errors.Is(err, storage.ErrNotFound) in release).
🧹 Nitpick comments (1)
server/internal/election/candidate_test.go (1)
39-42: Consider using a shorter TTL or event-based synchronization.The 2-second sleep works but makes tests slower. Since the candidate is using a 1-second TTL, this is technically correct, but if the implementation exposes any refresh event or leadership confirmation callback, that would be preferable to time-based waiting.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
server/cmd/root.goserver/internal/election/candidate.goserver/internal/election/candidate_test.goserver/internal/election/provide.goserver/internal/election/service.goserver/internal/election/store.goserver/internal/scheduler/elector.goserver/internal/scheduler/provide.go
🚧 Files skipped from review as they are similar to previous changes (1)
- server/cmd/root.go
🧰 Additional context used
📓 Path-based instructions (1)
server/**/*.go
📄 CodeRabbit inference engine (CLAUDE.md)
server/**/*.go: Usesamber/doinjector for dependency injection; each package should have aProvide()function that registers dependencies
Use structured JSON logging with zerolog throughout the codebase, with pretty-printing enabled in dev mode
Domain-specific errors should be defined in each package; API errors should be mapped to HTTP status codes via Goa
Files:
server/internal/election/provide.goserver/internal/election/candidate_test.goserver/internal/election/store.goserver/internal/scheduler/elector.goserver/internal/scheduler/provide.goserver/internal/election/candidate.goserver/internal/election/service.go
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/**/*.go : Use `samber/do` injector for dependency injection; each package should have a `Provide()` function that registers dependencies
Applied to files:
server/internal/election/provide.goserver/internal/scheduler/provide.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to e2e/**/*.go : E2E tests should use build tag `//go:build e2e_test` and place test fixtures in `e2e/fixtures/`
Applied to files:
server/internal/election/candidate_test.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Use etcd for storing all cluster state with versioned values and watch support; implement storage layer with transaction support and optimistic locking
Applied to files:
server/internal/election/store.go
📚 Learning: 2026-01-14T16:43:14.333Z
Learnt from: CR
Repo: pgEdge/control-plane PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-14T16:43:14.333Z
Learning: Applies to server/internal/workflows/**/*.go : Workflows represent long-running operations and should persist state to etcd for durability and resumability using the `cschleiden/go-workflows` framework
Applied to files:
server/internal/scheduler/provide.go
📚 Learning: 2026-01-14T22:26:37.628Z
Learnt from: jason-lynch
Repo: pgEdge/control-plane PR: 246
File: server/internal/election/candidate.go:220-247
Timestamp: 2026-01-14T22:26:37.628Z
Learning: In the control-plane codebase, the `storage.WatchOp.Watch()` method runs the provided handler function in a separate goroutine, so recursive calls within the handler do not create deadlock risks with locks held by the caller.
Applied to files:
server/internal/election/candidate.go
🧬 Code graph analysis (7)
server/internal/election/provide.go (3)
server/internal/election/store.go (2)
ElectionStore(23-26)NewElectionStore(28-33)server/internal/election/service.go (2)
Service(10-13)NewService(16-24)server/internal/testutils/logger.go (1)
Logger(9-17)
server/internal/election/candidate_test.go (4)
server/internal/testutils/logger.go (1)
Logger(9-17)server/internal/election/store.go (2)
NewElectionStore(28-33)Name(10-10)server/internal/election/service.go (1)
NewService(16-24)server/internal/election/candidate.go (1)
NewCandidate(43-64)
server/internal/election/store.go (6)
server/internal/storage/interface.go (5)
StoredValue(21-23)GetOp(50-52)PutOp(65-71)DeleteValueOp(83-86)WatchOp(111-122)server/internal/storage/key.go (1)
Key(19-25)server/internal/storage/get.go (1)
NewGetOp(23-29)server/internal/storage/put.go (2)
NewCreateOp(69-76)NewUpdateOp(121-128)server/internal/storage/delete.go (1)
NewDeleteValueOp(91-98)server/internal/storage/watch.go (1)
NewWatchOp(23-29)
server/internal/scheduler/elector.go (1)
server/internal/election/candidate.go (1)
Candidate(22-36)
server/internal/scheduler/provide.go (4)
server/internal/election/store.go (1)
Name(10-10)server/internal/scheduler/elector.go (2)
Elector(18-20)NewElector(22-26)server/internal/scheduler/service.go (1)
Service(17-28)server/internal/election/candidate.go (1)
NewCandidate(43-64)
server/internal/election/candidate.go (3)
server/internal/election/store.go (3)
ElectionStore(23-26)Name(10-10)StoredElection(16-21)server/internal/storage/interface.go (2)
WatchOp(111-122)EventTypeError(93-93)server/internal/storage/errors.go (3)
ErrValueVersionMismatch(14-14)ErrAlreadyExists(10-10)ErrWatchClosed(34-34)
server/internal/election/service.go (2)
server/internal/election/store.go (2)
ElectionStore(23-26)Name(10-10)server/internal/election/candidate.go (3)
NewCandidate(43-64)ClaimHandler(17-17)Candidate(22-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (19)
server/internal/election/provide.go (1)
10-41: LGTM! Clean DI wiring following project conventions.The
Provide()function correctly registers dependencies usingsamber/do, with proper error propagation fromdo.Invokecalls. The provider functions align with the coding guidelines requiring each package to have aProvide()function for dependency registration.server/internal/election/service.go (1)
9-30: LGTM! Clean service abstraction for candidate creation.The
Servicestruct provides a clean factory pattern for creatingCandidateinstances. The delegation to the package-levelNewCandidatemaintains separation of concerns while allowing the service to inject the store and logger dependencies.server/internal/scheduler/provide.go (2)
16-17: Constants are well-defined for the scheduler's election participation.The
electionNameandelectionTTLconstants are appropriately scoped to the package. The 30-second TTL provides a reasonable balance between leader health detection and avoiding unnecessary leadership churn.
26-40: LGTM! Clean refactor to use election service.The elector wiring now correctly delegates to the
election.Servicefor candidate creation, which aligns with the PR's goal of centralizing election logic in the election package. The use ofcfg.HostIDas the candidate identifier ensures uniqueness across hosts.server/internal/election/candidate_test.go (2)
16-22: Good test setup with isolated etcd state.Using
uuid.NewString()for the store root ensures test isolation. The test correctly sets up the service layer and uses the external test package pattern for black-box testing.
64-100: Well-structured multi-candidate test with proper timeout handling.The test correctly validates leadership transfer using a callback channel with timeout. The buffered channel prevents goroutine leaks, and the 10-second timeout provides adequate margin for CI environments while still catching stuck tests.
server/internal/election/store.go (4)
10-14: LGTM! Clean type definition for election names.The
Nametype provides type safety for election names and implementsString()for compatibility with logging and key construction.
16-21: Well-structured election state entity.The
StoredElectionstruct correctly embedsstorage.StoredValuefor version tracking (enabling optimistic locking per the learnings) and includes the essential fields for leader election state.
35-37: Fix from previous review applied correctly.The
Keymethod now usesstorage.Key()instead ofstorage.Prefix(), which produces the correct key path without a trailing slash. Based on the relevant snippet,storage.Keyjoins path elements and ensures a leading slash, which is the expected behavior for etcd key operations.
39-62: LGTM! Complete CRUD and watch operations.The store provides a clean abstraction over the storage layer operations. Each method correctly constructs the key and delegates to the appropriate storage operation builder. This aligns with the learnings about using etcd with watch support.
server/internal/scheduler/elector.go (2)
18-26: Clean refactor to delegate election logic.The Elector now serves as a thin wrapper around
election.Candidate, properly implementing bothgocron.Electoranddo.Shutdownableinterfaces through delegation. This separation of concerns improves testability and keeps election logic centralized.
40-45: Shutdown timeout aligns with ticker interval.Using
electionTTL/3for the shutdown context timeout is consistent with the ticker interval inCandidate.Start(), giving adequate time to release leadership cleanly.server/internal/election/candidate.go (7)
38-64: Well-structured constructor with contextual logging.The constructor properly initializes the Candidate with component-specific logging context (
election_candidate,election_name,candidate_id), following the coding guidelines for structured logging with zerolog. The buffered error channel (size 1) is appropriate since any error triggers shutdown.
69-115: Robust lifecycle management with defensive local variable capture.The Start method correctly:
- Uses idempotent guard to prevent double-starts
- Captures
doneandtickerin local variables (lines 84-85) to avoid data races if Start is called again- Uses TTL/3 ticker interval which is a well-established pattern for leader election refresh
122-141: LGTM!Stop correctly implements idempotent shutdown with proper cleanup sequence: close watch, signal ticker goroutine, release leadership.
163-195: Well-designed retry logic for claim management.The checkClaim method properly handles race conditions:
ErrNotFoundtriggers new claim attemptErrValueVersionMismatchretries (claim expired between fetch and update)- Bounded retries prevent infinite loops
The typo from the previous review has been fixed.
197-222: Claim handlers inherit the caller's context.The handlers at lines 212-214 receive the same
ctxpassed toattemptClaim. If this context gets cancelled (e.g., the Start context), handlers may terminate prematurely. Consider whether handlers should receive a separate context derived fromcontext.Background()with appropriate timeout, or if this is intentional behavior.If handlers should complete regardless of the election context lifecycle, consider:
♻️ Suggested change for independent handler context
case err == nil: c.isLeader.Store(true) c.logger.Debug().Msg("successfully claimed leadership") for _, handler := range c.onClaim { - go handler(ctx) + go handler(context.Background()) }
224-251: Watch restart logic is safe per storage.WatchOp design.The recursive call to
c.watch(ctx)on line 244 is safe because, as per the learnings,storage.WatchOp.Watch()runs the handler in a separate goroutine, so the mutex is released before the recursive call attempts to acquire it.
149-154: LGTM!Thread-safe handler registration with proper mutex protection.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Adds an election package that expands on the core logic from our custom scheduler elector implementation. Callers can use the election service to create a candidate, which has functions to participate in a named election. PLAT-347
Refactors the `scheduler` package's `Elector` to use the new `election` package. PLAT-347
9238c9f to
295b6d9
Compare
Summary
Adds an election package that expands on the core logic from our custom scheduler elector implementation. Callers can use the election service to create a candidate, which has functions to participate in a named election.
Changes
Refactors the election implementation from the
schedulerpackage into a standaloneelectionpackage and adds some additional edge case checks and tests.Testing
This PR shouldn't cause any noticeable change in functionality, aside from improved handling of certain types of outages. You can inspect the election and scheduler output by enabling debug logging and watching for messages with
component=election_candidate:The scheduler is currently only used for scheduled backup jobs. Assuming you're using the
dev-env.zshhelpers, you can test this with a database like:Then, you should see the backup running every 5 minutes in the control-plane server logs, and it should only run on the host that won the election. Note that you will see
registered scheduled jobmessages from every host because they keep their internal state up-to-date even if they're not running the jobs. You can try stopping that leader host with:And you should see another host take over the leadership.
PLAT-347
Summary by CodeRabbit
New Features
Refactor
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.