feat: enabled etcd mode (server ↔ client) reconfiguration #216
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| kind: Added | ||
| body: Enabled automatic etcd client ↔ server mode reconfiguration | ||
| time: 2026-01-09T22:35:36.887346+05:30 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -53,6 +53,10 @@ func (e *EmbeddedEtcd) Start(ctx context.Context) error { | ||
| e.mu.Lock() | ||
| defer e.mu.Unlock() | ||
| | ||
| if e.etcd != nil { | ||
| return nil // already started | ||
| } | ||
| | ||
| initialized, err := e.IsInitialized() | ||
| if err != nil { | ||
| return err | ||
| @@ -292,6 +296,7 @@ func (e *EmbeddedEtcd) Shutdown() error { | ||
| } | ||
| if e.etcd != nil { | ||
| e.etcd.Close() | ||
| e.etcd = nil | ||
| } | ||
| return errors.Join(errs...) | ||
| } | ||
| @@ -443,6 +448,78 @@ func (e *EmbeddedEtcd) HealthCheck() common.ComponentStatus { | ||
| } | ||
| } | ||
| | ||
| func (e *EmbeddedEtcd) ChangeMode(ctx context.Context, mode config.EtcdMode) (Etcd, error) { | ||
| if mode != config.EtcdModeClient { | ||
| return nil, fmt.Errorf("invalid mode transition from %s to %s", config.EtcdModeServer, mode) | ||
| } | ||
| | ||
| if err := e.Start(ctx); err != nil { | ||
| return nil, err | ||
| } | ||
| | ||
| cfg := e.cfg.Config() | ||
| | ||
| embeddedClient, err := e.GetClient() | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| | ||
| // Get the full member list before removing this host | ||
| resp, err := embeddedClient.MemberList(ctx) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to list etcd members for server->client transition: %w", err) | ||
| } | ||
| | ||
| var endpoints []string | ||
| for _, m := range resp.Members { | ||
| // Skip this host's member; we are about to remove it. | ||
| if m.Name == cfg.HostID { | ||
| continue | ||
| } | ||
| endpoints = append(endpoints, m.ClientURLs...) | ||
| } | ||
| | ||
| if len(endpoints) == 0 { | ||
| return nil, fmt.Errorf("cannot demote etcd server on host %s: no remaining cluster members with client URLs", cfg.HostID) | ||
| } | ||
| | ||
| generated := e.cfg.GeneratedConfig() | ||
| generated.EtcdClient.Endpoints = endpoints | ||
| if err := e.cfg.UpdateGeneratedConfig(generated); err != nil { | ||
| return nil, fmt.Errorf("failed to update generated config with client endpoints: %w", err) | ||
| } | ||
| | ||
| if err := e.Shutdown(); err != nil { | ||
| return nil, err | ||
| } | ||
| | ||
| remote := NewRemoteEtcd(e.cfg, e.logger) | ||
| if err := remote.Start(ctx); err != nil { | ||
| return nil, fmt.Errorf("failed to start remote client: %w", err) | ||
| } | ||
| | ||
| remoteClient, err := remote.GetClient() | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to get remote client: %w", err) | ||
| } | ||
| | ||
| if err := RemoveMember(ctx, remoteClient, cfg.HostID); err != nil { | ||
| return nil, fmt.Errorf("failed to remove embedded etcd from cluster: %w", err) | ||
| } | ||
| | ||
| if err := os.RemoveAll(e.etcdDir()); err != nil { | ||
| return nil, fmt.Errorf("failed to remove embedded etcd data dir: %w", err) | ||
| } | ||
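The endpoint-collection step in `ChangeMode` above can be read in isolation. The following is a minimal, self-contained sketch of that filtering logic only; the `member` struct, the `remainingEndpoints` helper, and the host names and URLs are hypothetical stand-ins for illustration and are not part of the PR or of the etcd client API.

```go
package main

import "fmt"

// member is a hypothetical stand-in for an etcd cluster member entry;
// only the two fields used by the filtering step are modelled.
type member struct {
	Name       string
	ClientURLs []string
}

// remainingEndpoints returns the client URLs of every member except the
// one named hostID. It returns an error when nothing is left, because a
// client with no endpoints could not reach the cluster after demotion.
func remainingEndpoints(members []member, hostID string) ([]string, error) {
	var endpoints []string
	for _, m := range members {
		if m.Name == hostID {
			continue // this host's member is about to be removed
		}
		endpoints = append(endpoints, m.ClientURLs...)
	}
	if len(endpoints) == 0 {
		return nil, fmt.Errorf("no remaining cluster members with client URLs (host %s)", hostID)
	}
	return endpoints, nil
}

func main() {
	members := []member{
		{Name: "host-1", ClientURLs: []string{"https://10.0.0.1:2379"}},
		{Name: "host-2", ClientURLs: []string{"https://10.0.0.2:2379"}},
		{Name: "host-3"}, // a member with no published client URLs contributes nothing
	}
	eps, err := remainingEndpoints(members, "host-1")
	fmt.Println(eps, err)
}
```

Checking for an empty result before touching any state mirrors the ordering in the diff: the guard runs before the generated config is updated and before the embedded server is shut down.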
Comment on lines +492 to +512

Potential inconsistent state on partial failure during demotion. Similar to the client→server transition, if operations fail after […] For example, if […]
| | ||
| generated.EtcdMode = config.EtcdModeClient | ||
| generated.EtcdServer = config.EtcdServer{} | ||
| if err := e.cfg.UpdateGeneratedConfig(generated); err != nil { | ||
| return nil, fmt.Errorf("failed to clear out etcd server settings in generated config: %w", err) | ||
| } | ||
Comment on lines +492 to +518

Partial failure leaves the system in an inconsistent state. After […]

Consider either: […]
| | ||
| return remote, nil | ||
| } | ||
| | ||
| const maxLearnerStallTime = 5 * time.Minute | ||
| | ||
| type learnerProgress struct { | ||
🧩 Analysis chain: scripts executed against `server/internal/etcd/embedded.go` in pgEdge/control-plane (first command: `wc -l server/internal/etcd/embedded.go`; outputs omitted).
Address missing mutex protection in `ChangeMode` and related methods.

`ChangeMode` creates a race condition: it calls `e.Start()` (which acquires and releases the mutex), then calls `e.GetClient()` and `e.Shutdown()` without holding the mutex. Since `Shutdown()` sets `e.etcd = nil` without protection, any concurrent call to `Error()` (which accesses `e.etcd` unsafely at lines 304-306) could panic.

The broader issue is inconsistent synchronization in `EmbeddedEtcd`: `Start()` properly uses the mutex, but `GetClient()`, `Error()`, `Shutdown()`, and other methods access shared fields (`e.etcd`, `e.client`) without any lock, creating potential data races.

Either ensure at the caller level (in `provide.go`) that this instance receives no concurrent access during the transition, or add mutex protection to `ChangeMode()` and the unprotected accessor methods throughout the type.
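A minimal sketch of the second option (locking inside the type) follows. It assumes a simplified stand-in type, `guarded`, with a hypothetical `server` handle; none of these names come from the PR. Because `sync.Mutex` is not reentrant, the public methods take the lock and delegate to `*Locked` helpers, so `ChangeMode` can hold the lock for the whole transition without deadlocking on its own `Start` and `Shutdown` logic.

```go
package main

import (
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for the embedded etcd handle.
type server struct{ closed bool }

func (s *server) Close() { s.closed = true }

// guarded mimics a type whose lifecycle methods all share one mutex.
// Public methods take the lock; the *Locked helpers assume the caller
// already holds it.
type guarded struct {
	mu  sync.Mutex
	srv *server
}

func (g *guarded) startLocked() {
	if g.srv == nil { // nil means not started yet
		g.srv = &server{}
	}
}

func (g *guarded) shutdownLocked() {
	if g.srv != nil {
		g.srv.Close()
		g.srv = nil
	}
}

func (g *guarded) Start() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.startLocked()
}

func (g *guarded) Shutdown() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.shutdownLocked()
}

// Running reads the shared field under the lock, so it never observes a
// half-finished transition.
func (g *guarded) Running() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.srv != nil
}

// ChangeMode holds the lock across the whole start-then-shutdown sequence.
func (g *guarded) ChangeMode() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.startLocked()
	g.shutdownLocked()
}

func main() {
	g := &guarded{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.ChangeMode()
			_ = g.Running()
		}()
	}
	wg.Wait()
	fmt.Println("running after transitions:", g.Running())
}
```

The trade-off is that holding a mutex across slow operations (network calls, process shutdown) blocks every other caller for the duration, which is one reason guaranteeing exclusive access at the caller level may be the simpler choice.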