Add implementation plan for automatic shard/CDC replica sync #79
Conversation
mcrnic
left a comment
@copilot Plan in more detail how the destination replica will control the flow of data from the source replica.
Which network protocol will be used, and how does it fit into the current request/response flow?
How will retries happen if the source or destination gets shut down?
I've expanded the implementation plan with detailed answers to your questions:
Flow Control: Destination-driven pull model where the new replica controls pacing by requesting one chunk at a time. The destination specifies
Protocol: TCP with Bincode serialization following the existing
Retry Handling:
See full PR description for sequence diagram, message types, and state machine details.
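The destination-driven pull model described in this reply can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `ChunkReq`, `ChunkResp`, `FetchFunc`, and `Pull` are hypothetical names, and the retry policy (re-request the same cursor up to a fixed number of consecutive failures) is an assumption, since the retry details are not visible in this thread.

```go
package main

import "errors"

// ChunkReq and ChunkResp are simplified, hypothetical stand-ins for the
// sync chunk messages discussed in this thread.
type ChunkReq struct {
	KeyStart []byte // first key to return; empty means "from the beginning"
	MaxSize  int    // upper bound on payload bytes the destination accepts
}

type ChunkResp struct {
	Keys, Values [][]byte
	NextKeyStart []byte // empty when the source has no more data
}

// FetchFunc is an assumed abstraction over the transport to the source replica.
type FetchFunc func(req ChunkReq) (ChunkResp, error)

var (
	ErrTooManyRetries = errors.New("sync: too many consecutive failures")
	ErrMalformedChunk = errors.New("sync: keys and values differ in length")
)

// Pull drives the transfer from the destination side: it requests one chunk,
// applies it, and only then asks for the next one, so the destination sets
// the pace. After a failed request it re-sends the same cursor, which is safe
// because nothing from the failed request was applied.
func Pull(fetch FetchFunc, maxSize, maxRetries int, apply func(k, v []byte)) error {
	var cursor []byte
	failures := 0
	for {
		resp, err := fetch(ChunkReq{KeyStart: cursor, MaxSize: maxSize})
		if err != nil {
			failures++
			if failures > maxRetries {
				return ErrTooManyRetries
			}
			continue
		}
		if len(resp.Keys) != len(resp.Values) {
			return ErrMalformedChunk
		}
		failures = 0
		for i := range resp.Keys {
			apply(resp.Keys[i], resp.Values[i])
		}
		if len(resp.NextKeyStart) == 0 {
			return nil // source reported no further data
		}
		cursor = resp.NextKeyStart
	}
}
```

Because the cursor only advances after a chunk has been applied, a destination that persists the cursor could resume from the same point after a restart; whether the actual plan does this is not stated here.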
mcrnic
left a comment
@copilot Shard and CDC currently don't accept or handle TCP. This needs to be implemented similarly to how it's done in registry. There is no msgs.yaml;
message definitions are added in msgs.go and need to be added to bincodegen as well.
Please also explain how sync will decide which column family it is requesting data for, and how the source will find the next chunk to iterate.
Updated the plan to address your feedback:
TCP Implementation: Shards/CDC will get new TCP servers modeled after
Message Definitions: Corrected to use
Column Family Selection & Chunk Iteration:
See full PR description for code examples and sequence diagram.
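One way the source could locate the next chunk, given only a start key and a size limit, is sketched below. This is an illustration under assumptions: a key-sorted in-memory slice stands in for a sorted iterator over one column family, and `KV` and `NextChunk` are hypothetical names rather than anything from the PR.

```go
package main

import "sort"

// KV is a hypothetical key/value pair; Go string comparison is bytewise,
// which mirrors the default ordering of a sorted key-value store.
type KV struct{ Key, Value string }

// NextChunk returns entries with Key >= keyStart from a key-sorted slice,
// stopping before the entry that would push the chunk past maxSize bytes.
// nextKeyStart is the key to send in the following request; done reports
// that the slice is exhausted.
func NextChunk(sorted []KV, keyStart string, maxSize int) (chunk []KV, nextKeyStart string, done bool) {
	// Seek: binary search for the first key at or after keyStart.
	i := sort.Search(len(sorted), func(i int) bool { return sorted[i].Key >= keyStart })
	size := 0
	for ; i < len(sorted); i++ {
		entry := sorted[i]
		entrySize := len(entry.Key) + len(entry.Value)
		// Always emit at least one entry so an oversized value cannot stall the sync.
		if len(chunk) > 0 && size+entrySize > maxSize {
			return chunk, entry.Key, false
		}
		chunk = append(chunk, entry)
		size += entrySize
	}
	return chunk, "", true
}
```

The source keeps no per-client cursor in this sketch: every request carries the key to seek to, so a restarted source can serve the next request without any recovered state.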
mcrnic
left a comment
@copilot The CF index is not enough. The destination needs to match each index to a local column family name, since there is no guarantee the indices will be the same.
The names of column families need to match. I suggest transferring index and CF name pairs in the initial sync response.
Updated the plan to address CF name matching:
SYNC_START_RESPONSE now includes
Destination uses names for local CF lookup:
Both request and response include cfName for verification:
See updated PR description for complete
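The name-based lookup requested in the review can be sketched as below. This is an assumption-laden illustration: the field types of `ColumnFamilyInfo` and the helper `MapSourceToLocal` are hypothetical, and only the idea (pair each source index with a name, then resolve the name locally) comes from the discussion.

```go
package main

import "fmt"

// ColumnFamilyInfo pairs a source-side CF index with its name.
// The field types here are assumptions made for the sketch.
type ColumnFamilyInfo struct {
	Index uint8
	Name  string
}

// MapSourceToLocal resolves each source CF index to a local CF index by
// name. local lists the destination's column family names in local index
// order. A source CF with no local counterpart is reported as an error
// rather than silently skipped.
func MapSourceToLocal(source []ColumnFamilyInfo, local []string) (map[uint8]int, error) {
	byName := make(map[string]int, len(local))
	for i, name := range local {
		byName[name] = i
	}
	mapping := make(map[uint8]int, len(source))
	for _, cf := range source {
		localIdx, ok := byName[cf.Name]
		if !ok {
			return nil, fmt.Errorf("sync: source column family %q (index %d) not found locally", cf.Name, cf.Index)
		}
		mapping[cf.Index] = localIdx
	}
	return mapping, nil
}
```

Carrying the name in each chunk request and response as well lets either side reject a message whose index and name disagree.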
mcrnic
left a comment
@copilot Start implementation.
Add only the new messages and protocol for now, and wait for confirmation to continue.
…nization Co-authored-by: mcrnic <11664456+mcrnic@users.noreply.github.com>
Added sync protocol messages and constants in commit 694a6d9: Messages (
Protocol (
Generated:
Awaiting confirmation to continue with TCP server implementation.
mcrnic
left a comment
@copilot Proceed with the shard sync server implementation, then wait for confirmation.
Co-authored-by: mcrnic <11664456+mcrnic@users.noreply.github.com>
Added Files: Features:
API:
Awaiting confirmation to continue with CDC sync server and client implementation.
Implementation Progress

Completed: Sync Protocol Messages
Added new sync message types for replica database synchronization:

New Message Types (go/msgs/msgs.go):
- ColumnFamilyInfo - Maps CF index to name for proper matching between replicas
- SyncStartReq - Request to start sync (contains lastAppliedLogEntry)
- SyncStartResp - Response with snapshot metadata (snapshotLogEntry, columnFamilies[], totalSizeEstimate)
- SyncChunkReq - Request for data chunk (cfIndex, cfName, keyStart, maxSize)
- SyncChunkResp - Response with chunk data (keys, values, nextKeyStart, isLastCF)
- SyncCompleteReq/Resp - Finalize sync operation

New Protocol Constants (cpp/core/Protocol.hpp and go/msgs/msgs.go):

Generated Code:
- bincodegen.go to include sync messages
- SyncMessageKind enum (SYNC_START, SYNC_CHUNK, SYNC_COMPLETE)
- SyncReqContainer and SyncRespContainer types
- SyncReqMsg and SyncRespMsg type aliases

Completed: ShardSyncServer
Added TCP server for shard sync protocol (cpp/shard/ShardSyncServer.{hpp,cpp}):

Features:
- RegistryServer pattern
- [protocol_version:u32][length:u32][payload:bytes]

API:
- init() - Initialize listen socket and epoll
- receiveMessages(timeout) - Poll for incoming sync requests
- sendSyncResponses(responses) - Send responses to clients
- receivedSyncRequests() - Get received requests for processing

Next Steps (Awaiting Confirmation)
- CDCSyncServer (TCP server for CDC sync)
- SyncClient (destination sync logic)
- needsInitialSync() to SharedRocksDB
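
The frame layout listed for ShardSyncServer, [protocol_version:u32][length:u32][payload:bytes], could be encoded and decoded along these lines. This is a sketch only: little-endian byte order is an assumption (the summary does not state the endianness), and `EncodeFrame`, `DecodeFrame`, and the error values are hypothetical names, not the PR's API.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// headerSize covers protocol_version (u32) plus length (u32).
const headerSize = 8

var (
	ErrShortFrame = errors.New("sync: buffer does not yet hold a full frame")
	ErrBadVersion = errors.New("sync: unexpected protocol version")
)

// EncodeFrame prepends the 8-byte header to payload.
// Little-endian byte order is an assumption of this sketch.
func EncodeFrame(version uint32, payload []byte) []byte {
	buf := make([]byte, headerSize+len(payload))
	binary.LittleEndian.PutUint32(buf[0:4], version)
	binary.LittleEndian.PutUint32(buf[4:8], uint32(len(payload)))
	copy(buf[headerSize:], payload)
	return buf
}

// DecodeFrame extracts one frame from the front of buf and returns the
// payload plus any trailing bytes, which may belong to the next frame.
// ErrShortFrame means the caller should read more bytes and try again.
func DecodeFrame(buf []byte, wantVersion uint32) (payload, rest []byte, err error) {
	if len(buf) < headerSize {
		return nil, buf, ErrShortFrame
	}
	if binary.LittleEndian.Uint32(buf[0:4]) != wantVersion {
		return nil, buf, ErrBadVersion
	}
	n := uint64(binary.LittleEndian.Uint32(buf[4:8]))
	if uint64(len(buf)-headerSize) < n {
		return nil, buf, ErrShortFrame
	}
	end := headerSize + int(n)
	return buf[headerSize:end], buf[end:], nil
}
```

Returning the unconsumed remainder suits a non-blocking, epoll-style read loop, where a single read may deliver a partial frame or several frames at once.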