This project explores distributed systems through the practical, step-by-step implementation of a key-value store, focusing on real-world trade-offs between consistency, availability, and performance. 🚀
distributed-kv-store/
├── src/main/java/com/distributed/kv/ # Core implementation
└── src/test/java/com/distributed/kv/ # Tests demonstrating behavior
mvn clean compile
mvn test

- Milestone 1: The CAP Theorem Lives and Breathes
- 1.1: Implement a basic in-memory Key-Value (KV) store.
- 1.2: Add version numbers to stored values for consistency tracking.
- Milestone 2: Leader-Follower - The Workhorse Pattern
- 2.1: Implement write coordination where all writes go through a single leader.
- 2.2: Enable read scaling via followers and demonstrate stale reads.
- Milestone 3: Quorum Systems - Democracy in Databases
- 3.1: Implement quorum-based reads and writes (W+R > N).
- 3.2: Add read repair to fix inconsistent data during reads.
- Milestone 4: Leaderless - Embracing the Chaos
- 4.1: Implement write coordination where any node can act as the coordinator.
- 4.2: Measure and observe the eventual consistency window in the leaderless model.
- Milestone 5: Network Partitions - When Networks Lie
- 5.1: Simulate a network partition and detect the resulting split-brain problem.
- Milestone 6: Consistency Testing - Proving Correctness
- 6.1: Build a test suite to automatically detect linearizability violations.
- Milestone 7: Performance Patterns - Speed vs. Correctness
- 7.1: Measure and quantify the latency cost of different consistency levels.
- Milestone 8: Production Monitoring & Observability
- 8.1: Add instrumentation to track metrics like replication lag and stale read frequency.
- Milestone 9: Load Testing for Discovery
- 9.1: Design a load generator that mimics real-world temporal locality access patterns.
- 9.2: Implement accurate latency measurement that accounts for coordinated omission.
- Milestone 10: Failure Testing & Recovery
- 10.1: Implement graceful degradation patterns for when consistency guarantees cannot be met.
- 10.2: Simulate network partitions at the application level to test resilience.
- Milestone 11: Performance Analysis & Reporting
- 11.1: Analyze system performance using latency percentiles (p50, p95, p99) to identify long tails.
- 11.2: Quantify and create reports on the frequency and impact of stale reads for different configurations.
- Milestone 12: The Complete System Integration
- Final Integration: Ensure all components work together cohesively.
- Configuration: Externalize settings for N, R, and W values.
- Test Suite: Validate all required system configurations (Leader-Follower and Leaderless).
- Deployment: Create a Docker Compose file for a 5-node cluster deployment.
- Report: Write the final analysis and discussion of the results.
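
To make Milestones 1.2, 3.1, and 3.2 more concrete, here is a minimal sketch of versioned values, quorum reads and writes, and read repair over in-memory replicas. It is not the project's actual implementation: the class `QuorumSketch`, its nested `Replica` and `Versioned` types, and the `up` flag used to simulate an unreachable node are all hypothetical names chosen for illustration. The sketch also assumes the caller supplies version numbers and does not roll back a write that fails to reach W replicas.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: N in-memory replicas with quorum reads/writes and read repair. */
class QuorumSketch {
    /** A value tagged with a caller-supplied version number (Milestone 1.2). */
    static final class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** One replica: a plain map. The `up` flag simulates reachability (an assumption). */
    static final class Replica {
        final Map<String, Versioned> data = new HashMap<>();
        boolean up = true;
    }

    final List<Replica> replicas = new ArrayList<>();
    final int w;
    final int r;

    QuorumSketch(int n, int w, int r) {
        for (int i = 0; i < n; i++) {
            replicas.add(new Replica());
        }
        this.w = w;
        this.r = r;
    }

    /** True when every read quorum must share at least one replica with every write quorum. */
    static boolean overlaps(int n, int w, int r) {
        return w + r > n;
    }

    /** Writes to every reachable replica; reports success only if at least W acknowledged. */
    boolean put(String key, String value, long version) {
        int acks = 0;
        for (Replica replica : replicas) {
            if (!replica.up) {
                continue;
            }
            Versioned current = replica.data.get(key);
            if (current == null || current.version < version) {
                replica.data.put(key, new Versioned(value, version));
            }
            acks++;
        }
        return acks >= w; // no rollback on failure in this sketch
    }

    /** Reads the first R reachable replicas, returns the newest value, repairs stale copies. */
    Versioned get(String key) {
        List<Replica> contacted = new ArrayList<>();
        for (Replica replica : replicas) {
            if (replica.up && contacted.size() < r) {
                contacted.add(replica);
            }
        }
        if (contacted.size() < r) {
            throw new IllegalStateException("read quorum not met");
        }
        Versioned newest = null;
        for (Replica replica : contacted) {
            Versioned v = replica.data.get(key);
            if (v != null && (newest == null || v.version > newest.version)) {
                newest = v;
            }
        }
        if (newest != null) {
            for (Replica replica : contacted) { // read repair (Milestone 3.2)
                Versioned v = replica.data.get(key);
                if (v == null || v.version < newest.version) {
                    replica.data.put(key, newest);
                }
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        QuorumSketch store = new QuorumSketch(3, 2, 2);
        store.put("k", "v1", 1);
        store.replicas.get(0).up = false; // replica 0 misses the next write
        store.put("k", "v2", 2);
        store.replicas.get(0).up = true;  // it returns still holding v1
        System.out.println(store.get("k").value);                     // prints "v2"
        System.out.println(store.replicas.get(0).data.get("k").value); // prints "v2" after repair
    }
}
```

With N=3, W=2, R=2, the read contacts replicas 0 and 1. Replica 0 is stale, but replica 1 took part in the second write, so the read still returns the newest version and repairs replica 0 on the way out. With W=1 and R=1 the overlap check fails, and a read could land on a stale replica alone.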
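
For Milestones 9.2 and 11.1, the sketch below illustrates one common way to account for coordinated omission: measuring each request's latency from its intended send time on a fixed schedule rather than from the moment a blocked client actually sent it. It is a deterministic simulation under stated assumptions (a single sequential client, a fixed send interval, made-up service times), and the names `LatencySketch`, `corrected`, and `percentile` are hypothetical. Percentiles use the nearest-rank method.

```java
import java.util.Arrays;

/** Hypothetical sketch: naive vs. schedule-corrected latency for a sequential client. */
class LatencySketch {
    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /**
     * Latency measured from the intended send time. Request i is scheduled at
     * i * intervalMs, but a sequential client cannot send it until the previous
     * request has finished, so time spent waiting behind a slow request counts.
     */
    static long[] corrected(long[] serviceMs, long intervalMs) {
        long[] latencies = new long[serviceMs.length];
        long previousFinish = 0;
        for (int i = 0; i < serviceMs.length; i++) {
            long intendedStart = i * intervalMs;
            long actualStart = Math.max(intendedStart, previousFinish);
            previousFinish = actualStart + serviceMs[i];
            latencies[i] = previousFinish - intendedStart;
        }
        return latencies;
    }

    public static void main(String[] args) {
        // Made-up service times: one 100 ms stall among 1 ms requests, one request per 10 ms.
        long[] serviceMs = {1, 1, 100, 1, 1, 1, 1, 1, 1, 1};
        long[] naive = serviceMs.clone(); // a naive client records only the service time
        long[] fixed = corrected(serviceMs, 10);
        for (double p : new double[] {50, 95, 99}) {
            System.out.println("p" + (int) p + " naive=" + percentile(naive, p)
                    + "ms corrected=" + percentile(fixed, p) + "ms");
        }
    }
}
```

In this made-up run the naive p50 is 1 ms while the corrected p50 is 55 ms. The single stall delays every request queued behind it, and only the corrected measurement counts that waiting time.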