A production-grade implementation of the Raft consensus algorithm for distributed systems. Built to understand how systems such as etcd (the datastore behind Kubernetes), HashiCorp Consul, and CockroachDB maintain consistency across distributed clusters.
This project implements the Raft consensus protocol, which provides:
- Fault Tolerance: The system continues operating despite node failures (see the quorum sketch after this list)
- Strong Consistency: All nodes maintain identical state
- Leader Election: A leader is chosen by majority vote, so there is no single point of failure
- Log Replication: Every operation is logged and replicated across the cluster
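These guarantees rest on majority quorums: a cluster of 2f + 1 nodes survives f simultaneous failures, because every leader election and every commit requires acknowledgement from a majority. A minimal sketch of that arithmetic (the helper names below are illustrative, not part of this codebase):

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def max_tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail while the cluster can still reach quorum."""
    return cluster_size - quorum_size(cluster_size)


# A 5-node cluster needs 3 votes/acknowledgements and survives 2 failures.
assert quorum_size(5) == 3
assert max_tolerated_failures(5) == 2
```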
                ┌─────────────┐
                │   Node 1    │
                │  (Leader)   │
                └──────┬──────┘
                       │  AppendEntries / heartbeats
            ┌──────────┴──────────┐
            ▼                     ▼
     ┌─────────────┐       ┌─────────────┐
     │   Node 2    │       │   Node 3    │
     │ (Follower)  │       │ (Follower)  │
     └─────────────┘       └─────────────┘
           Log Replication Protocol
Each node in the cluster operates in one of three states:
| State | Description |
|---|---|
| Follower | Initial state, receives and replicates log entries from leader |
| Candidate | Temporary state during leader election |
| Leader | Coordinates cluster operations and log replication |
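Read as a state machine, the table above works like this: a follower that hears no heartbeat within its randomized election timeout becomes a candidate and requests votes; a candidate that wins a majority becomes leader; and any node that observes a higher term falls back to follower. A hedged sketch of those transitions, with names and timeout values chosen for illustration rather than taken from this project's API:

```python
import random
from enum import Enum, auto


class NodeState(Enum):
    FOLLOWER = auto()
    CANDIDATE = auto()
    LEADER = auto()


def election_timeout() -> float:
    """Randomized timeout (seconds) so nodes rarely start elections at once."""
    return random.uniform(0.15, 0.30)


def next_state(state: NodeState, *, heartbeat_seen: bool,
               won_majority: bool, saw_higher_term: bool) -> NodeState:
    """Illustrative transition rules for the three Raft roles."""
    if saw_higher_term:
        return NodeState.FOLLOWER      # always defer to a newer term
    if state is NodeState.FOLLOWER and not heartbeat_seen:
        return NodeState.CANDIDATE     # timeout expired: start an election
    if state is NodeState.CANDIDATE and won_majority:
        return NodeState.LEADER        # majority of votes received
    return state
```

Randomizing the timeout is what keeps split votes rare: nodes are unlikely to become candidates at the same instant.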
- Python 3.11+: Core implementation language
- asyncio: Asynchronous inter-node communication
- Docker: Multi-node cluster simulation
- pytest: Comprehensive test coverage
- aiosqlite: Persistent log storage
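As a rough idea of how aiosqlite can back the persistent log, here is a sketch; the table name, schema, and function are assumptions for illustration, not this repository's actual storage layer:

```python
import json

import aiosqlite

# Hypothetical schema: one row per log entry, keyed by its log index.
CREATE_SQL = """CREATE TABLE IF NOT EXISTS raft_log (
    idx     INTEGER PRIMARY KEY,
    term    INTEGER NOT NULL,
    command TEXT    NOT NULL
)"""


async def append_log_entry(db_path: str, idx: int, term: int, command: dict) -> None:
    """Durably store a single log entry before acknowledging it."""
    async with aiosqlite.connect(db_path) as db:
        await db.execute(CREATE_SQL)
        await db.execute(
            "INSERT OR REPLACE INTO raft_log (idx, term, command) VALUES (?, ?, ?)",
            (idx, term, json.dumps(command)),
        )
        await db.commit()
```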
- Python 3.11 or higher
- Docker and Docker Compose
- Git
git clone https://github.com/ItalDao/raft-consensus-system.git
cd raft-consensus-system
pip install -r requirements.txt

# Start a 5-node cluster
docker-compose up
# Run in detached mode
docker-compose up -d
# View logs
docker-compose logs -f

from src.raft.node import RaftNode
# Initialize a node
node = RaftNode(node_id=1, cluster_size=5)
# Start the node
await node.start()
# Append entry (only leader can write)
await node.append_entry({"command": "SET x=10"})

- Professional project structure
- Node state machine (Follower, Candidate, Leader)
- Persistent log storage system
- Leader election (RequestVote RPC)
- Log replication (AppendEntries RPC); both message types are sketched after this list
- Heartbeat mechanism and timeout detection
- Node failure detection
- Automatic leader re-election after failures
- Log synchronization
- Real-time cluster dashboard
- REST API for cluster control
- Visual log viewer
- Metrics and monitoring (Future)
- Log compaction and snapshots (Future)
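For reference, the two RPCs called out above carry roughly the fields defined in the Raft paper. The dataclasses below are an illustrative sketch of those message shapes, not this project's actual wire format:

```python
from dataclasses import dataclass, field


@dataclass
class RequestVote:
    """Sent by a candidate to gather votes (Raft paper, section 5.2)."""
    term: int            # candidate's current term
    candidate_id: int    # candidate requesting the vote
    last_log_index: int  # index of the candidate's last log entry
    last_log_term: int   # term of the candidate's last log entry


@dataclass
class AppendEntries:
    """Sent by the leader to replicate entries and as a heartbeat (section 5.3)."""
    term: int            # leader's current term
    leader_id: int       # so followers can redirect clients
    prev_log_index: int  # index of the entry preceding the new ones
    prev_log_term: int   # term of that preceding entry
    leader_commit: int   # leader's commit index
    entries: list = field(default_factory=list)  # empty for a pure heartbeat
```

An AppendEntries message with an empty entries list doubles as the heartbeat mentioned above.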
# Run all tests
pytest
# Run with coverage
pytest --cov=src tests/
# Run specific test suite
pytest tests/test_consensus.py

raft-consensus-system/
├── src/
│ ├── raft/ # Core Raft implementation
│ ├── network/ # RPC and networking layer
│ └── storage/ # Persistent log storage
├── tests/ # Comprehensive test suite
├── docs/ # Architecture documentation
├── docker-compose.yml # Multi-node deployment
└── requirements.txt # Python dependencies
Note: This implementation follows the original Raft specification.
- In Search of an Understandable Consensus Algorithm - Ongaro & Ousterhout, Stanford (2014)
- Raft Consensus Algorithm Visualization
- Consul Architecture - Production Raft implementation
Contributions are welcome. Areas of interest:
- Performance optimizations
- Additional test coverage
- Documentation improvements
- Advanced features (snapshots, membership changes)
Please ensure all tests pass and code follows PEP 8 style guidelines.
This project is licensed under the MIT License. See LICENSE file for details.
Italo D.
GitHub: @ItalDao
Built as a deep dive into distributed systems consensus algorithms used in production infrastructure.
Industry Applications: This algorithm powers critical infrastructure in Kubernetes (via etcd), HashiCorp Consul (service mesh), CockroachDB (distributed SQL), and many other large-scale systems requiring strong consistency guarantees.