Distributed system simulator for edge computing scenarios. This project demonstrates node selection algorithms, state synchronization, batch processing with failure handling, and consensus protocols relevant to edge and distributed systems.
- Node simulation: 20–50 nodes with labels, capacity, health (Online/Offline/Degraded), and configurable failure rate
- State management: Distributed key-value store with versioning, quorum replication, and conflict resolution (last-write-wins)
- Batch operations: Deploy, config updates, and health checks across node groups with worker pool, timeouts, and retries
- Algorithms: Least-loaded and label-affinity selection, spread across zones, gossip-based state sync, heartbeat failure detection, Raft-lite leader election
- Network simulation: Configurable latency (50–500 ms) and partition scenarios
cluster-sim/
├── pkg/
│ ├── node/ # Node representation, manager, selection strategies
│ ├── state/ # Distributed store, replication, consensus
│ ├── batch/ # Batch processor, worker pool, scheduler
│ ├── network/ # Latency and partition simulation
│ ├── protocol/ # Gossip, heartbeat, leader election
│ └── metrics/ # Metrics collection and reporting
├── scenarios/ # Batch deploy, node failure, partition, state sync
├── main.go
├── go.mod
└── README.md
- Go 1.21+
# Build
go build -o cluster-sim .
# or
make build
# Run a scenario
go run main.go simulate batch-deploy --nodes 30
go run main.go simulate node-failure --failure-rate 0.1
go run main.go simulate network-partition --partition-time 30s
go run main.go simulate state-sync
# Benchmark
go run main.go benchmark state-sync --iterations 100
go run main.go benchmark batch-deploy --iterations 50
# Visualize cluster state (refreshes every 1s)
go run main.go visualize --refresh 1s

Using the Makefile:
make build
make run-simulate-batch
make run-simulate-failure
make run-simulate-partition
make run-simulate-sync
make run-benchmark
make run-visualize
make test

- Least-loaded: Sort nodes by utilization (CPU + memory) and pick the N least loaded. Reduces hotspots.
- Label-affinity: Filter nodes that match all required labels (e.g. env=prod,zone=zone-a). Used for placement constraints.
- Spread: Round-robin across zones/regions so replicas are spread for availability.
- Random with constraints: Random choice among nodes that satisfy min capacity and optional labels.
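As a rough illustration of the first two strategies, the sketch below sorts nodes by combined CPU and memory utilization and filters on required labels. The `Node` struct, its fields, and the selection functions are illustrative assumptions, not the simulator's `pkg/node` API.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a simplified stand-in for the simulator's node type.
type Node struct {
	Name            string
	CPUUsed, CPUCap float64
	MemUsed, MemCap float64
	Labels          map[string]string
}

// utilization combines CPU and memory usage into a single load score.
func (n Node) utilization() float64 {
	return n.CPUUsed/n.CPUCap + n.MemUsed/n.MemCap
}

// selectLeastLoaded returns the k nodes with the lowest combined utilization.
func selectLeastLoaded(nodes []Node, k int) []Node {
	sorted := append([]Node(nil), nodes...) // copy so the caller's order is untouched
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].utilization() < sorted[j].utilization()
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

// matchesLabels reports whether a node carries every required label.
func matchesLabels(n Node, required map[string]string) bool {
	for k, v := range required {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	nodes := []Node{
		{Name: "node-01", CPUUsed: 2, CPUCap: 8, MemUsed: 4, MemCap: 16, Labels: map[string]string{"zone": "zone-a"}},
		{Name: "node-02", CPUUsed: 6, CPUCap: 8, MemUsed: 12, MemCap: 16, Labels: map[string]string{"zone": "zone-a"}},
		{Name: "node-03", CPUUsed: 1, CPUCap: 8, MemUsed: 2, MemCap: 16, Labels: map[string]string{"zone": "zone-b"}},
	}
	// Label-affinity first, then least-loaded among the matching candidates.
	var candidates []Node
	for _, n := range nodes {
		if matchesLabels(n, map[string]string{"zone": "zone-a"}) {
			candidates = append(candidates, n)
		}
	}
	for _, n := range selectLeastLoaded(candidates, 1) {
		fmt.Println("selected:", n.Name)
	}
}
```

Spread and random-with-constraints follow the same shape: filter the candidate set first, then apply the placement rule.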
- Versioned store: Each key has a logical version; writes increment it. Used for conflict detection.
- Quorum: Write succeeds when at least W = N/2+1 replicas accept. Reads can use ONE, QUORUM, or ALL.
- Conflict resolution: Last-write-wins using version (and timestamp) when merging state from gossip or after partition heal.
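A minimal sketch of the versioning and last-write-wins rules, assuming a simplified `Entry`/`Store` pair and a `quorum` helper; none of these names mirror the actual `pkg/state` types.

```go
package main

import "fmt"

// Entry is an illustrative versioned value; the real store's types may differ.
type Entry struct {
	Value     string
	Version   uint64 // logical version, incremented on every local write
	Timestamp int64  // tie-breaker when versions are equal
}

// Store is a minimal versioned key-value map.
type Store map[string]Entry

// Set writes a value and increments the key's logical version.
func (s Store) Set(key, value string, ts int64) {
	e := s[key]
	s[key] = Entry{Value: value, Version: e.Version + 1, Timestamp: ts}
}

// Merge applies last-write-wins: the higher version wins, with the timestamp
// as a tie-breaker when versions are equal.
func (s Store) Merge(key string, remote Entry) {
	local, ok := s[key]
	if !ok || remote.Version > local.Version ||
		(remote.Version == local.Version && remote.Timestamp > local.Timestamp) {
		s[key] = remote
	}
}

// quorum returns the minimum acknowledgements for a write: W = N/2 + 1.
func quorum(replicas int) int { return replicas/2 + 1 }

func main() {
	a, b := Store{}, Store{}
	a.Set("conflict-key", "value-a", 100)
	b.Set("conflict-key", "value-b", 200) // same version, later timestamp

	a.Merge("conflict-key", b["conflict-key"])
	fmt.Println(a["conflict-key"].Value) // value-b wins on timestamp
	fmt.Println(quorum(5))               // a 5-replica write needs 3 acks
}
```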
- Epidemic propagation: Periodically each node picks a small random set of peers (fanout) and exchanges state.
- Merge: Received state is merged locally; higher version wins. Ensures eventual consistency across the cluster.
- Convergence: With enough rounds, all participating nodes converge to the same set of keys (under no new writes and no partitions).
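The loop below sketches one synchronous gossip round under the assumption that state reduces to a key → version map; the project's implementation exchanges richer entries over the simulated network, so treat this only as the shape of the fanout-and-merge step.

```go
package main

import (
	"fmt"
	"math/rand"
)

// peerState is an illustrative per-node view of the store: key -> version.
type peerState map[string]uint64

// merge keeps the higher version per key, mirroring the rule above.
func merge(dst, src peerState) {
	for k, v := range src {
		if v > dst[k] {
			dst[k] = v
		}
	}
}

// gossipRound has every node exchange state with `fanout` random peers
// (push-pull), so both sides end the exchange with the newer versions.
func gossipRound(cluster []peerState, fanout int) {
	for i := range cluster {
		for f := 0; f < fanout; f++ {
			j := rand.Intn(len(cluster))
			if j == i {
				continue
			}
			merge(cluster[i], cluster[j])
			merge(cluster[j], cluster[i])
		}
	}
}

func main() {
	cluster := make([]peerState, 5)
	for i := range cluster {
		cluster[i] = peerState{}
	}
	cluster[0]["key-1"] = 3 // a write lands on a single node

	// A handful of rounds is typically enough for a small cluster to converge.
	for round := 0; round < 3; round++ {
		gossipRound(cluster, 2)
	}
	for i, s := range cluster {
		fmt.Printf("node-%02d sees key-1 at version %d\n", i+1, s["key-1"])
	}
}
```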
- Roles: Follower, Candidate, Leader.
- Election: On timeout, node becomes Candidate, increments term, votes for itself, and requests votes from others. Majority wins.
- Heartbeat: Leader sends heartbeats; followers reset timeout. Used to detect leader failure and trigger re-election.
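A compact sketch of the voting rule only (single term, no log comparison, no timers); `peer`, `requestVote`, and `runElection` are illustrative names, not the `pkg/protocol` API.

```go
package main

import "fmt"

// peer is an illustrative follower that grants at most one vote per term.
type peer struct {
	term     uint64
	votedFor string
}

// requestVote grants the vote if the candidate's term is at least as new and
// this peer has not already voted for someone else in that term.
func (p *peer) requestVote(candidate string, term uint64) bool {
	if term < p.term {
		return false
	}
	if term > p.term {
		p.term, p.votedFor = term, "" // a newer term resets the vote
	}
	if p.votedFor == "" || p.votedFor == candidate {
		p.votedFor = candidate
		return true
	}
	return false
}

// runElection performs the candidate step: vote for self, request votes from
// the other nodes, and win on a strict majority of the cluster.
func runElection(candidate string, term uint64, peers []*peer) bool {
	votes := 1 // the candidate votes for itself
	for _, p := range peers {
		if p.requestVote(candidate, term) {
			votes++
		}
	}
	clusterSize := len(peers) + 1
	return votes > clusterSize/2
}

func main() {
	peers := []*peer{{}, {}, {}, {}} // four followers + one candidate = 5 nodes
	if runElection("node-01", 1, peers) {
		fmt.Println("Leader elected: node-01")
	}
}
```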
- Worker pool: Fixed number of workers consume node jobs from a channel.
- Per-node timeout: Each node operation is bounded by a timeout; slow/failed nodes don’t block others.
- Retries: Configurable retries with exponential backoff for transient failures.
- Aggregation: Results are aggregated into succeeded/failed/timeout counts and per-node status.
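A condensed sketch of this pattern: a fixed worker pool drains a job channel, each attempt is bounded by a context timeout, and failures back off exponentially. `deployToNode` and all constants are placeholders rather than the `pkg/batch` API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// deployToNode stands in for a per-node operation; it fails randomly so the
// retry path is exercised.
func deployToNode(ctx context.Context, node string) error {
	select {
	case <-time.After(time.Duration(rand.Intn(100)) * time.Millisecond):
		if rand.Float64() < 0.3 {
			return errors.New("transient failure")
		}
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// withRetries bounds each attempt with a timeout and backs off exponentially
// between attempts.
func withRetries(node string, attempts int, timeout time.Duration) error {
	backoff := 50 * time.Millisecond
	var err error
	for i := 0; i < attempts; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		err = deployToNode(ctx, node)
		cancel()
		if err == nil {
			return nil
		}
		time.Sleep(backoff)
		backoff *= 2
	}
	return err
}

func main() {
	nodes := []string{"node-01", "node-02", "node-03", "node-04", "node-05"}
	jobs := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded, failed := 0, 0

	// Fixed pool of workers consuming node jobs from a channel.
	for w := 0; w < 3; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for node := range jobs {
				if err := withRetries(node, 3, 200*time.Millisecond); err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				} else {
					mu.Lock()
					succeeded++
					mu.Unlock()
				}
			}
		}()
	}
	for _, n := range nodes {
		jobs <- n
	}
	close(jobs)
	wg.Wait()
	fmt.Printf("Succeeded: %d, Failed: %d\n", succeeded, failed)
}
```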
- Latency: Random delay between min and max (e.g. 50–500 ms) per logical message.
- Partitions: Nodes can be split into disjoint partitions; communication only within the same partition. Simulates split-brain and recovery.
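A toy version of both mechanisms, assuming a partition is just an integer label per node and latency is a uniform random delay; the real `pkg/network` package may model more detail.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// network is an illustrative simulator: each node has a partition label, and
// messages only pass between nodes in the same partition.
type network struct {
	minLatency, maxLatency time.Duration
	partitionOf            map[string]int
}

// send delivers one logical message after a random delay, or fails if the two
// nodes sit on opposite sides of a partition.
func (n *network) send(from, to string) error {
	if n.partitionOf[from] != n.partitionOf[to] {
		return fmt.Errorf("%s -> %s: partitioned", from, to)
	}
	delay := n.minLatency + time.Duration(rand.Int63n(int64(n.maxLatency-n.minLatency)))
	time.Sleep(delay)
	return nil
}

func main() {
	net := &network{
		minLatency: 50 * time.Millisecond,
		maxLatency: 500 * time.Millisecond,
		partitionOf: map[string]int{
			"node-01": 0, "node-02": 0, // partition A
			"node-03": 1, // partition B
		},
	}
	fmt.Println(net.send("node-01", "node-02")) // <nil>: delivered after a delay
	fmt.Println(net.send("node-01", "node-03")) // error: partitioned
}
```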
| Scenario | What it demonstrates |
|---|---|
| batch-deploy | Spread selection, workload deployment, simulated mid-run failure of two nodes, retries, and final status |
| node-failure | Random failures, leader election, workload migration from failed nodes |
| network-partition | Split cluster into two partitions, write in one, heal partition, state reconciliation |
| state-sync | Write 1000 keys, gossip to 5 replicas, convergence time, conflict resolution |
Batch Deployment Scenario
Nodes in cluster: 30, Selected (spread): 10
Simulated failure of node-10 and node-11 during deployment.
--- Result ---
Total time: 1.5s
Succeeded: 8
Failed: 2
Timeout: 0
Per-node status:
node-07: Succeeded
node-10: Failed (node not healthy)
...
Node Failure Scenario
Starting with 30 healthy nodes.
Failed nodes: [node-28 node-22 node-06]
Leader elected: node-01
State after failure: Online=27 Offline=3 Degraded=0
Workload migration: from failed nodes -> node-30 (example target)
State Synchronization Scenario
Replicas: 5, writing 1000 keys.
Wrote 1000 keys in 514µs.
Convergence: 5/5 stores have 1000 keys after 2.0s.
Conflict resolution (merge): conflict-key = value-b
go test ./pkg/... -v
make test

Unit tests cover:
- Node add/remove workload, utilization, state updates, capacity errors
- Selector: least-loaded, label-affinity, spread, random
- State store: set/get, versioning, merge
- Batch processor: success, failure handling, retries
The pkg/metrics package provides a collector and reporter for:
- Performance: Batch completion time, state-sync latency, gossip convergence, election duration
- Reliability: Batch success rate, consistency rate, failure detection and recovery time
- Resource: Utilization, memory, goroutine count (if wired in)
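For orientation only, a collector along these lines might record named durations and counters and report simple aggregates; this is a hypothetical sketch, not the `pkg/metrics` API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// collector records named durations and counters; purely illustrative.
type collector struct {
	mu        sync.Mutex
	durations map[string][]time.Duration
	counters  map[string]int
}

func newCollector() *collector {
	return &collector{
		durations: map[string][]time.Duration{},
		counters:  map[string]int{},
	}
}

// ObserveDuration records a latency sample under a metric name.
func (c *collector) ObserveDuration(name string, d time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.durations[name] = append(c.durations[name], d)
}

// Inc bumps a named counter (e.g. succeeded/failed batch operations).
func (c *collector) Inc(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counters[name]++
}

// Report prints sample counts, averages, and counter totals.
func (c *collector) Report() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for name, samples := range c.durations {
		var total time.Duration
		for _, d := range samples {
			total += d
		}
		fmt.Printf("%s: %d samples, avg %v\n", name, len(samples), total/time.Duration(len(samples)))
	}
	for name, n := range c.counters {
		fmt.Printf("%s: %d\n", name, n)
	}
}

func main() {
	c := newCollector()
	c.ObserveDuration("batch_completion", 1500*time.Millisecond)
	c.Inc("batch_succeeded")
	c.Report()
}
```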
- github.com/google/uuid – unique IDs for jobs/workloads
- github.com/olekukonko/tablewriter – CLI tables for visualize and reports
Use this project as needed for learning and as a portfolio demonstration of distributed systems concepts in Go.