Because reading about databases is boring. Building them is fun.
You know that feeling when you're reading Designing Data-Intensive Applications or Database Internals and your eyes start glazing over at page 47?
Yeah, me too.
Books about B-trees, LSM trees, MVCC, and WAL are dense. The concepts are fascinating, but reading 800 pages of text about how databases work is... well, let's just say it's not the most exciting Saturday night.
Then I had a thought: What if instead of just reading about databases, I could build one? And not just one - what if I could build dozens, trying different combinations, seeing the tradeoffs play out in real time?
What if learning database internals felt like playing with LEGO?
That's this project.
DB Simulator is a visual tool where you take database building blocks and compose them together to create your own database. Then you can run queries, watch how they execute, and see the tradeoffs in action.
Think:
- Unreal Engine Blueprints, but you're learning databases
- Scratch programming, but the output is a working storage engine
- Figma, but you're designing MVCC instead of UI
Reading about databases teaches you what things do.
Building databases teaches you why they do it that way.
Want to understand why Cassandra uses LSM trees instead of B-trees? Build both and run the same workload. Watch the LSM tree handle writes faster but reads slower. See the write amplification. Feel the tradeoff.
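To make the LSM side of that tradeoff concrete, here is a minimal sketch of a toy LSM tree in Python. It is not code from this project: `ToyLSM`, `memtable_limit`, and both counters are hypothetical names invented for illustration, and the "disk" is just a list of in-memory dicts. Writes land in a small in-memory buffer and are flushed as immutable sorted runs, so reads may have to look in several places, and compaction rewrites data that was already written once.

```python
class ToyLSM:
    """Toy LSM tree: a memtable plus immutable sorted runs (hypothetical, illustration only)."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}        # in-memory buffer that absorbs writes
        self.runs = []            # flushed sorted runs, oldest first
        self.entries_written = 0  # entries written by flushes and compactions
        self.places_probed = 0    # memtable/run lookups performed by reads

    def put(self, key, value):
        # A write only touches the memtable; no existing run is modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a new sorted run.
        if not self.memtable:
            return
        run = dict(sorted(self.memtable.items()))
        self.runs.append(run)
        self.entries_written += len(run)
        self.memtable = {}

    def get(self, key):
        # Check the memtable first, then runs from newest to oldest.
        self.places_probed += 1
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            self.places_probed += 1
            if key in run:
                return run[key]
        return None

    def compact(self):
        # Merge all runs into one; newer runs overwrite older values.
        merged = {}
        for run in self.runs:
            merged.update(run)
        merged = dict(sorted(merged.items()))
        self.entries_written += len(merged)  # data is rewritten: write amplification
        self.runs = [merged] if merged else []
```

With `memtable_limit=4`, twelve distinct `put` calls produce three runs, and reading the oldest key probes the memtable plus all three runs. Calling `compact()` afterwards cuts that read to two probes, at the cost of writing all twelve entries a second time.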
Want to grok the difference between 2PL and MVCC? Build both. Watch how 2PL blocks readers, but MVCC lets them run free. See why PostgreSQL chose MVCC.
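The blocking difference can be sketched with two tiny classes. This is an illustration under simplifying assumptions, not this project's implementation: `TwoPLCell`, `MVCCCell`, and `WOULD_BLOCK` are hypothetical names, the 2PL side uses a single exclusive lock rather than separate shared and exclusive modes, and the MVCC side only stores committed versions.

```python
import threading

WOULD_BLOCK = object()  # sentinel: the reader would have to wait for a writer


class TwoPLCell:
    """One value guarded by one exclusive lock (a simplification of 2PL)."""

    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()

    def try_read(self):
        # Non-blocking acquire so the demo reports the conflict instead of hanging.
        if not self.lock.acquire(blocking=False):
            return WOULD_BLOCK
        try:
            return self.value
        finally:
            self.lock.release()


class MVCCCell:
    """One value stored as a list of committed versions."""

    def __init__(self, value):
        self.clock = 0
        self.versions = [(0, value)]  # (commit_ts, value), ascending by commit_ts

    def snapshot(self):
        # A reader remembers the commit timestamp at which it started.
        return self.clock

    def commit_write(self, value):
        # A writer appends a new version instead of overwriting the old one.
        self.clock += 1
        self.versions.append((self.clock, value))

    def read(self, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self.versions):
            if commit_ts <= snapshot_ts:
                return value
        raise KeyError("no version visible at this snapshot")


# 2PL: while a writer holds the lock, the reader cannot proceed.
locked = TwoPLCell("v1")
locked.lock.acquire()              # writer is mid-transaction
blocked = locked.try_read()        # reader would block here
locked.value = "v2"
locked.lock.release()              # writer commits and releases

# MVCC: the reader keeps its snapshot while the writer commits a new version.
versioned = MVCCCell("v1")
snap = versioned.snapshot()
versioned.commit_write("v2")
old_view = versioned.read(snap)
new_view = versioned.read(versioned.snapshot())
```

The MVCC reader never waits, but the store now keeps old versions around, which is why real MVCC systems also need some form of garbage collection for versions no snapshot can see.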
Learning by doing beats reading every time.
This project provides modular blocks across the entire database stack:
src/blocks/
├── storage/ # LSM trees, B-trees, Log-structured storage
├── index/ # Hash indexes, B+ trees, Learned indexes
├── concurrency/ # 2PL, MVCC, OCC, timestamp ordering
├── transaction-recovery/ # WAL, ARIES, shadow paging
├── query-execution/ # Volcano, vectorized, JIT compilation
├── optimization/ # Rule-based, cost-based, learned optimizers
├── buffer/ # LRU, Clock, 2Q policies
├── compression/ # Dictionary, RLE, Snappy, Zstd
├── partitioning/ # Range, hash, list partitioning
└── distribution/ # Replication, sharding, consensus
- Read chapter about B-trees: 1 hour
- Understand roughly how they work: Maybe?
- Read chapter about LSM trees: 1 hour
- Try to remember what B-trees were: ???
- Read about when to use each: 30 minutes
- Still not sure which is better for your use case: Forever
- Drag a B-tree block onto canvas: 10 seconds
- Add a write-heavy workload: 20 seconds
- Hit run and watch: 10 seconds
- Now try an LSM tree with the same workload: 20 seconds
- See the difference visually: Immediately
- Actually understand the tradeoff: Finally!
# Clone the repo
git clone https://github.com/yourname/DB-Simulator.git
cd DB-Simulator
# Explore the block system
ls src/blocks/
# Read the vision
cat docs/Modular\ DB\ Builder\ -\ PRD\ \(Shreyas\ Style\).md

- Are reading DDIA or Database Internals and want to actually understand it
- Learn better by doing than by reading
- Want to build a custom DB for a unique use case but don't know where to start
- Think databases are interesting but find the textbooks dry
- Want to understand why databases make the choices they do
- Wish you could experiment with "what if MongoDB did X instead of Y?"
- Just think databases are neat and want to play with them
- Love reading 800-page technical books cover-to-cover
- Think building things is a waste of time
- Believe the only way to learn is through pure theory
- Already understand everything about databases
Current Stage: Early prototype / Design phase
We're building the foundation. Check the roadmap:
- /docs/8-Week Roadmap - Modular DB Builder.md: Implementation timeline
- /docs/Design Document - Modular DB Builder.md: Technical architecture
- /docs/Wireframes - Modular DB Builder.md: UI mockups
Can we make learning database internals as fun as using them?
If this works, people won't just read about databases - they'll build them. They'll understand the why behind every design decision. They'll stop cargo-culting PostgreSQL and start designing databases that fit their actual use cases.
And maybe, just maybe, someone will finally understand what "everything is a log" actually means by building it themselves.
The best way to understand how something works is to build it yourself. Let's make that actually enjoyable.
This is an early-stage project born from frustration with boring textbooks and a desire to learn by building.
If you're excited about:
- Making learning more hands-on and visual
- Database internals (or want to understand them better)
- Building tools that make learning fun
- The idea that understanding comes from experimentation
...then you're in the right place. Check /docs for the full vision and roadmap.
This project is as much about the journey of learning as it is about the destination.
TBD
"What I cannot create, I do not understand." - Richard Feynman