Commit 1431585

add end-to-end MySQL->Debezium(embedded)->XTDB demo (plus a sink as well)
1 parent 489ed3e commit 1431585

19 files changed (+1310 -120 lines changed)

debezium/README.md

Lines changed: 68 additions & 117 deletions
Original file line number | Diff line number | Diff line change
@@ -1,151 +1,102 @@
1-
# Debezium CDC Demo for XTDB
1+
# Debezium CDC Demos for XTDB
22

3-
This demo shows how XTDB can ingest Debezium CDC (Change Data Capture) events from MySQL, handling schema evolution (new columns, type changes) without any schema migrations.
3+
This directory contains demos showing how XTDB can ingest Debezium-style CDC (Change Data Capture) events, demonstrating schema-less ingestion and bitemporal capabilities.
44

5-
## What This Demonstrates
5+
## Demos
66

7-
1. **Schema-less ingestion**: XTDB accepts records with varying column sets - no DDL required
8-
2. **Schema evolution**: New columns appear in CDC events over time, XTDB handles them automatically
9-
3. **Bitemporality**: CDC event timestamps become `_valid_from`, enabling time-travel queries
10-
4. **Full CDC support**: Handles inserts, updates, and deletes from Debezium
7+
### [debezium-static-json](./debezium-static-json/)
118

12-
## Scenario
9+
**Static JSON demo** - Uses pre-generated Debezium JSON events to demonstrate XTDB's CDC capabilities without requiring any external infrastructure.
1310

14-
The demo simulates a MySQL "accounts" database with three evolving tables:
11+
- No MySQL, Kafka, or other dependencies
12+
- Good for understanding the data format and XTDB behavior
13+
- Quick to run and explore
1514

16-
| Table | Original Schema | Evolved Schema (new columns) |
17-
|-------|-----------------|------------------------------|
18-
| `users` | id, email, username, created_at | + phone_number, verified_at |
19-
| `profiles` | id, user_id, display_name | + avatar_url, bio |
20-
| `sessions` | id, user_id, token, created_at | + device_type, ip_address |
15+
```bash
16+
cd debezium-static-json
17+
mise run
18+
```
2119

22-
The `cdc/events.json` file contains 22 Debezium events spanning 4 days:
23-
- Initial inserts with original schema
24-
- Schema evolution (new columns appear in events)
25-
- Updates to existing records
26-
- Deletes (user deactivation, session logout)
20+
### [debezium-xtdb](./debezium-xtdb/)
2721

28-
## Running the Demo
22+
**Live MySQL CDC to XTDB** - A Java-based Debezium embedded engine that captures changes from a real MySQL/MariaDB database and writes them to XTDB with full bitemporal support.
23+
24+
- Real MySQL/MariaDB CDC (binlog-based)
25+
- Single JVM process (no Kafka required)
26+
- Full bitemporal support (`_valid_from`, `FOR PORTION OF VALID_TIME` deletes)
27+
- Includes helper scripts for testing (mysql-writer, xtdb-poller)
2928

3029
```bash
31-
# From the debezium directory
32-
cd debezium
30+
cd debezium-xtdb
31+
mise run demo # Installs MariaDB, starts it, runs CDC
32+
```
3333

34-
# Install dependencies and run ingestion
35-
mise run
34+
The module also includes a Debezium Server sink connector for deployment with standalone Debezium Server.
3635

37-
# Or step by step:
38-
mise run deps # Install Go dependencies
39-
mise run run # Ingest CDC events into XTDB
36+
## Key Concepts
4037

41-
# Run example queries
42-
mise run query
38+
### Schema-less Ingestion
4339

44-
# Check record counts
45-
mise run test
40+
XTDB accepts records with varying column sets without requiring DDL changes. When your source schema evolves (new columns added), XTDB handles it automatically.
4641

47-
# Clean and re-run
48-
mise run reset
49-
```
42+
### Bitemporal Tracking
5043

51-
## How It Works
44+
CDC event timestamps become `_valid_from` in XTDB, enabling:
45+
- Point-in-time queries: "What was the state at time X?"
46+
- History queries: "Show all versions of record Y"
47+
- Deleted record visibility: Records aren't lost, they have `_valid_to` set
5248

5349
### Debezium Event Format
5450

55-
Each CDC event follows the Debezium format:
51+
The demos handle both full Debezium envelope format and the flattened format (via `ExtractNewRecordState` transform):
5652

5753
```json
5854
{
59-
"payload": {
60-
"op": "c", // c=create, u=update, d=delete
61-
"ts_ms": 1704067200000, // Event timestamp (milliseconds)
62-
"source": {
63-
"db": "accounts",
64-
"table": "users"
65-
},
66-
"before": null, // Previous state (for updates/deletes)
67-
"after": { // New state
68-
"id": 1,
69-
"email": "alice@example.com",
70-
"username": "alice"
71-
}
72-
}
55+
"id": 1,
56+
"email": "alice@example.com",
57+
"username": "alice",
58+
"__op": "c",
59+
"__table": "accounts.users",
60+
"__source_ts_ms": 1704067200000
7361
}
7462
```
7563

76-
### Transformation to XTDB
77-
78-
The Go script transforms each event:
79-
80-
| Debezium | XTDB |
81-
|----------|------|
82-
| `source.table` | Table name |
83-
| `after.id` | `_id` |
84-
| `ts_ms` | `_valid_from` |
85-
| `after.*` | Record fields (dynamic) |
86-
87-
Operations:
88-
- **create/update** → `INSERT INTO table RECORDS {...}`
89-
- **delete** → `DELETE FROM table FOR PORTION OF VALID_TIME ...`
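
The mapping described in this removed section can be sketched in a few lines. This is an illustrative Python version and not the Go script from the demo: `to_xtdb` and `EPOCH` are hypothetical names, and moving `id` to `_id` (rather than keeping both) is an assumption. The sketch stops at choosing the statement kind and building the record, because the full SQL text is elided in the README.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Example payload, taken from the envelope shown in the old README.
EXAMPLE = {
    "op": "c",
    "ts_ms": 1704067200000,
    "source": {"db": "accounts", "table": "users"},
    "before": None,
    "after": {"id": 1, "email": "alice@example.com", "username": "alice"},
}


def to_xtdb(payload: dict) -> dict:
    """Map one Debezium envelope payload to a table, statement kind and record."""
    op = payload["op"]
    # Deletes carry the row under "before"; creates and updates under "after".
    row = payload["before"] if op == "d" else payload["after"]
    # after.* become record fields; after.id becomes _id (assumption: id is moved).
    record = {k: v for k, v in row.items() if k != "id"}
    record["_id"] = row["id"]
    # ts_ms is epoch milliseconds; integer timedelta arithmetic avoids float rounding.
    ts = EPOCH + timedelta(milliseconds=payload["ts_ms"])
    record["_valid_from"] = ts.isoformat()
    return {
        "table": payload["source"]["table"],
        "kind": "delete" if op == "d" else "insert",
        "record": record,
    }
```

For the example payload, the result targets the `users` table with an `insert`, and `_valid_from` is `2024-01-01T00:00:00+00:00`.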
90-
91-
### Schema Evolution Handling
92-
93-
XTDB's schema-less design means:
64+
## Comparison
9465

95-
1. **Event 1** (Jan 1): `{id: 1, email: "alice@example.com"}`
96-
2. **Event 2** (Jan 2): `{id: 4, email: "diana@example.com", phone_number: "+1-555-0104"}`
66+
| Feature | Static JSON | Live CDC (debezium-xtdb) |
67+
|---------|-------------|--------------------------|
68+
| Real database | No | Yes (MySQL/MariaDB) |
69+
| Kafka required | No | No |
70+
| CDC engine | None | Debezium Embedded |
71+
| Latency | N/A | Sub-second |
72+
| Setup complexity | Minimal | Medium (MariaDB install) |
73+
| Best for | Learning | Development/Testing |
9774

98-
No `ALTER TABLE` needed! XTDB stores each record with its actual columns.
75+
## Architecture
9976

100-
## Example Queries
101-
102-
After ingestion, you can run time-travel queries:
103-
104-
```sql
105-
-- Current state of users
106-
SELECT * FROM users;
107-
108-
-- See all historical versions of Alice
109-
SELECT * FROM users FOR ALL VALID_TIME WHERE _id = 1;
110-
111-
-- Users as of Jan 1, 2024 (before schema evolution)
112-
SELECT * FROM users FOR VALID_TIME AS OF TIMESTAMP '2024-01-01T12:00:00Z';
113-
114-
-- See deleted users
115-
SELECT * FROM users FOR ALL VALID_TIME WHERE _valid_to IS NOT NULL;
11677
```
117-
118-
## Files
119-
120-
```
121-
debezium/
122-
├── .mise.toml # Task definitions
123-
├── go.mod # Go module
124-
├── main.go # Ingestion script (~150 lines)
125-
├── cdc/
126-
│ └── events.json # Static Debezium CDC events (22 events)
127-
├── sql/
128-
│ └── queries.sql # Example queries
129-
└── README.md # This file
78+
debezium-xtdb (Embedded Mode)
79+
┌─────────────────────────────────────────────────────────────────────────┐
80+
│ │
81+
│ ┌──────────────────┐ ┌────────────────┐ ┌───────────────────┐ │
82+
│ │ MySQL/MariaDB │───►│ Debezium │───►│ XtdbWriter │ │
83+
│ │ (binlog) │ │ Embedded Engine│ │ (JDBC) │ │
84+
│ └──────────────────┘ └────────────────┘ └─────────┬─────────┘ │
85+
│ │ │
86+
└──────────────────────────────────────────────────────────┼─────────────┘
87+
88+
89+
┌──────────────┐
90+
│ XTDB │
91+
│ (bitemporal) │
92+
└──────────────┘
13093
```
13194

132-
## Why Not a Live Kafka/Debezium Setup?
133-
134-
This demo uses static JSON files to:
135-
- Keep the demo simple and self-contained
136-
- Focus on XTDB's schema evolution capabilities
137-
- Avoid requiring Kafka, Zookeeper, MySQL, and Debezium containers
138-
139-
For production, you would connect XTDB to Kafka using a similar ingestion approach, or use the XTDB Kafka module directly.
140-
141-
## Production Considerations
95+
## Production Deployment
14296

143-
For real-world CDC ingestion:
97+
For production CDC:
14498

145-
1. **Kafka Consumer**: Replace file reading with a Kafka consumer (e.g., Sarama for Go)
146-
2. **Batching**: Batch inserts for better throughput
147-
3. **Exactly-once**: Track Kafka offsets in XTDB for exactly-once semantics
148-
4. **Error handling**: Dead letter queues for failed events
149-
5. **Monitoring**: Metrics for lag, throughput, and errors
99+
1. **Embedded mode** (`debezium-xtdb`): Single JAR, good for simpler deployments
100+
2. **Debezium Server + Sink**: Deploy the XTDB sink JAR with Debezium Server for more complex setups with multiple connectors
150101

151-
Alternatively, a sample XTDB Kafka Connect Sink is available (which may be further adapted to support MySQL-compatible Debezium output): https://github.com/egg-juxt/xtdb-kafka-connect
102+
See the [debezium-xtdb directory](./debezium-xtdb/) for detailed instructions.
Lines changed: 6 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,9 @@
11
[tools]
22
go = "1.21"
33

4+
[env]
5+
XTDB_HOST = "xtdb"
6+
47
[tasks.deps]
58
description = "Install Go dependencies"
69
run = "go mod download && go mod tidy"
@@ -18,7 +21,7 @@ run = "go run . cdc/events.json"
1821
[tasks.clean]
1922
description = "Clean tables in XTDB for re-run"
2023
run = """
21-
psql "${XTDB_URL:-postgres://xtdb:xtdb@localhost:5432/xtdb}" -c "
24+
psql -h "$XTDB_HOST" -p 5432 -U xtdb -d xtdb -c "
2225
DELETE FROM users WHERE 1=1;
2326
DELETE FROM profiles WHERE 1=1;
2427
DELETE FROM sessions WHERE 1=1;
@@ -31,12 +34,12 @@ depends = ["clean", "run"]
3134

3235
[tasks.query]
3336
description = "Run example queries"
34-
run = "psql \"${XTDB_URL:-postgres://xtdb:xtdb@localhost:5432/xtdb}\" -f sql/queries.sql"
37+
run = "psql -h \"$XTDB_HOST\" -p 5432 -U xtdb -d xtdb -f sql/queries.sql"
3538

3639
[tasks.test]
3740
description = "Verify data was ingested correctly"
3841
run = """
39-
psql "${XTDB_URL:-postgres://xtdb:xtdb@localhost:5432/xtdb}" -c "
42+
psql -h "$XTDB_HOST" -p 5432 -U xtdb -d xtdb -c "
4043
SELECT 'users' as table_name, COUNT(*) as count FROM users
4144
UNION ALL
4245
SELECT 'profiles', COUNT(*) FROM profiles
