1 | | -# Debezium CDC Demo for XTDB |
| 1 | +# Debezium CDC Demos for XTDB |
2 | 2 |
3 | | -This demo shows how XTDB can ingest Debezium CDC (Change Data Capture) events from MySQL, handling schema evolution (new columns, type changes) without any schema migrations. |
| 3 | +This directory contains demos of XTDB ingesting Debezium-style CDC (Change Data Capture) events, highlighting schema-less ingestion and bitemporal capabilities. |
4 | 4 |
5 | | -## What This Demonstrates |
| 5 | +## Demos |
6 | 6 |
7 | | -1. **Schema-less ingestion**: XTDB accepts records with varying column sets - no DDL required |
8 | | -2. **Schema evolution**: New columns appear in CDC events over time, XTDB handles them automatically |
9 | | -3. **Bitemporality**: CDC event timestamps become `_valid_from`, enabling time-travel queries |
10 | | -4. **Full CDC support**: Handles inserts, updates, and deletes from Debezium |
| 7 | +### [debezium-static-json](./debezium-static-json/) |
11 | 8 |
12 | | -## Scenario |
| 9 | +**Static JSON demo** - Uses pre-generated Debezium JSON events to demonstrate XTDB's CDC capabilities without requiring any external infrastructure. |
13 | 10 |
14 | | -The demo simulates a MySQL "accounts" database with three evolving tables: |
| 11 | +- No MySQL, Kafka, or other dependencies |
| 12 | +- Good for understanding the data format and XTDB behavior |
| 13 | +- Quick to run and explore |
15 | 14 |
16 | | -| Table | Original Schema | Evolved Schema (new columns) | |
17 | | -|-------|-----------------|------------------------------| |
18 | | -| `users` | id, email, username, created_at | + phone_number, verified_at | |
19 | | -| `profiles` | id, user_id, display_name | + avatar_url, bio | |
20 | | -| `sessions` | id, user_id, token, created_at | + device_type, ip_address | |
| 15 | +```bash |
| 16 | +cd debezium-static-json |
| 17 | +mise run |
| 18 | +``` |
21 | 19 |
22 | | -The `cdc/events.json` file contains 22 Debezium events spanning 4 days: |
23 | | -- Initial inserts with original schema |
24 | | -- Schema evolution (new columns appear in events) |
25 | | -- Updates to existing records |
26 | | -- Deletes (user deactivation, session logout) |
| 20 | +### [debezium-xtdb](./debezium-xtdb/) |
27 | 21 |
28 | | -## Running the Demo |
| 22 | +**Live MySQL CDC to XTDB** - A Java application that runs the Debezium embedded engine to capture changes from a real MySQL/MariaDB database and write them to XTDB with full bitemporal support. |
| 23 | + |
| 24 | +- Real MySQL/MariaDB CDC (binlog-based) |
| 25 | +- Single JVM process (no Kafka required) |
| 26 | +- Full bitemporal support (`_valid_from`, `FOR PORTION OF VALID_TIME` deletes) |
| 27 | +- Includes helper scripts for testing (mysql-writer, xtdb-poller) |
29 | 28 |
30 | 29 | ```bash |
31 | | -# From the debezium directory |
32 | | -cd debezium |
| 30 | +cd debezium-xtdb |
| 31 | +mise run demo # Installs MariaDB, starts it, runs CDC |
| 32 | +``` |
33 | 33 |
34 | | -# Install dependencies and run ingestion |
35 | | -mise run |
| 34 | +The module also includes a Debezium Server sink connector for use with a standalone Debezium Server deployment. |
36 | 35 |
37 | | -# Or step by step: |
38 | | -mise run deps # Install Go dependencies |
39 | | -mise run run # Ingest CDC events into XTDB |
| 36 | +## Key Concepts |
40 | 37 |
41 | | -# Run example queries |
42 | | -mise run query |
| 38 | +### Schema-less Ingestion |
43 | 39 |
44 | | -# Check record counts |
45 | | -mise run test |
| 40 | +XTDB accepts records with varying column sets without requiring DDL changes. When your source schema evolves (for example, new columns are added), XTDB stores the new columns as they arrive, with no `ALTER TABLE` needed. |
46 | 41 |
47 | | -# Clean and re-run |
48 | | -mise run reset |
49 | | -``` |
| 42 | +### Bitemporal Tracking |
50 | 43 |
51 | | -## How It Works |
| 44 | +CDC event timestamps become `_valid_from` in XTDB, enabling: |
| 45 | +- Point-in-time queries: "What was the state at time X?" |
| 46 | +- History queries: "Show all versions of record Y" |
| 47 | +- Deleted record visibility: deleted records aren't lost; they have `_valid_to` set |
52 | 48 |
53 | 49 | ### Debezium Event Format |
54 | 50 |
55 | | -Each CDC event follows the Debezium format: |
| 51 | +The demos handle both the full Debezium envelope format and the flattened format produced by the `ExtractNewRecordState` transform. A flattened event looks like this: |
56 | 52 |
57 | 53 | ```json |
58 | 54 | { |
59 | | - "payload": { |
60 | | - "op": "c", // c=create, u=update, d=delete |
61 | | - "ts_ms": 1704067200000, // Event timestamp (milliseconds) |
62 | | - "source": { |
63 | | - "db": "accounts", |
64 | | - "table": "users" |
65 | | - }, |
66 | | - "before": null, // Previous state (for updates/deletes) |
67 | | - "after": { // New state |
68 | | - "id": 1, |
69 | | - "email": "alice@example.com", |
70 | | - "username": "alice" |
71 | | - } |
72 | | - } |
| 55 | + "id": 1, |
| 56 | + "email": "alice@example.com", |
| 57 | + "username": "alice", |
| 58 | + "__op": "c", |
| 59 | + "__table": "accounts.users", |
| 60 | + "__source_ts_ms": 1704067200000 |
73 | 61 | } |
74 | 62 | ``` |
75 | 63 |
76 | | -### Transformation to XTDB |
77 | | - |
78 | | -The Go script transforms each event: |
79 | | - |
80 | | -| Debezium | XTDB | |
81 | | -|----------|------| |
82 | | -| `source.table` | Table name | |
83 | | -| `after.id` | `_id` | |
84 | | -| `ts_ms` | `_valid_from` | |
85 | | -| `after.*` | Record fields (dynamic) | |
86 | | - |
87 | | -Operations: |
88 | | -- **create/update** → `INSERT INTO table RECORDS {...}` |
89 | | -- **delete** → `DELETE FROM table FOR PORTION OF VALID_TIME ...` |
90 | | - |
91 | | -### Schema Evolution Handling |
92 | | - |
93 | | -XTDB's schema-less design means: |
| 64 | +## Comparison |
94 | 65 |
95 | | -1. **Event 1** (Jan 1): `{id: 1, email: "alice@example.com"}` |
96 | | -2. **Event 2** (Jan 2): `{id: 4, email: "diana@example.com", phone_number: "+1-555-0104"}` |
| 66 | +| Feature | Static JSON | Live CDC (debezium-xtdb) | |
| 67 | +|---------|-------------|--------------------------| |
| 68 | +| Real database | No | Yes (MySQL/MariaDB) | |
| 69 | +| Kafka required | No | No | |
| 70 | +| CDC engine | None | Debezium Embedded | |
| 71 | +| Latency | N/A | Sub-second | |
| 72 | +| Setup complexity | Minimal | Medium (MariaDB install) | |
| 73 | +| Best for | Learning | Development/Testing | |
97 | 74 |
98 | | -No `ALTER TABLE` needed! XTDB stores each record with its actual columns. |
| 75 | +## Architecture |
99 | 76 |
100 | | -## Example Queries |
101 | | - |
102 | | -After ingestion, you can run time-travel queries: |
103 | | - |
104 | | -```sql |
105 | | --- Current state of users |
106 | | -SELECT * FROM users; |
107 | | - |
108 | | --- See all historical versions of Alice |
109 | | -SELECT * FROM users FOR ALL VALID_TIME WHERE _id = 1; |
110 | | - |
111 | | --- Users as of Jan 1, 2024 (before schema evolution) |
112 | | -SELECT * FROM users FOR VALID_TIME AS OF TIMESTAMP '2024-01-01T12:00:00Z'; |
113 | | - |
114 | | --- See deleted users |
115 | | -SELECT * FROM users FOR ALL VALID_TIME WHERE _valid_to IS NOT NULL; |
116 | 77 | ``` |
117 | | - |
118 | | -## Files |
119 | | - |
120 | | -``` |
121 | | -debezium/ |
122 | | -├── .mise.toml # Task definitions |
123 | | -├── go.mod # Go module |
124 | | -├── main.go # Ingestion script (~150 lines) |
125 | | -├── cdc/ |
126 | | -│ └── events.json # Static Debezium CDC events (22 events) |
127 | | -├── sql/ |
128 | | -│ └── queries.sql # Example queries |
129 | | -└── README.md # This file |
| 78 | + debezium-xtdb (Embedded Mode) |
| 79 | +┌─────────────────────────────────────────────────────────────────────────┐ |
| 80 | +│ │ |
| 81 | +│ ┌──────────────────┐ ┌────────────────┐ ┌───────────────────┐ │ |
| 82 | +│ │ MySQL/MariaDB │───►│ Debezium │───►│ XtdbWriter │ │ |
| 83 | +│ │ (binlog) │ │ Embedded Engine│ │ (JDBC) │ │ |
| 84 | +│ └──────────────────┘ └────────────────┘ └─────────┬─────────┘ │ |
| 85 | +│ │ │ |
| 86 | +└──────────────────────────────────────────────────────────┼─────────────┘ |
| 87 | + │ |
| 88 | + ▼ |
| 89 | + ┌──────────────┐ |
| 90 | + │ XTDB │ |
| 91 | + │ (bitemporal) │ |
| 92 | + └──────────────┘ |
130 | 93 | ``` |
131 | 94 |
132 | | -## Why Not a Live Kafka/Debezium Setup? |
133 | | - |
134 | | -This demo uses static JSON files to: |
135 | | -- Keep the demo simple and self-contained |
136 | | -- Focus on XTDB's schema evolution capabilities |
137 | | -- Avoid requiring Kafka, Zookeeper, MySQL, and Debezium containers |
138 | | - |
139 | | -For production, you would connect XTDB to Kafka using a similar ingestion approach, or use the XTDB Kafka module directly. |
140 | | - |
141 | | -## Production Considerations |
| 95 | +## Production Deployment |
142 | 96 |
143 | | -For real-world CDC ingestion: |
| 97 | +For production CDC, there are two deployment options: |
144 | 98 |
145 | | -1. **Kafka Consumer**: Replace file reading with a Kafka consumer (e.g., Sarama for Go) |
146 | | -2. **Batching**: Batch inserts for better throughput |
147 | | -3. **Exactly-once**: Track Kafka offsets in XTDB for exactly-once semantics |
148 | | -4. **Error handling**: Dead letter queues for failed events |
149 | | -5. **Monitoring**: Metrics for lag, throughput, and errors |
| 99 | +1. **Embedded mode** (`debezium-xtdb`): Single JAR, good for simpler deployments |
| 100 | +2. **Debezium Server + Sink**: Deploy the XTDB sink JAR with Debezium Server for more complex setups with multiple connectors |
150 | 101 |
151 | | -Alternatively, a sample XTDB Kafka Connect Sink is available (which may be further adapted to support MySQL-compatible Debezium output): https://github.com/egg-juxt/xtdb-kafka-connect |
| 102 | +See the [debezium-xtdb directory](./debezium-xtdb/) for detailed instructions. |
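The flattened event in the new README keeps its Debezium metadata in `__op`, `__table` and `__source_ts_ms`. The full envelope, shown in the removed lines, uses `payload.op`, `source.table`, `ts_ms`, `before` and `after`. Below is a minimal Python sketch of how either shape could be normalized into an XTDB record. It is not the `XtdbWriter` code from `debezium-xtdb`. The helper name `normalize_event` is hypothetical. The mapping (`id` → `_id`, event timestamp → `_valid_from`, deletes reading the row from `before`) follows the removed mapping table and the `before` comment in the old README.

```python
from datetime import datetime, timezone
from typing import Any


def normalize_event(event: dict[str, Any]) -> tuple[str, str, dict[str, Any]]:
    """Return (table, op, record) for a full-envelope or flattened Debezium event.

    Hypothetical helper. Debezium op codes: c=create, u=update, d=delete,
    r=snapshot read. Assumes flattened events carry __op, __table and
    __source_ts_ms, as in the README example.
    """
    if "payload" in event:  # full Debezium envelope
        p = event["payload"]
        op = p["op"]
        # Deletes carry the last known row in "before"; other ops use "after".
        row = dict(p["before"] if op == "d" else p["after"])
        table = p["source"]["table"]
        ts_ms = p["ts_ms"]
    else:  # flattened (ExtractNewRecordState-style) event
        # Drop the "__"-prefixed Debezium metadata from the stored row.
        row = {k: v for k, v in event.items() if not k.startswith("__")}
        op = event["__op"]
        table = event["__table"].split(".")[-1]  # "accounts.users" -> "users"
        ts_ms = event["__source_ts_ms"]

    # Rename the primary key to XTDB's _id and keep every other column as-is,
    # so new columns from schema evolution pass straight through.
    record = {k: v for k, v in row.items() if k != "id"}
    record["_id"] = row["id"]
    # The event timestamp (milliseconds) becomes the record's valid time.
    record["_valid_from"] = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return table, op, record
```

Dropping every `__`-prefixed key keeps Debezium metadata out of the stored record. A real writer might keep some of it, such as the source table, for auditing.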
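The "Bitemporal Tracking" bullets can be made concrete with an in-memory Python model. This models the semantics only; it is not how XTDB stores data. The input is a sequence of `(op, record)` CDC changes, where each record has `_id` and `_valid_from`, and the model folds them into versions with `_valid_to`. Each change closes the previous version at its own timestamp. A delete closes the current version without opening a new one, so the row stays in history. The helpers `build_history` and `as_of` are hypothetical. In XTDB itself, the equivalents are queries using `FOR VALID_TIME AS OF ...` and `FOR ALL VALID_TIME`, as in the example queries removed from the old README.

```python
from datetime import datetime
from typing import Any, Iterable

Record = dict[str, Any]


def build_history(changes: Iterable[tuple[str, Record]]) -> dict[Any, list[Record]]:
    """Fold (op, record) CDC changes into per-_id version lists.

    Every version gets "_valid_to": None while it is current, or the
    timestamp of the change that superseded or deleted it.
    """
    history: dict[Any, list[Record]] = {}
    # Apply changes in valid-time order so each one closes the version before it.
    for op, rec in sorted(changes, key=lambda change: change[1]["_valid_from"]):
        versions = history.setdefault(rec["_id"], [])
        if versions and versions[-1]["_valid_to"] is None:
            versions[-1]["_valid_to"] = rec["_valid_from"]
        if op != "d":  # creates, updates and snapshot reads open a new version
            versions.append({**rec, "_valid_to": None})
        # A delete only closes the current version; the row stays in history.
    return history


def as_of(history: dict[Any, list[Record]], t: datetime) -> list[Record]:
    """Point-in-time query: the versions whose valid-time interval contains t."""
    return [
        v
        for versions in history.values()
        for v in versions
        if v["_valid_from"] <= t and (v["_valid_to"] is None or t < v["_valid_to"])
    ]
```

In this model, `history[1]` plays the role of `FOR ALL VALID_TIME WHERE _id = 1`. A deleted record still appears there with `_valid_to` set, while `as_of` after the delete no longer returns it.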