Welcome to my personal journey exploring Apache Iceberg, an open table format for large-scale analytics datasets. This repository tracks my experiments, findings, setup steps, and integrations with other tools such as Spark, Nessie, MinIO, Zeppelin, and Dremio.
Apache Iceberg is an open table format designed for huge analytic datasets. It brings SQL table-like features to data lakes: ACID transactions, schema evolution, time travel, partition evolution, and hidden partitioning.
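As a taste of what that means in practice, here is a minimal PySpark sketch of those features. It assumes a SparkSession `spark` already wired to an Iceberg catalog registered as `nessie` (see the configuration sketch further down); the namespace and table names are illustrative:

```python
# Minimal sketch of Iceberg's table features from a notebook.
# Assumes an existing SparkSession `spark` with an Iceberg catalog
# registered as "nessie"; namespace/table names are placeholders.
spark.sql("CREATE NAMESPACE IF NOT EXISTS nessie.demo")
spark.sql("CREATE TABLE IF NOT EXISTS nessie.demo.events (id BIGINT, msg STRING) USING iceberg")
spark.sql("INSERT INTO nessie.demo.events VALUES (1, 'hello iceberg')")

# Schema evolution: add a column without rewriting existing data files.
spark.sql("ALTER TABLE nessie.demo.events ADD COLUMN country STRING")

# Time travel: inspect snapshots, then query the table as of one of them.
spark.sql("SELECT snapshot_id, committed_at FROM nessie.demo.events.snapshots").show()
# Replace <snapshot_id> with a value returned by the query above:
# spark.sql("SELECT * FROM nessie.demo.events VERSION AS OF <snapshot_id>").show()
```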
- Apache Spark (data processing)
- Apache Iceberg (table format)
- Project Nessie (catalog service for versioned data)
- MinIO (S3-compatible object storage)
- Docker Compose (local orchestration)
- Zeppelin (interactive notebooks)
- Dremio (lakehouse platform)
```
.
├── docker-compose.yml    # Docker Compose setup
├── zeppelin_notebooks/   # JupyterLab / Zeppelin notebooks
├── zeppelin_conf/        # Zeppelin interpreter conf
├── spark/                # Spark bin
└── spark-jars/           # necessary Spark bundles
```

This guide sets up a local Apache Iceberg environment using Docker Compose. It includes Spark, Nessie (for catalog/versioning), MinIO (as S3-compatible storage), Dremio, Zeppelin, and Spark with Jupyter notebooks.
Make sure you have the following installed:
- Docker
- Docker Compose
- Git
- Python
```
git clone git@github.com:riju18/apache-iceberg-kickstart.git
cd apache-iceberg-kickstart
```

- Download via this link and place it into the root dir.
```
docker compose up
```

or, in detached mode:

```
docker compose up -d
```
This will start:
- Zeppelin: localhost:8090
- Nessie: localhost:19120
- MinIO: localhost:9001
- Dremio: localhost:9047
- JupyterLab: localhost:8888
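If you want to talk to the stack from your own script rather than the bundled notebooks, the SparkSession can be wired up roughly as below. This is a sketch under assumptions: the catalog name `nessie`, the `main` branch, the `admin`/`password` credentials shown later in this README, and MinIO's S3 API listening on port 9000 (9001 above is the web console); the Iceberg and Nessie jars are expected on the classpath, e.g. from `spark-jars/`.

```python
from pyspark.sql import SparkSession

# Sketch of a SparkSession wired to the Nessie catalog and MinIO storage
# started by docker compose. Endpoints, credentials, and the catalog name
# are assumptions matching the defaults in this README.
spark = (
    SparkSession.builder.appName("iceberg-kickstart")
    # Nessie as the Iceberg catalog, on the "main" branch
    .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
    .config("spark.sql.catalog.nessie.uri", "http://localhost:19120/api/v1")
    .config("spark.sql.catalog.nessie.ref", "main")
    .config("spark.sql.catalog.nessie.warehouse", "s3a://warehouse/")
    # MinIO as S3-compatible storage (S3 API on 9000; 9001 is the console)
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")
    .config("spark.hadoop.fs.s3a.access.key", "admin")
    .config("spark.hadoop.fs.s3a.secret.key", "password")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)
```

Pointing the catalog's `warehouse` property at the pre-created `warehouse` bucket (listed below) keeps table data and metadata files in one place.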
MinIO comes up with 4 pre-initialized buckets:
- datalake
- datalakehouse
- seed
- warehouse
Log in using:
- Username: admin
- Password: password
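Assuming these are the MinIO credentials, you can sanity-check storage access from Python with a quick bucket listing against the S3 API; a sketch, assuming `boto3` is installed and the API is on port 9000:

```python
import boto3

# List the pre-initialized MinIO buckets through the S3 API.
# Endpoint and credentials are assumptions taken from the defaults above.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="admin",
    aws_secret_access_key="password",
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
# Expected: ['datalake', 'datalakehouse', 'seed', 'warehouse']
```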