riju18/apache-iceberg-kickstart

apache-iceberg-kickstart

🧊 Apache Iceberg Exploration

Welcome to my personal journey exploring Apache Iceberg, an open table format for large-scale analytics datasets. This repository tracks my experiments, findings, setup steps, and integrations with other tools such as Spark, Nessie, MinIO, Zeppelin, and Dremio.


📚 What is Apache Iceberg?

Apache Iceberg is an open table format designed for huge analytic datasets. It brings SQL table-like features to data lakes: ACID transactions, schema evolution, time travel, partition evolution, and hidden partitioning.

Iceberg doc
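
As a rough illustration of four of these features, the statements below use Iceberg's Spark SQL syntax. The table name (`nessie.demo.events`), its columns, and the snapshot ID are hypothetical, and the statements assume a Spark session with the Iceberg SQL extensions enabled. They are kept in a plain Python dict so that each one can be passed to `spark.sql(...)` from a notebook.

```python
# Hypothetical table name, used only for illustration.
TABLE = "nessie.demo.events"

ICEBERG_FEATURE_SQL = {
    # Hidden partitioning: partition by a transform of ts; queries filter on ts
    # directly and never reference a separate partition column.
    "hidden_partitioning": (
        f"CREATE TABLE {TABLE} (id BIGINT, ts TIMESTAMP, payload STRING) "
        "USING iceberg PARTITIONED BY (days(ts))"
    ),
    # Schema evolution: add a column without rewriting existing data files.
    "schema_evolution": f"ALTER TABLE {TABLE} ADD COLUMNS (source STRING)",
    # Partition evolution: change the partition spec for newly written data
    # (requires the Iceberg SQL extensions).
    "partition_evolution": f"ALTER TABLE {TABLE} ADD PARTITION FIELD bucket(16, id)",
    # Time travel: read the table as of a snapshot ID (placeholder value).
    "time_travel": f"SELECT * FROM {TABLE} VERSION AS OF 1234567890",
}

if __name__ == "__main__":
    for feature, statement in ICEBERG_FEATURE_SQL.items():
        print(f"-- {feature}\n{statement};\n")
```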


πŸ› οΈ My Setup

πŸ”§ Technologies Used

  • Apache Spark (data processing)
  • Apache Iceberg (table format)
  • Project Nessie (catalog service for versioned data)
  • MinIO (S3-compatible object storage)
  • Docker Compose (local orchestration)
  • Zeppelin (interactive notebooks)
  • Dremio (lakehouse platform)

πŸ“ Repo Structure

.
β”œβ”€β”€ docker-compose.yml/               # Docker Compose setup
β”œβ”€β”€ zeppelin_notebooks/               # JupyterLab / Zeppelin notebooks
β”œβ”€β”€ zeppelin_conf/                    # zeppelin interpretor conf
β”œβ”€β”€ spark/                            # spark bin
β”œβ”€β”€ spark-jars/                       # necessary spark bundles

βš™οΈ Installation Guide

This guide sets up a local Apache Iceberg environment using Docker Compose. It includes Spark, Nessie (for catalog/versioning), MinIO (as S3-compatible storage), dremio, zeppelin, spark with jupyter notebook.


🧩 Prerequisites

Make sure you have the following installed:

  • Docker
  • Docker Compose
  • Git
  • Python
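
A quick way to confirm the prerequisites is a small Python check like the sketch below. It only looks for the `docker` and `git` executables on `PATH` and runs `docker compose version` to see whether the Compose plugin responds. The function names are illustrative and not part of this repository.

```python
import shutil
import subprocess


def missing_tools(tools):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def has_compose_plugin():
    """Return True if `docker compose version` exits successfully."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(
        ["docker", "compose", "version"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Python itself is already running, so only the other tools are checked.
    missing = missing_tools(["docker", "git"])
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    if not has_compose_plugin():
        print("`docker compose` is not available")
    if not missing and has_compose_plugin():
        print("All prerequisites found")
```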

πŸ“ Clone the Repository

git clone git@github.com:riju18/apache-iceberg-kickstart.git
cd apache-iceberg-kickstart

Download Spark

  • Download Spark via this link
  • Place it in the repository root directory

🐳 Start the Docker Environment

docker compose up

or, to run the containers in the background (detached mode):

docker compose up -d

  • This will start:

    • Zeppelin: localhost:8090
    • Nessie: localhost:19120
    • MinIO: localhost:9001
    • Dremio: localhost:9047
    • JupyterLab: localhost:8888
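
Once the containers are up, a short script can confirm that each service is listening. The sketch below uses only the Python standard library and the ports listed above. It checks that a TCP connection succeeds, which does not guarantee that the service has finished starting. The names `SERVICES` and `is_port_open` are illustrative.

```python
import socket

# Ports taken from the service list above.
SERVICES = {
    "Zeppelin": 8090,
    "Nessie": 19120,
    "MinIO": 9001,
    "Dremio": 9047,
    "JupyterLab": 8888,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if is_port_open("localhost", port) else "not reachable"
        print(f"{name:<10} localhost:{port:<6} {status}")
```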

πŸ—ƒοΈ Initialize MinIO Buckets

  • minio will come up with 4 preinitialized buckets.

    • datalake
    • datalakehouse
    • seed
    • warehouse
  • Log in using:

    • Username: admin
    • Password: password
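
To show how these pieces might fit together, here is a minimal sketch of Spark settings for an Iceberg catalog backed by Nessie and MinIO. It is not taken from this repository's configuration. The catalog name (`nessie`), the Nessie API path (`/api/v1`), and the MinIO S3 API port (`9000`) are assumptions, as is the choice of the `warehouse` bucket. The bucket name and credentials come from the list above, and the property keys are standard Iceberg catalog and `S3FileIO` options.

```python
def iceberg_spark_conf(
    catalog="nessie",                            # assumed catalog name
    nessie_uri="http://localhost:19120/api/v1",  # API path is an assumption
    s3_endpoint="http://localhost:9000",         # assumed MinIO S3 API port
    warehouse="s3://warehouse/",                 # one of the pre-created buckets
    access_key="admin",
    secret_key="password",
):
    """Build Spark settings for an Iceberg catalog on Nessie and MinIO."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": nessie_uri,
        f"{prefix}.ref": "main",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": s3_endpoint,
        # MinIO is usually addressed path-style rather than by bucket subdomain.
        f"{prefix}.s3.path-style-access": "true",
        f"{prefix}.s3.access-key-id": access_key,
        f"{prefix}.s3.secret-access-key": secret_key,
    }


if __name__ == "__main__":
    # Print the settings as --conf flags for spark-submit or spark-sql.
    for key, value in iceberg_spark_conf().items():
        print(f"--conf {key}={value}")
```

Inside the Docker network, `localhost` would likely need to be replaced with the Compose service names. The AWS SDK used by `S3FileIO` typically also expects a region, for example through the `AWS_REGION` environment variable.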
