Provides a dockerised environment and an initial database schema and data to play around with dbt (data build tool) and Apache Airflow.
A postgres database container is created by the docker compose file.
Database setup is performed on startup by applying database migrations to create the schema and populate it with faked data.
Liquibase is used as the database migration tool; it runs as a docker container against the database, starting only once the database reaches a healthy state.
dbt provides functionality to transform data for analysis or consumption. It is generally used for the Transform part of ELT pipelines.
Apache Airflow provides batch workflow functionality including scheduling and monitoring. Extract and Load pipelines can be written and run in Airflow to provide the source data to be Transformed by dbt.
mise-en-place dev tool
The repo contains a .mise.toml file for use with mise-en-place.
The mise CLI can be installed in multiple ways, summarised here, e.g. with Homebrew:
```sh
brew install mise
```

After installation, you can configure your shell to automatically activate mise when you cd into a directory containing a .mise.toml file by adding the following to your shell configuration file (e.g. ~/.zshrc or ~/.bashrc):
```sh
echo 'eval "$(mise activate zsh)"' >> ~/.zshrc
```
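For bash, the equivalent line (a direct analogue of the zsh command above) is:

```sh
echo 'eval "$(mise activate bash)"' >> ~/.bashrc
```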
With mise installed and activated, you can install the required runtimes, create and activate a Python virtual environment, and install the required Python packages by running:

```sh
mise install
```

The first time you run mise in the directory, you will be prompted to trust the .mise.toml file. This will allow mise to automatically activate the virtual environment when you cd into the directory.
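mise also ships subcommands for trusting the config file ahead of time and for inspecting what it manages; both of these are standard mise commands:

```sh
mise trust   # mark this directory's .mise.toml as trusted
mise ls      # list the tools and versions mise is managing
```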
Create a .env file in the project root directory based on the .env_example; a minimal sketch follows.
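The variable names below are inferred from the profiles.yml instructions later in this README (DB_NAME, DB_USER, DB_PASSWORD); the authoritative list of keys is the repo's .env_example, so treat this as an illustrative sketch only:

```sh
# Hypothetical values - copy the real keys from .env_example
DB_NAME=playground
DB_USER=playground_user
DB_PASSWORD=changeme   # replace with the generated password below
```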
Generate a secure password that is compatible with connection parameters (alphanumeric only):
```sh
openssl rand -base64 48 | tr -dc 'a-zA-Z0-9' | head -c 64
```

Set up the docker environment by first building the images defined in the docker-compose.yml file:
```sh
docker compose build
```

Start up the docker environment to create the database, run the initial Liquibase migrations and set up Airflow:
```sh
docker compose up -d
```
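To confirm everything came up and that the migrations ran, the standard docker compose status and log commands can be used (the liquibase service name below is an assumption; check docker-compose.yml for the actual name):

```sh
docker compose ps               # container status, including database health
docker compose logs liquibase   # assumed service name - verify in docker-compose.yml
```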
dbt commands can be run within the dbt_playground directory to debug, test and run models. First, create the profile for dbt to use to connect to the source database.
If you haven't installed dbt previously, create a .dbt folder in your home directory with a profiles.yml file inside it:
```sh
cd ~
mkdir .dbt
cd .dbt
touch profiles.yml
```

Add a profile for the dbt_playground project to the profiles.yml file:
```yaml
dbt_playground:
  outputs:
    dev:
      dbname: <DB_NAME from your .env file>
      host: localhost
      pass: <DB_PASSWORD from your .env file>
      port: 5432
      schema: dbt
      threads: 1
      type: postgres
      user: <DB_USER from your .env file>
  target: dev
```

From the learn-dbt project directory, change into the dbt_playground directory:
```sh
cd dbt_playground
```

Test the setup and database connection by running:
```sh
dbt debug
```

Install the dbt dependencies:
```sh
dbt deps
```

Load the seed data:
```sh
dbt seed
```

Run the dbt models:
```sh
dbt run
```
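Models can also be tested once they have been run; dbt test is the standard command for this, executing the schema and data tests defined in the project:

```sh
dbt test
```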
Build the documentation:

```sh
dbt docs generate
```

View the documentation. Note that dbt docs serve and the Airflow webserver both default to port 8080, so run this only when Airflow isn't running:
```sh
dbt docs serve
```
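Alternatively, the port clash can be avoided by serving the docs on a different port with the --port flag:

```sh
dbt docs serve --port 8081
```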
Some example usage

When the docker environment is running, the Airflow UI can be accessed at: http://localhost:8080
Sign in with username: airflow, password: airflow
Create a database connection for the example ons_postcode_pipeline.py to use (a CLI equivalent is sketched after this list):
- Navigate to Admin > Connections
- Add a new connection of type postgres
- Enter the connection_id as "postgres_dag_connection" to match the one set in the ons_postcode_pipeline.py file
- Use the connection parameters from the project .env file
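The same connection can also be created non-interactively with Airflow's connections add CLI. This is a sketch only: the airflow-webserver service name and the postgres hostname are assumptions about the compose setup, and the $DB_* variables assume your .env values are exported in the shell; verify all of these against docker-compose.yml:

```sh
# Service and host names are assumptions - check docker-compose.yml
docker compose exec airflow-webserver airflow connections add \
  postgres_dag_connection \
  --conn-type postgres \
  --conn-host postgres \
  --conn-port 5432 \
  --conn-login "$DB_USER" \
  --conn-password "$DB_PASSWORD" \
  --conn-schema "$DB_NAME"
```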