ateneva/dbt-data-transformations

Project Setup


This project sets up the data modelling and day-to-day operations of theLook e-commerce DWH, leveraging:

  • dbt-core

  • BigQuery

  • Cloud Composer

  • Google Cloud Provider for Terraform


Data Modelling Principles & Guidelines

The DWH transformations of theLook e-commerce data were designed around the following principles and guidelines:

Be Analyst Friendly

  • Analysts shouldn't have to do multiple joins to retrieve meaningful data

Be Subject-Oriented

  • Tables are organized around major topics of interest, such as customers, products, orders

  • Each subject is represented as One-Big-Table with nested arrays and structs

    • child objects should never be orphans
    • child objects will always be queried within the context of the parent object

Be Relevant

  • Data should reflect how the current underlying platform functions

  • Data should reflect the topics of interest to the business

Be Cost Efficient

  • Only process pieces of information that have changed

  • Avoid scanning too much data per run

Be Easy to Maintain

  • Backfilling historical data should be possible via the scheduled run without the need for extra code adjustments

  • Changes in data should be easy to trace and audit

Avoid complex dependencies

  • Process by topic instead of using monolithic schedules that run all topics together

Enforcing Code Quality

The following linters are in place:

  • SQL linting with custom configuration for .sqlfluff

  • YAML linting with custom configuration for .yamllint

  • Python linting with default configuration via pylint

  • Markdown linting with default configuration via pymarkdownlint

SQL Linting

To check whether your SQL complies with the defined standard, run the following commands:

# lint a specific file
sqlfluff lint path/to/file.sql

# lint a file directory
sqlfluff lint directory/of/sql/files

# let the linter fix your code
sqlfluff fix folder/model.sql
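
On larger changes it can be quicker to lint only the SQL files you have touched. The following is a minimal sketch, not something shipped in this repo: the function names `filter_sql_files` and `lint_changed_sql` are hypothetical, and the base branch is assumed to be `main`.

```shell
# Hypothetical helper: keep only the .sql paths from a newline-separated
# list of file names read on stdin. Prints nothing if there are none.
filter_sql_files() {
  grep -E '\.sql$' || true
}

# Hypothetical helper: lint only the SQL files that differ from a base branch.
# The default base branch "main" is an assumption; pass another name as $1.
lint_changed_sql() {
  base_branch="${1:-main}"
  # Added, copied, modified or renamed files compared to the base branch
  changed="$(git diff --name-only --diff-filter=ACMR "$base_branch" -- | filter_sql_files)"
  if [ -z "$changed" ]; then
    echo "no changed SQL files"
    return 0
  fi
  # Word splitting is intentional; paths containing spaces are not handled.
  sqlfluff lint $changed
}
```

After defining the functions in your shell, run `lint_changed_sql` from the repository root, or `lint_changed_sql my-base-branch` to compare against a different branch.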

YAML Linting

# check which files will be linted by default
yamllint --list-files .

# lint a specific file
yamllint my_file.yml

# OR lint all YAML files in the current directory
yamllint .

Pre-commit hooks have been set up in this repo to check for and fix:

  • missing lines at the end
  • trailing whitespaces
  • violations of sql standards
  • errors in yaml syntax

dbt pre-commit hooks have been set up to check that:

When working with the repo, make sure pre-commit is installed so that the hooks run on every commit:

# install the githook scripts
pre-commit install

# run against all existing files
pre-commit run --all-files

Setting up Local Testing Environments
