This project sets up the data modelling and day-to-day operations of the theLook e-commerce DWH, leveraging:
- dbt-core
- BigQuery
- Cloud Composer
- Google Cloud Provider for Terraform

The DWH transformations of theLook e-commerce data were designed around the following principles and guidelines:
- Analysts shouldn't have to do multiple joins to retrieve meaningful data
- Tables are organized around major topics of interest, such as customers, products, orders
- Each subject is represented as One-Big-Table with nested arrays and structs
  - child objects should never be orphans
  - child objects will always be queried within the context of the parent object
- Data should reflect how the current underlying platform functions
- Data should reflect the topics of interest to the business
- Only process pieces of information that have changed
- Avoid scanning too much data per run
- Backfilling historical data should be possible via the scheduled run without the need for extra code adjustments
- Changes in data should be easy to trace and audit
- Processing by topic instead of monolithic schedules of all topics together
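
The principles on incremental processing, backfilling and per-topic scheduling can be illustrated with a small shell sketch. It is not taken from this repo: the function names, the `start_date` / `end_date` variable names and the idea of selecting a topic by name are assumptions made for illustration. Only `dbt run --select` and `--vars` are standard dbt CLI options.

```shell
# Build the --vars payload for one processing window.
# NOTE: the variable names start_date / end_date are hypothetical.
build_window_vars() {
  printf '{"start_date": "%s", "end_date": "%s"}' "$1" "$2"
}

# Run a single topic for a single window.
# $1 = dbt selector for the topic (hypothetical, e.g. "customers")
# $2 = window start, $3 = window end
run_topic() {
  dbt run --select "$1" --vars "$(build_window_vars "$2" "$3")"
}

# Example call (commented out so that sourcing this file has no side effects):
# run_topic customers 2024-01-01 2024-01-02
```

Under these assumptions a backfill is the same call with an older window, so no extra code path is needed, and each topic can be scheduled independently of the others.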

The following linters are in place:
- SQL linting with custom configuration in .sqlfluff
- YAML linting with custom configuration in .yamllint
- Python linting with default configuration via pylint
- Markdown linting with default configuration via pymarkdownlint

To check whether your SQL complies with the defined standard, you can run the following commands:

    # lint a specific file
    sqlfluff lint path/to/file.sql
    # lint a directory of files
    sqlfluff lint directory/of/sql/files
    # let the linter fix your code
    sqlfluff fix folder/model.sql

- SQL linting (and fixing) is enforced via pre-commit hooks for sqlfluff
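
When only a few models have changed, linting just those files can be quicker than linting a whole directory. The following is a minimal sketch, assuming the comparison branch is called `main` and that `filter_sql_files` and `lint_changed_sql` are hypothetical helper names, not part of this repo.

```shell
# Keep only paths ending in .sql from the list read on stdin.
# "|| true" prevents a non-zero exit status when nothing matches.
filter_sql_files() {
  grep -E '\.sql$' || true
}

# Lint every SQL file that was added, copied, modified or renamed
# relative to a base branch (default "main", which is an assumption).
lint_changed_sql() {
  base="${1:-main}"
  git diff --name-only --diff-filter=ACMR "$base" \
    | filter_sql_files \
    | while IFS= read -r file; do
        sqlfluff lint "$file"
      done
}

# Example call (commented out so that sourcing this file has no side effects):
# lint_changed_sql main
```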

    # check which files will be linted by default
    yamllint --list-files .
    # lint a specific file
    yamllint my_file.yml
    # OR lint all YAML files under the current directory
    yamllint .

Pre-commit hooks have been set up in this repo to check for and fix:
- missing newlines at the end of files
- trailing whitespace
- violations of SQL standards
- errors in YAML syntax

dbt pre-commit hooks have been set up to check that:
- there are no compilation errors
- no semicolons have been left at the end of SQL queries

Hence, when working with the repo, make sure you have pre-commit installed so that the hooks run on every commit:

    # install the git hook scripts
    pre-commit install
    # run against all existing files
    pre-commit run --all-files
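
To confirm that the hooks are actually wired into your local clone, a quick check such as the following may help. It is a sketch that assumes git's default hooks location (`.git/hooks`); a custom `core.hooksPath` is not handled, and `precommit_hook_installed` is a hypothetical helper name.

```shell
# Succeed if a pre-commit hook script exists in the given repo
# (defaults to the current directory).
# Assumes the default hooks directory; core.hooksPath is not considered.
precommit_hook_installed() {
  repo_dir="${1:-.}"
  [ -f "$repo_dir/.git/hooks/pre-commit" ]
}

# Example usage (commented out so that sourcing this file has no side effects):
# precommit_hook_installed || pre-commit install
```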