This project was created as a take-home coding assessment for Oxalis. It consists of an end-to-end data pipeline with the following components:
- Import sales data from `example_sales_data.csv` into Python.
- Clean and validate the sales data, standardizing the format and filling in missing data where possible.
- Load the sales data into a PostgreSQL database.
- Create dbt models transforming and then aggregating the data to produce potentially useful insights.
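
The clean-and-validate step above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in `oxalis_challenge/data_cleaners.py`. The date formats, the validation rules, and the choice to treat a missing discount as zero are all assumptions made for the example; the column names are taken from the schema listed in this README.

```python
import csv
import io
from datetime import datetime
from typing import Optional

# Assumed date formats; the real file's formats are not documented here.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(raw: str) -> Optional[datetime]:
    """Try each assumed format and return None if none of them matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None


def clean_row(row: dict) -> Optional[dict]:
    """Standardize one raw CSV row, or return None if it fails validation."""
    date = parse_date(row.get("date") or "")
    try:
        quantity = int(row["quantity"])
        price = float(row["price"])
        # Assumption: a blank discount means no discount was applied.
        discount = float(row.get("discount") or 0.0)
    except (KeyError, TypeError, ValueError):
        return None
    if date is None or quantity <= 0 or price < 0:
        return None
    return {
        "date": date,
        "quantity": quantity,
        "price": price,
        "discount": discount,
        # Standardize free-text values to trimmed title case.
        "region": (row.get("region") or "").strip().title(),
    }


def clean_sales(text: str) -> list:
    """Read CSV text and keep only the rows that pass validation."""
    reader = csv.DictReader(io.StringIO(text))
    return [cleaned for cleaned in map(clean_row, reader) if cleaned is not None]
```

Dropping invalid rows is only one possible policy; a real pipeline might instead log them or route them to a rejects table.
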
For a full list of requirements, see `requirements.txt` and `pyproject.toml` in this directory.
- Install all necessary dependencies.
- Clone this repository.
- Run `bash run.sh` from within this top-level directory to build the container and models.
- Open `dbt/oxalis_challenge/target/static_index.html` in your browser to interactively view the data models and lineage graph.
Below is an incomplete list of features that I would add to this project if I expected to deliver it to clients for their use.
- Write additional explanatory documentation, especially installation and usage details.
- Automatically generate a dummy sales data file so that the user can clone the repository and run the entire container without already having the required `example_sales_data.csv` file.
- Automatically test all components of the Python code using (e.g.) a `pytest` framework.
- Add inline comments and complete docstrings to all Python scripts (in a manner similar to my personal toolbox, "gconanpy").
- Modify `oxalis_challenge/data_cleaners.py` to handle multiple transactions per date.
- Modify `oxalis_challenge/psql_loaders.py` to allow incremental loading.
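
As a rough idea of what the dummy-data item in this list might look like, here is a minimal sketch using only the standard library. The column names come from the schema listed in this README; the function names (`make_rows`, `write_dummy_csv`), the value lists, the numeric ranges, and the date range are all hypothetical and are not drawn from the real `example_sales_data.csv`.

```python
import csv
import random
from datetime import date, timedelta

COLUMNS = [
    "transaction_id", "date", "store_id", "region", "product_name",
    "quantity", "price", "discount", "customer_type", "payment_method",
]
# Every value below is an invented placeholder, not real sales data.
REGIONS = ["North", "South", "East", "West"]
PRODUCTS = ["Widget", "Gadget", "Gizmo"]
CUSTOMER_TYPES = ["Retail", "Wholesale"]
PAYMENT_METHODS = ["Cash", "Credit Card", "Online"]


def make_rows(n: int, seed: int = 0) -> list:
    """Build n fake transactions; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    start = date(2024, 1, 1)
    rows = []
    for transaction_id in range(1, n + 1):
        rows.append({
            "transaction_id": transaction_id,
            "date": (start + timedelta(days=rng.randrange(365))).isoformat(),
            "store_id": rng.randint(1, 10),
            "region": rng.choice(REGIONS),
            "product_name": rng.choice(PRODUCTS),
            "quantity": rng.randint(1, 20),
            "price": round(rng.uniform(1.0, 100.0), 2),
            "discount": rng.choice([0.0, 0.05, 0.1]),
            "customer_type": rng.choice(CUSTOMER_TYPES),
            "payment_method": rng.choice(PAYMENT_METHODS),
        })
    return rows


def write_dummy_csv(path: str, n: int = 100, seed: int = 0) -> None:
    """Write n fake transactions to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(make_rows(n, seed))
```

Seeding the generator keeps the file identical across runs, which would also make it usable as a fixture for the `pytest` item in the same list.
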
| column_name | data_type |
|---|---|
| quantity | bigint |
| transaction_id | bigint |
| discount | double |
| date | timestamp |
| price | double |
| store_id | bigint |
| region | text |
| product_name | text |
| customer_type | text |
| payment_method | text |
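
For illustration, the schema above could be turned into a PostgreSQL `CREATE TABLE` statement along these lines. This is a sketch, not the logic in `oxalis_challenge/psql_loaders.py`: the table name `sales` and the helper `create_table_sql` are hypothetical, and I am assuming that `double` in the table stands for PostgreSQL's `double precision` type.

```python
# Column names and types copied from the schema table in this README.
SCHEMA = {
    "quantity": "bigint",
    "transaction_id": "bigint",
    "discount": "double",
    "date": "timestamp",
    "price": "double",
    "store_id": "bigint",
    "region": "text",
    "product_name": "text",
    "customer_type": "text",
    "payment_method": "text",
}

# Assumption: "double" is shorthand for PostgreSQL's "double precision".
PG_TYPE_NAMES = {"double": "double precision"}


def create_table_sql(table: str, schema: dict) -> str:
    """Render a CREATE TABLE statement with double-quoted identifiers."""
    columns = ",\n".join(
        f'    "{name}" {PG_TYPE_NAMES.get(data_type, data_type)}'
        for name, data_type in schema.items()
    )
    return f'CREATE TABLE IF NOT EXISTS "{table}" (\n{columns}\n);'


# "sales" is a hypothetical table name chosen for this example.
print(create_table_sql("sales", SCHEMA))
```

Quoting the identifiers avoids any ambiguity with column names such as `date` that are also SQL type names.
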
.
├── Dockerfile
├── README.md
├── app.py
├── data/
│ └── example_sales_data.csv
├── dbt/
│ └── oxalis_challenge/
│ ├── README.md
│ ├── analyses
│ ├── dbt_packages
│ ├── dbt_project.yml
│ ├── macros
│ ├── models
│ │ ├── intermediate
│ │ │ ├── int_ledger__transactions.sql
│ │ │ └── properties.yml
│ │ ├── marts
│ │ │ ├── dim_ledger__product.sql
│ │ │ ├── dim_ledger__store.sql
│ │ │ ├── fct_ledger__transactions.sql
│ │ │ └── properties.yml
│ │ ├── report
│ │ │ ├── rpt_ledger__revenue.sql
│ │ │ └── rpt_ledger__sales_qty.sql
│ │ ├── sources.yml
│ │ └── staging
│ │ ├── properties.yml
│ │ └── stg_ledger__transactions.sql
│ ├── profiles.yml
│ │ ├── run_results.json
│ │ └── semantic_manifest.json
│ └── tests
├── docker-compose.yml
├── oxalis_challenge
│ ├── __init__.py
│ ├── cli.py
│ ├── config.py
│ ├── data_cleaners.py
│ └── psql_loaders.py
├── poetry.lock
├── pyproject.toml
├── requirements.txt
└── run.sh