Learn about linking data engineering and data analytics tasks into scalable and reliable pipelines.
A workflow is a set of tasks, executed in an order determined by their dependencies, that together accomplish a larger goal. These tutorials focus on workflow orchestration tools for data engineering and analytics. Such tools automate, schedule, and manage data processing pipelines, from simple ETL jobs to complex machine learning tasks.
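To make the idea of tasks with dependencies concrete, here is a minimal sketch in plain Python. It is not taken from any of the tools covered in these tutorials. The task names (`extract`, `transform`, `load`) and the `run_workflow` helper are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the scheduling that a real orchestrator performs.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks; each function stands in for a real pipeline step.
def extract():
    return [3, 1, 2]

def transform(rows):
    return sorted(rows)

def load(rows):
    return f"loaded {len(rows)} rows"

# Map each task name to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

def run_workflow():
    tasks = {"extract": extract, "transform": transform, "load": load}
    results = {}
    # static_order() yields every task after all of its dependencies.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        # Pass the outputs of upstream tasks as positional arguments.
        upstream = [results[dep] for dep in sorted(dependencies[name])]
        results[name] = tasks[name](*upstream)
    return order, results

order, results = run_workflow()
print(order)            # ['extract', 'transform', 'load']
print(results["load"])  # loaded 3 rows
```

Orchestration tools build on this same dependency graph and typically add scheduling, retries, monitoring, and distributed execution on top of it.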
We distinguish three kinds of workflows:
- Workflows related to office productivity tools (not covered here)
- Workflows for DevOps (CI/CD)
- Workflows for Data Engineering and Data Analytics
Clone this repository and explore the docs:
git clone https://github.com/UVADS/workflow-basics.git
cd workflow-basics

- Introduction
- Orchestration
- Choosing a tool
- Monitoring
- Deployment
- Persistence
- Resilience
- Reproducibility
- Portability & Sharing
- Quickstart
  - Airflow
  - Nextflow
  - Snakemake
  - Prefect
  - Dagster
  - Targets
- Examples
  - Airflow Examples
  - Nextflow Examples
  - Prefect Examples
  - Targets Examples
Contributions are welcome! Please feel free to submit a Pull Request.
Distributed under the MIT License.