FlowForge is a powerful, lightweight Python workflow orchestration engine that allows you to build, manage, and execute complex data pipelines and task workflows with ease. It provides a robust foundation for creating directed acyclic graphs (DAGs) of tasks with dependency management, retry logic, multiple execution modes, and comprehensive monitoring capabilities.
FlowForge helps you:
- Define workflows as a series of interconnected tasks with dependencies
- Execute tasks in the correct order based on their dependencies
- Handle failures with configurable retry logic and error handling
- Monitor execution with comprehensive logging and metrics
- Scale execution with multiple executor types (local, threaded, multiprocess)
- Persist state for checkpoint recovery and workflow resumption
- Manage resources with context managers and cleanup utilities
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Tasks       │    │      DAG        │    │   Scheduler     │
│                 │───►│                 │───►│                 │
│ • Extract       │    │ • Dependencies  │    │ • Execution     │
│ • Transform     │    │ • Validation    │    │ • Concurrency   │
│ • Load          │    │ • Traversal     │    │ • Retry Logic   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Executors     │    │     Events      │    │  Persistence    │
│                 │    │                 │    │                 │
│ • Local         │◄───┤ • Logging       │    │ • Checkpoints   │
│ • Threaded      │    │ • Metrics       │    │ • State Mgmt    │
│ • Process-based │    │ • Notifications │    │ • Recovery      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```
```
FlowForge/
├── core/                   # Core engine components
│   ├── task.py             # Task abstraction and execution
│   ├── dag.py              # Directed Acyclic Graph management
│   ├── scheduler.py        # Task scheduling and coordination
│   ├── executor.py         # Multiple execution backends
│   ├── event.py            # Event system for monitoring
│   ├── context.py          # Resource management
│   └── exceptions.py       # Custom exception classes
├── dsl/                    # Domain-Specific Language
│   └── workflow_dsl.py     # Declarative workflow definition
├── utils/                  # Utility modules
│   ├── logging_utils.py    # Logging and output management
│   ├── retry_utils.py      # Retry and backoff strategies
│   └── metrics_utils.py    # Performance monitoring
├── state/                  # State management
│   └── persistence.py      # Checkpoint and recovery
├── examples/               # Sample workflows
│   ├── sample_workflow.py  # Comprehensive demo
│   ├── etl_workflow.py     # Data pipeline example
│   └── ml_workflow.py      # ML pipeline example
├── tests/                  # Unit tests
│   ├── test_task.py
│   ├── test_dag.py
│   ├── test_scheduler.py
│   └── test_executor.py
├── cli.py                  # Command-line interface
├── requirements.txt        # Dependencies
├── setup.py                # Package configuration
└── README.md               # This file
```
First, create and activate a virtual environment:
```bash
# Create virtual environment
python -m venv venv

# Activate on Windows
venv\Scripts\activate

# Activate on macOS/Linux
source venv/bin/activate
```

Then install the dependencies:

```bash
pip install -r requirements.txt
```

Now run an example:

```bash
# Run the sample workflow
python cli.py run examples/sample_workflow.py

# Or run with threading
python cli.py run examples/sample_workflow.py --executor thread --workers 4
```

FlowForge provides a comprehensive command-line interface for managing workflows:
```bash
# List all tasks and dependencies
python cli.py list-tasks examples/sample_workflow.py

# Validate workflow structure
python cli.py validate examples/sample_workflow.py

# Run workflow (default: local executor)
python cli.py run examples/sample_workflow.py

# Run with specific executor and workers
python cli.py run examples/sample_workflow.py --executor thread --workers 4

# Dry run (show structure without execution)
python cli.py run examples/sample_workflow.py --dry-run

# Run with verbose logging
python cli.py run examples/sample_workflow.py --verbose

# Run with minimal output
python cli.py run examples/sample_workflow.py --quiet

# Run with checkpointing
python cli.py run examples/sample_workflow.py --checkpoint my_state.json

# Check workflow status from checkpoint
python cli.py status --checkpoint my_state.json
```

FlowForge supports three executor types:

| Executor | Description | Use Case |
|---|---|---|
| `local` | Sequential execution | Simple workflows, debugging |
| `thread` | Multi-threaded execution | I/O-bound tasks |
| `process` | Multi-process execution | CPU-intensive tasks |
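The same selection is available programmatically through the `executor_type` argument used in the Quick Start example below. A minimal sketch — note that the worker-count keyword is an assumption mirroring the CLI's `--workers` flag, so verify the exact parameter name in `core/scheduler.py`:

```python
from core.dag import DAG
from core.scheduler import Scheduler

dag = DAG()  # assume this has been populated with I/O-bound tasks

# "thread" overlaps I/O-bound tasks; "process" would suit CPU-heavy work.
# max_workers mirrors the CLI's --workers flag but is an assumed keyword name.
scheduler = Scheduler(dag, executor_type="thread", max_workers=4)
scheduler.run()
```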
Define and run a workflow with the core API:

```python
from core.task import Task
from core.dag import DAG
from core.scheduler import Scheduler

# Define task functions
def extract():
    print("Extracting data...")
    return "raw_data"

def transform():
    print("Transforming data...")
    return "clean_data"

def load():
    print("Loading data...")
    return "loaded"

# Create tasks
t1 = Task("extract", func=extract)
t2 = Task("transform", func=transform, max_retries=2)
t3 = Task("load", func=load)

# Set dependencies
t2.add_dependency(t1)  # transform depends on extract
t3.add_dependency(t2)  # load depends on transform

# Create DAG and add tasks
dag = DAG()
for task in [t1, t2, t3]:
    dag.add_task(task)

# Run workflow
scheduler = Scheduler(dag, executor_type="local")
scheduler.run()
```

Alternatively, the same pipeline can be expressed with the declarative DSL:

```python
from dsl.workflow_dsl import workflow, task

with workflow("My_ETL_Pipeline") as wf:
    @task
    def extract():
        print("Extracting data...")
        return "raw_data"

    @task(max_retries=2)
    def transform():
        print("Transforming data...")
        return "clean_data"

    @task
    def load():
        print("Loading data...")
        return "loaded"

    # Set dependencies
    wf.dag.get_task("transform").add_dependency(wf.dag.get_task("extract"))
    wf.dag.get_task("load").add_dependency(wf.dag.get_task("transform"))

# Run the workflow
wf.run()
```

Register handlers for workflow events to monitor execution:

```python
from core.event import EventManager, WorkflowEvent
from utils.logging_utils import log_task_started, log_task_succeeded
# Set up event management
events = EventManager()
events.register(WorkflowEvent.TASK_STARTED, log_task_started)
events.register(WorkflowEvent.TASK_SUCCEEDED, log_task_succeeded)
# Use with scheduler
scheduler = Scheduler(dag, events=events)
```

Collect performance metrics automatically through the event system:

```python
from utils.metrics_utils import MetricsManager
metrics = MetricsManager()
# Register with events for automatic collection
events.register(WorkflowEvent.TASK_STARTED, metrics.task_started)
events.register(WorkflowEvent.TASK_SUCCEEDED, metrics.task_succeeded)
# Print metrics after execution
metrics.print_task_metrics()
metrics.print_workflow_metrics()
```

Manage resources such as file handles with context helpers:

```python
from core.context import with_context, file_context
@with_context(lambda: file_context("output.txt", "w"))
def write_results(file_handle):
    file_handle.write("Results here")
    return "written"

task = Task("write_task", func=write_results)
```

Persist workflow state for checkpoint recovery and resumption:

```python
from state.persistence import PersistenceManager
# Set up persistence
pm = PersistenceManager("workflow_state.json")

# Resume from checkpoint if available
pm.resume(dag)

# Attach autosave to events
pm.attach_to_events(events)
```
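Putting the pieces together, a checkpoint-aware run might look like the following sketch, composed only from the calls shown above:

```python
from core.dag import DAG
from core.scheduler import Scheduler
from core.event import EventManager
from state.persistence import PersistenceManager

dag = DAG()                    # assume tasks were added elsewhere
events = EventManager()

pm = PersistenceManager("workflow_state.json")
pm.resume(dag)                 # skip work already completed in a prior run
pm.attach_to_events(events)    # autosave state as task events fire

scheduler = Scheduler(dag, events=events)
scheduler.run()
```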
name="my_task", # Unique task identifier
func=my_function, # Function to execute
max_retries=3, # Number of retry attempts
timeout=60, # Task timeout in seconds
depends_on=[] # List of dependency tasks
)| State | Description |
|---|---|
PENDING |
Task is waiting to be executed |
RUNNING |
Task is currently executing |
SUCCESS |
Task completed successfully |
FAILED |
Task failed after all retries |
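Continuing the Quick Start example, you can branch on a task's final state after a run. This assumes `Task` exposes its current state via a `state` attribute whose values match the table above — verify the attribute name and value type in `core/task.py`:

```python
# Hypothetical post-run check; `task.state` and its string form are assumptions
scheduler.run()
for name in ("extract", "transform", "load"):
    t = dag.get_task(name)        # get_task() as used in the DSL example
    if str(t.state) == "FAILED":
        print(f"{name} failed after all retries")
```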
Run the comprehensive test suite:
```bash
# Run all tests
python -m pytest tests/

# Run specific test file
python -m pytest tests/test_task.py -v

# Run with coverage
python -m pytest tests/ --cov=core --cov-report=html
```

- `test_task.py` - Task creation, execution, and state management
- `test_dag.py` - DAG construction, validation, and traversal
- `test_scheduler.py` - Workflow coordination and execution order
- `test_executor.py` - Different execution backends
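As a rough illustration of the patterns these files cover, here is a minimal pytest sketch. Whether `task.run()` returns the wrapped function's result is an assumption (suggested by the custom-executor example later in this README) to verify against `core/task.py`:

```python
from core.task import Task

def test_task_runs_and_returns_result():
    # Assumes task.run() invokes func and returns its value
    task = Task("double", func=lambda: 2 * 21)
    assert task.run() == 42

def test_dependency_is_recorded():
    upstream = Task("extract", func=lambda: "raw")
    downstream = Task("transform", func=lambda: "clean")
    downstream.add_dependency(upstream)
    # `depends_on` mirrors the constructor argument above; verify in core/task.py
    assert upstream in downstream.depends_on
```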
Try the bundled example workflows:

```bash
# examples/etl_workflow.py - Simple ETL pipeline
python cli.py run examples/etl_workflow.py

# examples/ml_workflow.py - ML training pipeline
python cli.py run examples/ml_workflow.py --executor process --workers 2

# examples/sample_workflow.py - Comprehensive demo with all features
python cli.py run examples/sample_workflow.py --verbose
```

Common issues and fixes:

- **Import Errors**: Ensure you're in the project root and the virtual environment is activated
- **File Not Found**: Use correct relative paths (e.g., `examples/sample_workflow.py`)
- **Circular Dependencies**: Use `python cli.py validate` to check for cycles
- **Task Failures**: Check logs and increase retry counts if needed
Helpful debugging commands:

```bash
# Enable verbose logging
python cli.py run examples/sample_workflow.py --verbose

# Dry run to check structure
python cli.py run examples/sample_workflow.py --dry-run

# Validate before running
python cli.py validate examples/sample_workflow.py
```

Create your own executor by extending `BaseExecutor`:
```python
from core.executor import BaseExecutor

class CustomExecutor(BaseExecutor):
    def execute(self, task):
        # Your custom execution logic
        return task.run()
```
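Because `execute()` receives a task and returns its result, a custom executor can be exercised in isolation before wiring it into a workflow. A sketch — how the Scheduler accepts a custom executor instance is not shown here, so check `core/scheduler.py` for the supported hook:

```python
from core.task import Task

# Smoke-test the executor directly, outside of any Scheduler
demo = Task("demo", func=lambda: "ok")
print(CustomExecutor().execute(demo))  # calls task.run() internally -> "ok"
```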
print(f"Custom handling for task: {task.name}")
events.register(WorkflowEvent.TASK_STARTED, my_custom_handler)from utils.retry_utils import retry_task
@retry_task(max_retries=3, backoff_factor=2.0)
def unreliable_function():
    # Function that might fail (e.g. a flaky network call)
    pass
```
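Conceptually, a decorator like this boils down to a loop with exponentially growing sleeps. The following hand-rolled equivalent is illustrative only — `utils/retry_utils.py` is the source of truth, and the one-second base delay is an assumption:

```python
import time

def retry_loop(func, max_retries=3, backoff_factor=2.0, base_delay=1.0):
    """Illustrative stand-in for @retry_task; not FlowForge's actual code."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # exhausted all retries
            # delays of 1s, 2s, 4s, ... for backoff_factor=2.0
            time.sleep(base_delay * (backoff_factor ** attempt))
```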
A few performance best practices:

- **Choose the Right Executor**:
  - Use `local` for simple debugging
  - Use `thread` for I/O-bound tasks
  - Use `process` for CPU-intensive tasks
- **Optimize Task Dependencies**:
  - Minimize unnecessary dependencies
  - Parallelize independent tasks (see the sketch after this list)
  - Use `--dry-run` to visualize the execution graph
- **Configure Retries Appropriately**:
  - Set `max_retries` based on task reliability
  - Use exponential backoff for network operations
- **Use Checkpointing for Long Workflows**:
  - Enable persistence for workflows running longer than 30 minutes
  - Resume failed workflows from the last checkpoint
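For instance, two independent fetch tasks feeding a shared aggregation step can overlap under the thread executor. This sketch uses only APIs shown in the Quick Start:

```python
from core.task import Task
from core.dag import DAG
from core.scheduler import Scheduler

fetch_a = Task("fetch_a", func=lambda: "A")   # no edge between the fetches,
fetch_b = Task("fetch_b", func=lambda: "B")   # so they can run concurrently
combine = Task("combine", func=lambda: "A+B")

# combine waits for both fetches
combine.add_dependency(fetch_a)
combine.add_dependency(fetch_b)

dag = DAG()
for t in (fetch_a, fetch_b, combine):
    dag.add_task(t)

# Thread executor lets fetch_a and fetch_b overlap (good for I/O-bound work)
Scheduler(dag, executor_type="thread").run()
```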
We welcome contributions! Here's how to get started:

- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes and add tests
- Run tests: `python -m pytest tests/`
- Submit a pull request

To set up a development environment:
```bash
# Clone your fork
git clone https://github.com/your-username/flowforge.git
cd flowforge

# Create development environment
python -m venv dev-env
source dev-env/bin/activate  # or dev-env\Scripts\activate on Windows

# Install development dependencies
pip install -r requirements.txt
pip install pytest pytest-cov black flake8

# Run tests
python -m pytest tests/ -v
```

This project is licensed under the MIT License - see the LICENSE file for details.
- **Documentation**: This README contains comprehensive usage information
- **Issues**: Report bugs or request features via GitHub Issues
- **Examples**: Check the `examples/` directory for sample workflows
- **Tests**: Review the `tests/` directory for usage patterns
- Web-based dashboard for workflow monitoring
- Additional executor types (Docker, Kubernetes)
- Workflow templates and marketplace
- Integration with external services (AWS, GCP)
- Advanced scheduling (cron-like, triggers)
- Workflow versioning and rollback capabilities
Happy Orchestrating! 🎼
For more information, examples, or support, see the `examples/` directory or run:

```bash
python cli.py --help
```