Important
We are no longer actively maintaining this repository. All active work by the Allen Institute for Cell Science is located under the AllenCell organization.
This project uses Poetry to manage dependencies and virtual environments.
- Create the virtual environment:

    poetry install

- Activate the environment:

    poetry shell

This project also includes a requirements.txt generated from the poetry.lock file.
Install dependencies directly from this file using:

    pip install -r requirements.txt

Install the package (note that you need pip ≥ 21.3):

    pip install -e .

The pipeline uses Prefect for workflows and Hydra for composable configuration. Workflows can be run via CLI or through the Prefect UI as deployments.
When running via CLI, configurations can be passed in three ways: inline, using a single config file, or using composable config files. Hydra also supports additional configuration overrides via the CLI for all three options.
Configurations can be passed directly using:

    abmpipe demo :: parameters.name=demo_parameters context.name=demo_context series.name=demo_series

Create a config file demo.yaml with the following contents:

    context:
      name: demo_context
    series:
      name: demo_series
    parameters:
      name: demo_parameters

Then use:

    abmpipe demo /path/to/demo.yaml

Create a configs directory with the following structure:

    configs
    ├── context
    │   └── demo.yaml
    ├── parameters
    │   └── demo.yaml
    └── series
        └── demo.yaml

Each demo.yaml should contain the field name: <name>.
Then use:

    abmpipe demo parameters=demo context=demo series=demo

Use the flag --dryrun to display the composed configuration without running the workflow.
Use the flag --deploy to create a Prefect deployment.
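
As a rough illustration of how dotted `key=value` arguments such as `parameters.name=demo_parameters` correspond to the nested structure of a config file, the sketch below builds a nested dictionary from a list of such strings. The helper `apply_overrides` is hypothetical and is not how Hydra or `abmpipe` implements overrides; it only shows the mapping between the inline form and the YAML form.

```python
from typing import Any


def apply_overrides(overrides: list[str]) -> dict[str, Any]:
    """Build a nested dict from dotted ``key=value`` strings (illustrative only)."""
    config: dict[str, Any] = {}
    for item in overrides:
        # Split "parameters.name=demo_parameters" into key path and value.
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Walk down the tree, creating intermediate levels as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


inline = apply_overrides(
    [
        "parameters.name=demo_parameters",
        "context.name=demo_context",
        "series.name=demo_series",
    ]
)
print(inline)
```

The resulting dictionary has the same shape as the contents of the single-file demo.yaml, which is why the inline and file-based forms can be used interchangeably for simple configurations.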
Configs can use Secret fields.
In configs, any field in the form ${secret:name-of-secret} will be resolved using the Prefect Secret loader.
These values must be configured as a Secret Block in Prefect via a script:

    from prefect.blocks.system import Secret

    Secret(value="secret-value").save(name="name-of-secret")

or in the Prefect UI under Blocks.
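
To make the `${secret:name-of-secret}` placeholder concrete, here is a minimal standalone sketch of the substitution idea. It does not use Prefect or the pipeline's actual loader: `SECRET_STORE` is a hypothetical in-memory dictionary standing in for Secret Blocks, and `resolve_secrets` is an illustrative helper, not part of this package.

```python
import re

# Hypothetical in-memory store standing in for Prefect Secret Blocks.
SECRET_STORE = {"name-of-secret": "secret-value"}

# Matches placeholders of the form ${secret:<name>}.
SECRET_PATTERN = re.compile(r"\$\{secret:([^}]+)\}")


def resolve_secrets(value: str, store: dict[str, str]) -> str:
    """Replace each ${secret:<name>} placeholder with its stored value."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in store:
            raise KeyError(f"No secret named {name!r}")
        return store[name]

    return SECRET_PATTERN.sub(lookup, value)


resolved = resolve_secrets("token=${secret:name-of-secret}", SECRET_STORE)
print(resolved)
```

In the real pipeline the lookup is performed by the Prefect Secret loader, so the secret value never needs to appear in the config file itself.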
New flows can be added to the flows module with the following structure:

    from dataclasses import dataclass

    from prefect import flow

    @dataclass
    class ParametersConfig:
        # TODO: add parameter config

    @dataclass
    class ContextConfig:
        # TODO: add context config

    @dataclass
    class SeriesConfig:
        # TODO: add series config

    @flow(name="name-of-flow")
    def run_flow(context: ContextConfig, series: SeriesConfig, parameters: ParametersConfig) -> None:
        # TODO: add flow

The command:

    abmpipe name-of-flow

will create a new flow template under the flows module with the name name_of_flow.
Notebooks can be helpful for prototyping flows.
Create dataclasses for all relevant configuration for the flow. Specify types and default values, if relevant. For flows in this repo, three types of configs are used:
- ParametersConfig specifies all parameters for the flow
- ContextConfig specifies the infrastructure context (e.g. local working path or S3 bucket names)
- SeriesConfig specifies the simulation series the flow is applied to (e.g. simulation name, conditions, seeds)
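
A minimal sketch of what such dataclasses might look like is shown below. All field names and default values here are hypothetical and chosen only to illustrate typed fields and defaults; actual flows define their own fields.

```python
from dataclasses import dataclass, field


@dataclass
class ParametersConfig:
    # Hypothetical parameters for an example flow.
    frame_interval: int = 1
    # Mutable defaults need default_factory so instances do not share a list.
    metrics: list[str] = field(default_factory=lambda: ["count"])


@dataclass
class ContextConfig:
    # Hypothetical infrastructure context, e.g. a local working path.
    working_location: str = "/path/to/working/directory"


@dataclass
class SeriesConfig:
    # Hypothetical description of the simulation series.
    name: str = "demo_series"
    conditions: list[str] = field(default_factory=list)
    seeds: list[int] = field(default_factory=lambda: [0])


series = SeriesConfig(name="demo_series", conditions=["control"], seeds=[0, 1])
print(series)
```

Giving every field a type and, where sensible, a default keeps notebook prototypes short, since only the fields that differ from the defaults need to be set.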
Configurations can be loaded in multiple ways.
- Load the entire configuration directly from an existing configuration file using the make_config_from_file function. Works best for simple configurations without interpolation.

    config = make_config_from_file(ConfigDataclass, "/path/to/config.yaml")

- Load a partial configuration directly from an existing configuration file using the make_config_from_file function. Missing fields in the config can be loaded from other configuration files using OmegaConf.load or set directly. Works best for configurations that use interpolation.

    config = make_config_from_file(ConfigDataclass, "/path/to/config.yaml")
    config.field = OmegaConf.load("/path/to/another/config.yaml").field
    config.field = "value"

- Directly instantiate the config object. Fields in object initialization can also be loaded using OmegaConf.load. Works best for custom configurations or testing configurations.

    config = ConfigDataclass(
        field="value",
        other_field=OmegaConf.load("/path/to/config.yaml").other_field,
        ...
    )

Import tasks from collections in the undecorated form:

    from collection.module.task import task

Tasks can also be imported in decorated form:

    from collection.module import task

but they will need to be called using task.fn() because a notebook does not run inside a Prefect flow environment.
When moving a prototype into the flows module, make sure the main flow method has the @flow decorator, and switch imports to their @task-decorated form to take advantage of Prefect task and flow monitoring.