Problem
Currently, PIDGIN scoring functions create a conda environment and download models (~11GB) at runtime. This fails or causes problems in containerized environments because:
- Conda environment creation fails due to read-only filesystems or restricted permissions
- Runtime model downloads are problematic for reproducibility and security
- The first use is delayed by several minutes while the environment is set up and the models are downloaded
Current Behavior
pidgin = PIDGIN(prefix="test", uniprot="P21918")
# Attempts to:
# 1. Run `conda env create -f environment.yml`
# 2. Download 11GB models from Zenodo
# 3. Launch Flask server
This fails in containers with:
Command 'conda env create -f environment.yml' returned non-zero exit status 1
Proposed Solution
Add a pre-setup utility that container builds can use:
Option 1: CLI utility
molscore-setup pidgin --download-models --create-env
Option 2: Python utility
from molscore.setup import setup_pidgin
setup_pidgin(download_models=True, create_env=True)
Option 3: Container build helper
# In Dockerfile/Singularity definition
RUN python -m molscore.setup pidgin
Implementation Details
The utility should:
- Create the conda environment using the existing environment.yml:
  conda env create -f /opt/conda/.../molscore/data/models/PIDGINv5/environment.yml
- Pre-download models to ~/.pidgin_data/:
  # Trigger model download without server launch
  # Use existing zenodo-client dependency
- Validate setup:
  # Test that PIDGIN can instantiate without runtime setup
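To make the three steps above concrete, here is a minimal sketch of what such a utility might look like. Everything in it is an assumption for illustration: `molscore.setup` does not exist today, the `setup_pidgin` signature simply extends the one proposed in Option 2, and `fetch_models` is a hypothetical callable standing in for whatever zenodo-client call MolScore already uses. The final check only confirms that files exist in the data directory, which is a weaker proxy for "PIDGIN can instantiate without runtime setup".

```python
import subprocess
from pathlib import Path

# Assumed default: the issue only states that models live in ~/.pidgin_data/.
DEFAULT_DATA_DIR = Path.home() / ".pidgin_data"


def conda_create_command(env_file):
    """Build the same conda command that PIDGIN currently runs at runtime."""
    return ["conda", "env", "create", "-f", str(env_file)]


def models_present(data_dir):
    """Treat any file under data_dir as evidence of an earlier download."""
    data_dir = Path(data_dir)
    return data_dir.is_dir() and any(p.is_file() for p in data_dir.rglob("*"))


def setup_pidgin(env_file, data_dir=DEFAULT_DATA_DIR, download_models=True,
                 create_env=True, fetch_models=None, run=subprocess.run):
    """Hypothetical build-time setup; returns the list of steps performed."""
    steps = []
    if create_env:
        # check=True raises on failure, so a broken env aborts the image build.
        run(conda_create_command(env_file), check=True)
        steps.append("create_env")
    if download_models and not models_present(data_dir):
        if fetch_models is None:
            raise ValueError("fetch_models callable is required to download models")
        Path(data_dir).mkdir(parents=True, exist_ok=True)
        # Placeholder hook, e.g. a thin wrapper around zenodo-client.
        fetch_models(Path(data_dir))
        steps.append("download_models")
    if download_models and not models_present(data_dir):
        raise RuntimeError(f"no model files found in {data_dir} after setup")
    return steps
```

Passing the command runner and the downloader in as arguments keeps the sketch testable without conda or network access, and makes repeated builds idempotent: if models are already present, the download step is skipped.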
Benefits
- ✅ Container-friendly: No runtime conda operations needed
- ✅ Reproducible: Models baked into container at build time
- ✅ Faster startup: No first-use delays
- ✅ Offline capable: No runtime network dependencies
- ✅ Security: No runtime downloads from external sources
Current Workaround
Users must manually:
- Create conda environment using the provided environment.yml
- Somehow trigger model downloads (unclear how)
- Hope the server-based architecture works in containers
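Until a setup utility exists, one way to at least confirm the first manual step is to ask conda which environments it knows about. The sketch below relies on `conda env list --json`, which reports environment paths under an `envs` key. The helper names are hypothetical, and the environment name `"pidgin"` in the usage note is a placeholder; the real name is whatever the provided environment.yml declares.

```python
import json
import subprocess
from pathlib import Path


def env_names(conda_json):
    """Extract environment directory names from `conda env list --json` output."""
    return {Path(p).name for p in json.loads(conda_json).get("envs", [])}


def conda_env_exists(name, run=subprocess.run):
    """Return True if conda reports an environment whose directory is `name`."""
    result = run(["conda", "env", "list", "--json"],
                 capture_output=True, text=True, check=True)
    return name in env_names(result.stdout)
```

For example, a container build could call `conda_env_exists("pidgin")` after the manual `conda env create` step and fail early if it returns `False`.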
Use Case
This is critical for HPC/cloud deployments using Singularity/Docker containers where PIDGIN scoring is needed for molecular optimization workflows.
Would appreciate guidance on the preferred approach!