Feature Request: Add container-friendly pre-setup utility for PIDGIN dependencies #76

@naglemi

Problem

Currently, PIDGIN scoring functions create a conda environment and download models (~11GB) at runtime. This fails or causes problems in containerized environments:

  1. Conda environment creation fails on read-only filesystems or under restricted permissions
  2. Runtime model downloads undermine reproducibility and security
  3. Environment setup and downloads delay first use by several minutes

Current Behavior

pidgin = PIDGIN(prefix="test", uniprot="P21918")
# Attempts to:
# 1. Run `conda env create -f environment.yml`
# 2. Download ~11GB of models from Zenodo
# 3. Launch Flask server

This fails in containers with:

Command 'conda env create -f environment.yml' returned non-zero exit status 1

Proposed Solution

Add a pre-setup utility that container builds can use:

Option 1: CLI utility

molscore-setup pidgin --download-models --create-env

Option 2: Python utility

from molscore.setup import setup_pidgin
setup_pidgin(download_models=True, create_env=True)

Option 3: Container build helper

# In Dockerfile/Singularity definition
RUN python -m molscore.setup pidgin
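
To make the three options concrete, here is a minimal sketch of how a single entry point could serve both the `molscore-setup pidgin ...` and `python -m molscore.setup pidgin` forms. Nothing here exists in molscore today: `build_parser` and `parse_steps` are hypothetical names, and the rule that running with no flags performs both steps is an assumption made so that the flagless Option 3 invocation does something useful.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the proposed `molscore-setup` command."""
    parser = argparse.ArgumentParser(prog="molscore-setup")
    subparsers = parser.add_subparsers(dest="target", required=True)
    pidgin = subparsers.add_parser(
        "pidgin", help="Prepare PIDGIN dependencies ahead of time"
    )
    pidgin.add_argument(
        "--download-models", action="store_true", help="Pre-download the models"
    )
    pidgin.add_argument(
        "--create-env", action="store_true", help="Create the conda environment"
    )
    return parser


def parse_steps(argv: Optional[Sequence[str]] = None) -> dict:
    """Turn command-line arguments into the set of steps to run."""
    args = build_parser().parse_args(argv)
    # Assumption: with neither flag given, run every step.
    run_all = not (args.download_models or args.create_env)
    return {
        "target": args.target,
        "download_models": args.download_models or run_all,
        "create_env": args.create_env or run_all,
    }
```

With this shape, Option 2 would simply be the function that the CLI calls with the parsed values, so the three options share one code path.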

Implementation Details

The utility should:

  1. Create conda environment using existing environment.yml:

    conda env create -f /opt/conda/.../molscore/data/models/PIDGINv5/environment.yml

  2. Pre-download models to ~/.pidgin_data/:

    # Trigger model download without server launch
    # Use existing zenodo-client dependency

  3. Validate setup:

    # Test that PIDGIN can instantiate without runtime setup
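
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual molscore implementation: the `setup_pidgin` signature is extended with hypothetical `env_file`, `data_dir`, `fetch_models` and `run` parameters, and the download itself is left to a caller-supplied `fetch_models` callable because I don't know how PIDGIN currently invokes zenodo-client. Only `conda env create -f` is taken from the existing behaviour.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def setup_pidgin(
    env_file: Path,
    data_dir: Path,
    download_models: bool = True,
    create_env: bool = True,
    fetch_models: Optional[Callable[[Path], None]] = None,
    run: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Run the requested build-time steps; return the names of those completed."""
    done: List[str] = []

    if create_env:
        # check=True raises CalledProcessError on a non-zero exit status, so a
        # broken environment fails the container build rather than a later run.
        run(["conda", "env", "create", "-f", str(env_file)], check=True)
        done.append("create_env")

    if download_models:
        if fetch_models is None:
            raise ValueError("download_models=True requires a fetch_models callable")
        data_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical hook: the real downloader (e.g. via zenodo-client) goes here.
        fetch_models(data_dir)
        # Minimal validation: the download must have left something behind.
        if not any(data_dir.iterdir()):
            raise RuntimeError(f"No model files found in {data_dir} after download")
        done.append("download_models")

    return done
```

Injecting `run` and `fetch_models` keeps the function testable without conda or network access, which also matters for the "validate setup" step in CI.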

Benefits

  • Container-friendly: No runtime conda operations needed
  • Reproducible: Models baked into container at build time
  • Faster startup: No first-use delays
  • Offline-capable: No runtime network dependencies
  • Security: No runtime downloads from external sources

Current Workaround

Users must manually:

  1. Create conda environment using the provided environment.yml
  2. Somehow trigger model downloads (unclear how)
  3. Hope the server-based architecture works in containers
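
For anyone attempting the manual workaround in the meantime, a check along these lines might confirm steps 1 and 2 before PIDGIN is first used. It is only a sketch: `check_pidgin_setup` and `conda_env_names` are hypothetical helpers, the environment name has to be supplied by the caller (taken from the `name:` field of environment.yml), and it assumes `conda env list --json` reports environment paths under an `"envs"` key.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, List


def conda_env_names(run: Callable[..., object] = subprocess.run) -> List[str]:
    """List conda environment names (the last component of each env path)."""
    result = run(
        ["conda", "env", "list", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [Path(p).name for p in json.loads(result.stdout)["envs"]]


def check_pidgin_setup(
    env_name: str, data_dir: Path, env_names: List[str]
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: List[str] = []
    if env_name not in env_names:
        problems.append(f"conda environment '{env_name}' not found")
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        problems.append(f"no model files in {data_dir}")
    return problems
```

A container build could call `check_pidgin_setup(name, Path.home() / ".pidgin_data", conda_env_names())` as its last step and fail if the returned list is non-empty.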

Use Case

This is critical for HPC/cloud deployments using Singularity/Docker containers where PIDGIN scoring is needed for molecular optimization workflows.

Would appreciate guidance on the preferred approach!
