CAAT is an end-to-end pipeline designed to bridge the gap between protein structure prediction and functional biological insight. It automates the generation of AlphaFold2 structures and performs deep-dive analysis on the raw attention heads to identify residues of high structural importance to the model.
The quickest way to get up and running with CAAT is to use the publicly available notebook with the GPU runtime. You can find that here.
View coverage report here.
Follow these steps to set up and run your local version of ColabFold using Poetry.
You need Python 3.11 and Poetry installed on your system.
- Python: Install Python 3.11.
- Poetry: Install it by following the official Poetry installation guide.
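Before installing, it can help to confirm that both prerequisites are visible from your shell. The following is a minimal sketch and not part of CAAT: the function name `check_prerequisites` is hypothetical, and it only checks the interpreter version and whether a `poetry` executable is on `PATH`.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # The setup instructions ask for Python 3.11 specifically.
    if tuple(version_info[:2]) != (3, 11):
        problems.append(
            f"Python 3.11 required, found {version_info[0]}.{version_info[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("poetry") is None:
        problems.append("Poetry not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

The two parameters exist only so the check can be exercised without changing the real environment; calling `check_prerequisites()` with no arguments inspects the running interpreter.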
This section guides you through installing the environment and executing the custom script for collecting attention heads.
To set up the project, you must first clone the repository and then install all dependencies using Poetry.
- Clone the Repository: Get the project source code from the remote repository.
git clone https://github.com/prameshsharma25/CAAT.git
cd CAAT
- Install Dependencies: Run the following command using Poetry.
poetry install
Install AlphaFold Dependencies:
poetry install -E alphafold
For local GPU usage or running on an HPC cluster, install additional jax[cuda] libraries:
poetry run pip install --no-warn-conflicts 'jax[cuda12]==0.4.28' jaxlib==0.4.28
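After installing the CUDA build, you may want to check which backend JAX actually selects. This is a small sketch, assuming it is run inside the Poetry environment (for example with `poetry run python`); the helper name `jax_backend_summary` is hypothetical. It relies only on `jax.default_backend()` and `jax.devices()`.

```python
import importlib.util


def jax_backend_summary():
    """Describe the backend JAX would use, or note that JAX is missing."""
    if importlib.util.find_spec("jax") is None:
        return "jax is not installed in this environment"
    import jax  # imported lazily so the check above can fail gracefully

    # default_backend() is a string such as "cpu" or "gpu";
    # devices() lists the devices visible to that backend.
    devices = ", ".join(str(d) for d in jax.devices())
    return f"backend={jax.default_backend()} devices=[{devices}]"


print(jax_backend_summary())
```

If the summary reports a `cpu` backend on a machine with a GPU, the CUDA libraries were most likely not picked up.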
CAAT offers three entry points depending on whether you need a full structural run or just specific analysis components.
Option A: Full End-to-End Run
poetry run python scripts/run_e2e_pipeline.py [OPTIONS]
Option B: Generate Attention Heads Only
If you only require attention heads for your own analysis, run the following script:
poetry run python scripts/run_attention_heads.py [OPTIONS]
Additionally, attention heads can be retrieved from CAAT in the same way as ColabFold with an additional custom flag for outputting attention heads locally:
poetry run colabfold_batch [OPTIONS] --output-attention-dir 'PATH/TO/HEADS'
Option C: Run Analysis Only
If you already have attention head .npy files and simply need to generate the plots and difference maps:
poetry run python scripts/run_analysis_pipeline.py [OPTIONS]
This directory contains the standard structural outputs from the AlphaFold engine.
This folder contains the raw numerical matrices extracted during the model's forward pass.
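To inspect these matrices yourself, something like the following sketch may be a useful starting point. It assumes the files are ordinary NumPy `.npy` arrays; the helper name `summarize_npy_dir`, the file name `example_heads.npy`, and the array shape used in the demo are all hypothetical and do not reflect CAAT's actual naming or layout.

```python
import tempfile
from pathlib import Path

import numpy as np


def summarize_npy_dir(directory):
    """Map each .npy file name in `directory` to the shape of its array."""
    shapes = {}
    for path in sorted(Path(directory).glob("*.npy")):
        array = np.load(path)
        shapes[path.name] = array.shape
    return shapes


# Demo on a synthetic file so the sketch runs without any CAAT output.
with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "example_heads.npy", np.zeros((2, 4, 5, 5)))
    demo_shapes = summarize_npy_dir(tmp)

print(demo_shapes)  # {'example_heads.npy': (2, 4, 5, 5)}
```

Pointing `summarize_npy_dir` at your own attention directory shows the real shapes, which is worth doing before writing any downstream analysis.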
This is the primary directory for human-readable insights.
- The mean attention score for each amino acid residue across all layers and heads.
- The residue-by-residue delta between the Query and the Target.
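The two quantities above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not CAAT's implementation: it assumes an attention array of shape `(layers, heads, L, L)`, takes a residue's score to be the attention it receives (averaging over the query axis), assumes the Query and Target have the same length, and defines the delta as Query minus Target. The function names are hypothetical.

```python
import numpy as np


def mean_attention_per_residue(attention):
    """Average attention received by each residue over layers, heads, and queries.

    Assumes shape (layers, heads, L, L); this layout is an assumption made
    for illustration, not CAAT's documented format.
    """
    attention = np.asarray(attention, dtype=float)
    if attention.ndim != 4 or attention.shape[-1] != attention.shape[-2]:
        raise ValueError("expected shape (layers, heads, L, L)")
    # Collapse layers (0), heads (1), and the query axis (2); keep the key axis.
    return attention.mean(axis=(0, 1, 2))


def residue_delta(query_scores, target_scores):
    """Residue-by-residue difference, Query minus Target (equal lengths assumed)."""
    query_scores = np.asarray(query_scores, dtype=float)
    target_scores = np.asarray(target_scores, dtype=float)
    if query_scores.shape != target_scores.shape:
        raise ValueError("query and target must have the same length")
    return query_scores - target_scores


# Tiny synthetic example: 1 layer, 1 head, 2 residues.
query = np.array([[[[0.9, 0.1], [0.7, 0.3]]]])
target = np.array([[[[0.5, 0.5], [0.5, 0.5]]]])
delta = residue_delta(
    mean_attention_per_residue(query), mean_attention_per_residue(target)
)
print(delta)
```

In this toy case the first residue receives more attention in the Query than in the Target, so its delta is positive, and the second residue's delta is negative by the same amount.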
- Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S and Steinegger M. ColabFold: Making protein folding accessible to all.
Nature Methods (2022) doi: 10.1038/s41592-022-01488-1
- If you’re using AlphaFold, please also cite:
Jumper et al. "Highly accurate protein structure prediction with AlphaFold."
Nature (2021) doi: 10.1038/s41586-021-03819-2
- If you’re using AlphaFold-Multimer, please also cite:
Evans et al. "Protein complex prediction with AlphaFold-Multimer."
bioRxiv (2021) doi: 10.1101/2021.10.04.463034v1
- If you’re using RoseTTAFold, please also cite:
Baek et al. "Accurate prediction of protein structures and interactions using a three-track neural network."
Science (2021) doi: 10.1126/science.abj8754