This repo contains the items needed to reproduce the paper "Data-driven fine-grained region discovery in the mouse brain with transformers". Note that this README is intended as a rough summary; for model information in particular, please see the documentation website.
CellTransformer was trained on a machine with 128 GB RAM and 2 NVIDIA A6000 GPUs, running Ubuntu 22.04. Training memory usage did not exceed 64 GB RAM and did not require the full 48 GB of GPU memory on the A6000 cards; however, we strongly recommend a system with at least that much RAM and a 24 GB GPU (e.g., RTX 3090, RTX 4090, or A5000 and above).
On our system, performant results can be generated in as few as 10 epochs (~2.5 hours per epoch on 1 GPU).

This repository includes:
- the package `celltransformer`, which is pip-installable (we also provide a Dockerfile)
  - installation takes a few minutes on a typical server, assuming CUDA is already installed (see the NVIDIA documentation); a GPU is required for training and very likely for inference
- `mkdocs` automatic documentation, hosted at github.com/abbasilab/celltransformer
  - to build the docs, pip install `docs/requirements.txt`
  - the items in the documentation include:
    - a walkthrough of how to use the package to train a transformer model using our objective on the AIBS MERFISH data; configuration is limited to setting paths to the data using `hydra` config files (a brief tutorial and a minimal example are provided; see the config-loading sketch after this list)
    - descriptions of the model's data requirements (i.e., what attention matrix formats are expected and how the data is passed to the model itself)
    - non-technical descriptions (as well as some code-level context) of the dataloader and the format it outputs, covering both single-sample `__getitem__`-level logic and the logic that collates single neighborhoods across batches at the PyTorch dataloader level (see the collation sketch after this list)
    - an API reference automatically assembled from the code itself; critical items (models, loaders) are extensively documented and type-annotated
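As a minimal sketch of the config workflow (the `config_path`, config name, and `data.*` key below are illustrative placeholders, not the package's actual schema; see the bundled tutorial for that), configs can be loaded and overridden via the `hydra` compose API:

```python
from hydra import compose, initialize

# NOTE: "configs", "train", and the data.root key are hypothetical
# placeholders; consult the tutorial in the docs for the real schema.
with initialize(config_path="configs", version_base=None):
    cfg = compose(
        config_name="train",
        overrides=["data.root=/path/to/abc_atlas_merfish"],
    )
print(cfg.data.root)
```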
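To make the collation step concrete, here is a hedged sketch of padding variable-size neighborhoods into a dense batch with a padding mask; the field name `"expression"` and the masking convention are assumptions for illustration, not the package's actual collate function:

```python
import torch

def collate_neighborhoods(samples):
    """Pad variable-size neighborhoods into one dense batch.

    Illustrative only: assumes each __getitem__ sample is a dict with an
    (n_cells, n_genes) float tensor under "expression"; the real
    dataloader's field names and masking conventions may differ.
    """
    max_cells = max(s["expression"].shape[0] for s in samples)
    n_genes = samples[0]["expression"].shape[1]
    batch = torch.zeros(len(samples), max_cells, n_genes)
    mask = torch.zeros(len(samples), max_cells, dtype=torch.bool)
    for i, s in enumerate(samples):
        n = s["expression"].shape[0]
        batch[i, :n] = s["expression"]
        mask[i, :n] = True  # True = real cell, False = padding
    return {"expression": batch, "mask": mask}
```

A function like this would be passed as the `collate_fn` argument to `torch.utils.data.DataLoader`.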
- scripts and ipynb files to perform basic analyses
  - scripts are a minimal `argparse` interface over code to extract embeddings from a trained model and dataset and to cluster them using `cuml`, as in the paper (see the clustering sketch after this list)
  - ipynb files principally describe the workflows after the model is trained:
    - clustering the embeddings and visualizing the results (because the number of clusters can be large, visualization is sometimes slightly nontrivial)
    - smoothing the embeddings on the spatial graph (see the paper and the smoothing sketch after this list)
    - counting the number of each single-cell type (from the reference atlas) within a spatial cluster (the code can also be used for CCF regions or any other class labeling)
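A hedged sketch of that smoothing step (plain k-nearest-neighbor averaging over spatial coordinates; the value of `k` and the unweighted mean here are assumptions, see the paper for the actual procedure):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smooth_embeddings(embeddings, xy, k=10):
    """Average each cell's embedding over its k spatial nearest neighbors.

    embeddings: (n_cells, dim) array; xy: (n_cells, 2) section coordinates.
    Illustrative only; the paper's smoothing may weight neighbors differently.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(xy)
    _, idx = nn.kneighbors(xy)           # (n_cells, k) neighbor indices
    return embeddings[idx].mean(axis=1)  # unweighted mean per neighborhood
```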
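For orientation, `cuml` mirrors the scikit-learn clustering API, so clustering embeddings reduces to a few lines (a sketch: the file path, the algorithm choice, and `n_clusters` below are placeholders, see the scripts for the settings used in the paper):

```python
import numpy as np
from cuml.cluster import KMeans  # GPU-accelerated, scikit-learn-style API

# path and cluster count are illustrative placeholders
embeddings = np.load("embeddings.npy")   # (n_cells, dim)
labels = KMeans(n_clusters=100, random_state=0).fit_predict(embeddings)
```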
Finally, see the notebook in notebooks/demo_celltransformer_onesection.ipynb for a minimal example of using our trained model to get embeddings and clusters for a single section (section 52); the notebook includes "one-cell" commands to download all data into your Colab environment.
Clone this repo and pip install, or run:

```bash
pip install git+https://github.com/abbasilab/celltransformer
```
As a last option, you can build and run the Dockerfile, which includes all necessary software (e.g., CUDA).
- all data used in this repository is publicly available from the Allen Brain Cell dataset; see https://alleninstitute.github.io/abc_atlas_access/intro.html for more information
- model weights for the model trained on Allen 1 (the Allen Institute for Brain Science component of the Allen Brain Cell dataset) can be found at https://huggingface.co/datasets/alxlee/celltransformer_materials under `model_weights.pth`; see the Colab notebook for parameter settings, but for reference, the settings used to instantiate this model are:
```python
model = celltransformer.model.CellTransformer(
    encoder_depth=4,
    encoder_embedding_dim=384,
    decoder_embedding_dim=384,
    decoder_depth=4,
    encoder_num_heads=8,
    decoder_num_heads=8,
    n_genes=500,
    cell_cardinality=384,
    bias=True,
)
```
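A minimal sketch of loading the released checkpoint into the model instantiated above (this assumes `model_weights.pth` is a plain `state_dict`; see the Colab notebook for the authoritative loading code):

```python
import torch

# assumes the checkpoint file is a raw state_dict; adapt if the
# file wraps the weights in a larger dict (e.g., {"model": ...})
state = torch.load("model_weights.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()  # inference mode for embedding extraction
```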
If this is useful to you, please consider citing:
@ARTICLE{Lee2024-bh,
title = "Data-driven fine-grained region discovery in the mouse brain with
transformers",
author = "Lee, Alex J and Dubuc, Alma and Kunst, Michael and Yao, Shenqin
and Lusk, Nicholas and Ng, Lydia and Zeng, Hongkui and Tasic,
Bosiljka and Abbasi-Asl, Reza",
journal = "bioRxiv",
pages = "2024.05.05.592608",
month = feb,
year = 2025,
language = "en"
}