Spatial Augmentations

Graph augmentation strategies for self-supervised pretraining of graph neural networks in spatial omics.

Description

Spatial Augmentations is a research framework for exploring and benchmarking graph-based data augmentation strategies to improve Graph Neural Network (GNN) pretraining on spatial omics datasets (e.g., spatial transcriptomics, spatial proteomics). This project is built on PyTorch Lightning and Hydra for modularity, reproducibility, and ease of experimentation.

Pretraining uses Bootstrapped Graph Latents (BGRL; Thakoor et al., 2021), a self-supervised method that trains GNNs on augmented views of a graph. Augmentation strategies are evaluated on two representative downstream tasks: domain identification and phenotype prediction.
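
To make the ingredients of BGRL-style pretraining concrete, here is a minimal, dependency-free sketch. It is not taken from this repository's source code. It assumes the two standard augmentations described in the BGRL paper (random edge dropping and feature-column masking), the cosine-similarity objective, and the exponential-moving-average update of the target network. All function names (`drop_edges`, `mask_features`, `bgrl_loss`, `ema_update`) are hypothetical, graphs are plain Python lists, and the augmentations studied in this project may differ.

```python
import math
import random


def drop_edges(edges, p, rng):
    """Remove each edge independently with probability p."""
    return [e for e in edges if rng.random() >= p]


def mask_features(features, p, rng):
    """Zero out whole feature columns, each with probability p.

    The same column mask is applied to every node.
    """
    n_dims = len(features[0])
    keep = [rng.random() >= p for _ in range(n_dims)]
    return [[x if k else 0.0 for x, k in zip(row, keep)] for row in features]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def bgrl_loss(online_pred, target_emb):
    """One-directional BGRL-style loss: -(2/N) * sum of cosine similarities.

    online_pred: per-node predictions from the online network (view 1).
    target_emb:  per-node embeddings from the target network (view 2).
    The full objective also adds the same term with the views swapped.
    """
    n = len(online_pred)
    total = sum(cosine_similarity(q, h) for q, h in zip(online_pred, target_emb))
    return -2.0 / n * total


def ema_update(target_params, online_params, tau):
    """Move target parameters towards online ones: tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


# Toy graph with 3 nodes and 2 features per node (illustrative values only).
rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 0)]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Two independently augmented views of the same graph.
view_1 = (drop_edges(edges, 0.3, rng), mask_features(features, 0.3, rng))
view_2 = (drop_edges(edges, 0.3, rng), mask_features(features, 0.3, rng))
```

In an actual training loop, each view would be encoded by a GNN, the online network would be updated by gradient descent on the symmetrized loss, and the target network would be updated only through `ema_update`.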

This repository is associated with the ETH Zürich semester project:
“Exploring Augmentation-Driven Inductive Biases for Pretraining Graph Neural Networks in Spatial Omics”
Student: Michel Tarnow
Supervisors: Lovro Rabuzin, Prof. Dr. Valentina Boeva

Project Structure

├── .github                   <- GitHub Actions workflows
├── configs                   <- Hydra configs
├── data                      <- Project data
├── logs                      <- Logs generated by Hydra and Lightning loggers
├── notebooks                 <- Jupyter notebooks
├── scripts                   <- Shell scripts
├── src                       <- Source code
├── tests                     <- Tests of any kind
│
├── .gitignore                <- List of files ignored by git
├── .pre-commit-config.yaml   <- Configuration of pre-commit hooks for code formatting
├── .project-root             <- File for inferring the position of project root directory
├── environment.yaml          <- File for installing conda environment
├── LICENSE                   <- License file
├── Makefile                  <- Makefile with commands like `make train` or `make test`
├── pyproject.toml            <- Configuration options for testing and linting
├── requirements.txt          <- File for installing Python dependencies
└── README.md

Data Access

Datasets are not included in this repository due to their size. Raw datasets can be downloaded from the resources provided below.

Dataset	Task	Description
Domain123	Domain Identification	Datasets 1, 2, and 3 from Schaub et al. (2025): mouse brain spatial transcriptomics datasets (MERFISH, STARmap, BaristaSeq)
Domain4	Domain Identification	Dataset 4 from Schaub et al. (2025): mouse brain spatial transcriptomics dataset (Xenium)
Domain7	Domain Identification	Dataset 7 from Schaub et al. (2025): mouse brain spatial transcriptomics dataset (MERFISH)
NSCLC	Phenotype Prediction	Non-small cell lung cancer (NSCLC) spatial proteomics dataset (IMC) from Cords et al. (2024)

Installation

Conda (Recommended)

# clone project
git clone https://github.com/BoevaLab/spatial-augmentations.git
cd spatial-augmentations

# create conda environment and install dependencies
conda env create -f environment.yaml -n myenv

# activate conda environment
conda activate myenv

Pip

# clone project
git clone https://github.com/BoevaLab/spatial-augmentations.git
cd spatial-augmentations

# [OPTIONAL] create conda environment
conda create -n myenv python=3.9
conda activate myenv

# install pytorch according to instructions
# https://pytorch.org/get-started/

# install requirements
pip install -r requirements.txt

How to run

Train a model with the default configuration (here, the domain identification model):

# train on CPU
python src/train_domain.py trainer=cpu

# train on GPU
python src/train_domain.py trainer=gpu

Train a model with an experiment configuration chosen from configs/experiment/:

python src/train_domain.py experiment=experiment_name.yaml

You can override any parameter from the command line like this:

python src/train_domain.py trainer.max_epochs=20 data.batch_size=64
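
As a rough illustration of what such dotted overrides mean, the sketch below applies `key.subkey=value` strings to a nested dictionary. It is a simplified stand-in using only the standard library, not Hydra's actual implementation: Hydra's override grammar is considerably richer, and the helper names (`parse_value`, `apply_overrides`) and the default values shown are hypothetical.

```python
def parse_value(raw):
    """Interpret a raw override value as int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' overrides in place to a nested dict and return it."""
    for item in overrides:
        dotted_key, sep, raw_value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            # Create intermediate groups on demand.
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw_value)
    return config


# Hypothetical defaults; the real values live in the configs/ directory.
defaults = {"trainer": {"max_epochs": 100}, "data": {"batch_size": 32}}
config = apply_overrides(defaults, ["trainer.max_epochs=20", "data.batch_size=64"])
```

Each dotted key names a path through the config groups, so `trainer.max_epochs=20` changes only that one entry and leaves the rest of the `trainer` group untouched.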

License

Distributed under the MIT License. See LICENSE for more information.
