This is the official repository of the paper "GraphPFN: A Prior-Data Fitted Graph Foundation Model" (arXiv). It provides code for reproducing our experiments with GraphPFN, both pretraining and evaluation.
Please note that our code uses third-party components (specifically, TabICL and LimiX) with some modifications; see the NOTICE file and the LICENSES/ directory for details. Also, LimiX serves as the backbone for GraphPFN, and the LimiX weights have their own license; please check the LimiX repository for details.
Prerequisites
- Install uv
- Install dependencies
uv sync
- For experiments on GraphLand, download the datasets and place them in the data/ directory
Running the evaluation
You can execute a minimal evaluation run with the following command:
uv run bin/go.py exp/graphpfn-eval/finetune/raw/tolokers-2/tuning.toml --force
Running the pretraining
First, you will need to generate graphs and store them in data/graphpfn-graphs; see bin/prior/README.md for details.
Then, to run GraphPFN pretraining, use the following command:
DGLBACKEND=pytorch uv run -m torch.distributed.run --nproc-per-node 8 bin/graphpfn_pretrain.py exp/graphpfn-pretrain/pretrain.toml
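The torch.distributed.run launcher above starts one worker process per GPU (eight per node here) and passes each worker its identity through standard environment variables. As a rough sketch only (the internals of bin/graphpfn_pretrain.py are assumed, not shown), each worker can read its rank like this:

```python
# Sketch: how a worker launched via torch.distributed.run discovers its rank.
# RANK, LOCAL_RANK, and WORLD_SIZE are the standard torchrun environment
# variables; how graphpfn_pretrain.py actually consumes them is an assumption.
import os


def get_dist_info() -> dict:
    """Read the per-worker env vars set by torch.distributed.run,
    falling back to single-process defaults when launched directly."""
    return {
        "rank": int(os.environ.get("RANK", 0)),
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),
    }


if __name__ == "__main__":
    print(get_dist_info())
```

When run without the launcher, the defaults make the script behave as a single-process job, which is convenient for debugging.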
Repository structure
- bin/ - Training and evaluation scripts
- exp/ - Experiment configurations and results
- data/ - Dataset directory (created after download)
- lib/ - Common utilities and tools
Experiments are configured using TOML files located in the exp/ directory. Each configuration specifies:
- Dataset path and preprocessing
- Model hyperparameters
- Training settings
- Evaluation metrics
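For illustration only, a configuration covering these sections might look like the sketch below. The field names here are hypothetical, not the actual schema; consult the TOML files under exp/ for real examples.

```toml
# Hypothetical configuration sketch; see exp/ for the real schema.
[data]
path = "data/tolokers-2"      # dataset location (assumed field name)
normalization = "standard"    # preprocessing choice (assumed)

[model]
n_layers = 12                 # model hyperparameters (assumed)

[training]
lr = 1e-4                     # training settings (assumed)

[evaluation]
metric = "ap"                 # evaluation metric (assumed)
```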
Evaluation results are saved in the same directory as the configuration file:
- report.json - Evaluation metrics
- Model checkpoints
- Training logs
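Because each run writes its report.json next to its configuration file, results from many runs can be aggregated with a short script. A minimal sketch, assuming only that each report.json contains valid JSON (its exact schema is not specified here):

```python
# Sketch: gather evaluation reports written next to config files under exp/.
# Assumes only that each report.json is valid JSON; its schema is not
# documented in this README.
import json
from pathlib import Path


def collect_reports(root: str) -> dict:
    """Map each run directory to the parsed contents of its report.json."""
    return {
        str(path.parent): json.loads(path.read_text())
        for path in Path(root).rglob("report.json")
    }


if __name__ == "__main__":
    for run_dir, report in collect_reports("exp").items():
        print(run_dir, sorted(report))
```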