DeFoG: Discrete Flow Matching for Graph Generation

A PyTorch implementation of the DeFoG model for training and sampling discrete graph flows. (Please update to the latest commit; recent fixes have been applied.)

Paper: https://arxiv.org/pdf/2410.04263

Poster: https://icml.cc/virtual/2025/poster/45644

Oral Presentation: https://icml.cc/virtual/2025/oral/47238

(Figure: DeFoG visualization.)


📝 Updates

Working with directed graphs? Consider using DIRECTO, a discrete flow matching framework for directed graph generation.

For an updated development environment with modernized dependencies, see the updated_env branch. The main branch remains the reference implementation, based on Python 3.9 and older package versions.

🚀 Installation

We provide two installation methods: Docker and Conda.

🐳 Docker

We provide a Dockerfile to run DeFoG in a container.

  1. Build the Docker image:
    docker build --platform=linux/amd64 -t defog-image .

⚠️ Once you clone DeFoG's git repository to your workspace, you may need to run pip install -e . to make the repository's modules importable.

🐍 Conda

  1. Install Conda (we used version 25.1.1) and create DeFoG's environment:
    conda env create -f environment.yaml
    conda activate defog
  2. Run the following commands to check if the installation of the main packages was successful:
    python -c "import sys; print('Python version:', sys.version)"
    python -c "import rdkit; print('RDKit version:', rdkit.__version__)"
    python -c "import graph_tool as gt; print('Graph-Tool version:', gt.__version__)"
    python -c "import torch; print(f'PyTorch version: {torch.__version__}, CUDA version (via PyTorch): {torch.version.cuda}')"
    python -c "import torch_geometric as tg; print('PyTorch Geometric version:', tg.__version__)"
    If you see no errors, the installation was successful and you can proceed to the next step.
  3. Compile the ORCA evaluator:
    cd src/analysis/orca
    g++ -O2 -std=c++11 -o orca orca.cpp

⚠️ Tested on Ubuntu.


⚙️ Usage

All commands use python main.py with Hydra overrides. Note that main.py is inside the src directory.

Quick start

Use the debug experiment to quickly test the code:

python main.py +experiment=debug

Full training

python main.py +experiment=<dataset> dataset=<dataset>
  • QM9 (no H): +experiment=qm9_no_h dataset=qm9
  • Planar: +experiment=planar dataset=planar
  • SBM: +experiment=sbm dataset=sbm
  • Tree: +experiment=tree dataset=tree
  • Comm20: +experiment=comm20 dataset=comm20
  • Guacamol: +experiment=guacamol dataset=guacamol
  • MOSES: +experiment=moses dataset=moses
  • QM9 (with H): +experiment=qm9_with_h dataset=qm9
  • TLS (conditional): +experiment=tls dataset=tls
  • ZINC: +experiment=zinc dataset=zinc

📊 Evaluation

Sampling from DeFoG is typically done in two steps:

  1. Sampling Optimization → find best sampling configuration
  2. Final Sampling → sample and measure performance under the best configuration

To perform 5 runs (mean ± std), set general.num_sample_fold=5.

For the rest of this section, we use the Planar dataset as an example:

Default sampling

python main.py +experiment=planar dataset=planar general.test_only=<path/to/checkpoint> sample.eta=0 sample.omega=0 sample.time_distortion=identity

Note that if you run:

python main.py +experiment=planar dataset=planar general.test_only=<path/to/checkpoint> 

it will use the sampling parameters (η, ω, time distortion) that we obtained via sampling optimization (see next section) and that are reported in the paper.
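
For intuition, η controls the sampling stochasticity, ω controls target guidance, and the time distortion remaps the uniform integration steps t ∈ [0, 1] used during sampling. Below is a minimal sketch of the remapping idea, assuming a hypothetical polynomial distortion (this is illustrative, not DeFoG's exact distortion functions):

    import numpy as np

    def distort_time(t, kind="identity"):
        # Remap a uniform step t in [0, 1] onto a distorted schedule.
        # "identity" keeps the steps uniform; "poly" is a hypothetical
        # polynomial distortion: t ** 2 concentrates steps near t = 0.
        if kind == "identity":
            return t
        if kind == "poly":
            return t ** 2
        raise ValueError(f"unknown distortion: {kind}")

    steps = np.linspace(0.0, 1.0, 11)
    print(distort_time(steps, kind="poly"))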

Sampling optimization

To search for the optimal inference hyperparameters (η, ω, time distortion), use the sample.search flag, which saves a CSV file with the results.

  • Non-grid search (independent search for each component):
    python main.py +experiment=planar dataset=planar general.test_only=<path/to/checkpoint> sample.search=all
  • Component-wise: set sample.search to target_guidance, distortion, or stochasticity in the command above.

⚠️ The default search intervals for each sampling parameter are the ones we used in our experiments. You may want to adjust them according to your needs.
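
The schema of the saved CSV is not documented here, so the snippet below is only a hedged sketch for picking the best configuration from the search results; the column names (eta, omega, time_distortion, validity) are assumptions to adapt to the actual file:

    import pandas as pd

    # Hypothetical column names; inspect the CSV written by
    # sample.search and rename accordingly.
    df = pd.read_csv("search_results.csv")
    best = df.sort_values("validity", ascending=False).iloc[0]
    print(f"eta={best['eta']}, omega={best['omega']}, "
          f"distortion={best['time_distortion']}")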

Final sampling

Use the optimal η, ω, and time distortion resulting from the search:

python main.py +experiment=planar dataset=planar general.test_only=<path/to/checkpoint> sample.eta=<η> sample.omega=<ω> sample.time_distortion=<distortion>

🌐 Extend DeFoG to new datasets

Start by creating a new file in the src/datasets directory. You can refer to the following scripts as examples:

  • spectre_dataset.py, if you are using unattributed graphs;
  • tls_dataset.py, if you are using graphs with attributed nodes;
  • qm9_dataset.py or guacamol_dataset.py, if you are using graphs with attributed nodes and edges (e.g., molecular data).

This new file should define a Dataset class to handle data processing (refer to the PyG documentation for guidance), as well as a DatasetInfos class to specify relevant dataset properties (e.g., number of nodes, edges, etc.).
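
As a starting point, a hedged skeleton of such a file is sketched below; the class names, fields, and toy processing logic are illustrative assumptions, so mirror the structure of the example scripts above for the exact interfaces the repository expects:

    import torch
    from torch_geometric.data import Data, InMemoryDataset

    class MyGraphDataset(InMemoryDataset):
        # Minimal PyG dataset; download hooks omitted for brevity.
        def __init__(self, root, transform=None, pre_transform=None):
            super().__init__(root, transform, pre_transform)
            self.data, self.slices = torch.load(self.processed_paths[0])

        @property
        def processed_file_names(self):
            return ["data.pt"]

        def process(self):
            # One Data object per graph: node features x, edge indices,
            # and edge attributes (toy 3-node path graph shown here).
            data_list = [
                Data(
                    x=torch.ones(3, 1),
                    edge_index=torch.tensor([[0, 1], [1, 2]]),
                    edge_attr=torch.ones(2, 1),
                )
            ]
            torch.save(self.collate(data_list), self.processed_paths[0])

    class MyGraphDatasetInfos:
        # Dataset-level statistics consumed downstream
        # (hypothetical fields; match what the repository expects).
        def __init__(self, dataset):
            self.num_node_types = 1
            self.num_edge_types = 1
            self.max_num_nodes = 3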

Once your dataset file is ready, update main.py to incorporate the new dataset. Additionally, you can add a corresponding file in the configs/dataset directory.

Finally, if you are planning to introduce custom metrics, you can create a new file under the metrics directory.
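
For example, here is a hedged sketch of a standalone graph-level metric, assuming the generated graphs are available as networkx objects (the function name and graph format are assumptions, not the repository's metric interface):

    import networkx as nx

    def planarity_rate(graphs):
        # Fraction of generated graphs that are planar;
        # assumes each element is a networkx.Graph.
        flags = [nx.check_planarity(g)[0] for g in graphs]
        return sum(flags) / max(len(flags), 1)

    # K5 is non-planar, so this prints 0.5.
    print(planarity_rate([nx.cycle_graph(5), nx.complete_graph(5)]))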


Checkpoints

Checkpoints, along with their corresponding results and generated samples, are shared here.

To run sampling and evaluate generation with a given checkpoint, set the general.test_only flag to the path of the checkpoint file (.ckpt file). To skip sampling and directly evaluate previously generated samples, set the flag general.generated_path to the path of the generated samples (.pkl file).

(Note: the released checkpoints are models retrained with this public repository. Their performance is consistent with the paper's findings, with minor variations attributable to training/sampling stochasticity.)
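
To inspect previously generated samples outside of the evaluation pipeline, a minimal sketch (the internal structure of the .pkl file is repository-specific, so verify the assumed format by printing an entry):

    import pickle

    with open("generated_samples.pkl", "rb") as f:  # hypothetical path
        samples = pickle.load(f)

    # Inspect the container and the first entry to learn the format.
    print(type(samples), len(samples))
    print(samples[0])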


📌 Upon request

  • protein / EGO datasets
  • FCD score for molecules
  • W&B sweeps for sampling optimization

🙏 Acknowledgements


📚 Citation

@inproceedings{qinmadeira2024defog,
  title     = {DeFoG: Discrete Flow Matching for Graph Generation},
  author    = {Qin, Yiming and Madeira, Manuel and Thanou, Dorina and Frossard, Pascal},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2025},
}
