qym7 · Tang-Webber · Nov 18, 2025 · Nov 18, 2025 · Nov 19, 2025 · Nov 19, 2025
diff --git a/README.md b/README.md
@@ -1,50 +1,29 @@
-# `Sparse denoising diffusion for large graph generation`
-
-Official code for the paper, "Sparse Training of Discrete Diffusion Models for Graph Generation," available [here](https://arxiv.org/abs/2311.02142).
-
-Checkpoints to reproduce the results can be found at [this link](https://drive.switch.ch/index.php/s/1hHNVCb0ylbYPoQ). Please refer to the updated version of our paper on arXiv.
-
-
-## Environment installation
-This code was tested with PyTorch 2.0.1, cuda 11.8 and torch_geometrics 2.3.1
-
-  - Download anaconda/miniconda if needed
-  - Create a rdkit environment that directly contains rdkit:
-
-    ```conda create -c conda-forge -n sparse rdkit=2023.03.2 python=3.9```
-  - `conda activate sparse`
-  - Check that this line does not return an error:
-
-    ``` python3 -c 'from rdkit import Chem' ```
-  - Install graph-tool (https://graph-tool.skewed.de/):  
-
-    ```conda install -c conda-forge graph-tool=2.45```
-  - Check that this line does not return an error:
-
-    ```python3 -c 'import graph_tool as gt' ```
-  - Install the nvcc drivers for your cuda version. For example:
-
-    ```conda install -c "nvidia/label/cuda-11.8.0" cuda```
-  - Install a corresponding version of pytorch, for example: 
-
-    ```pip3 install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118```
-  - Install other packages using the requirement file: 
-
-    ```pip install -r requirements.txt```
-  - Install mini-moses: 
-
-    ```pip install git+https://github.com/igor-krawczuk/mini-moses```
-  - Run:
-
-    ```pip install -e .```
-
-  - Navigate to the ./sparse_diffusion/analysis/orca directory and compile orca.cpp: 
-
-     ```g++ -O2 -std=c++11 -o orca orca.cpp```
-
-
-## Run the code
-
+# Sparse denoising diffusion for large graph generation
+Forked from the official code for the paper, "Sparse Training of Discrete Diffusion Models for Graph Generation," available [here](https://arxiv.org/abs/2311.02142).
+Checkpoints to reproduce the results can be found at [this link](https://drive.switch.ch/index.php/s/1hHNVCb0ylbYPoQ). 
+Please refer to the updated version [here](https://arxiv.org/abs/2311.02142).
+
+## Environment installation (Modified from README.md of [SparseDiff](https://github.com/vincenttsai2015/SparseDiff/blob/main/README.md))
+This code was tested with PyTorch 2.4.1, CUDA 12.1, and torch_geometric 2.4.0.
+* Download anaconda/miniconda if needed
+* Create the conda environment: ```conda create -c conda-forge -n digress rdkit=2023.03.2 python=3.9```
+* Activate the environment: ```conda activate digress```
+* Install graph-tool: ```conda install -c conda-forge graph-tool=2.45```
+* Verify the installation:
+  * ```python3 -c 'from rdkit import Chem'```
+  * ```python3 -c 'import graph_tool as gt'```
+* Install the nvcc drivers: ```conda install -c "nvidia/label/cuda-12.1.0" cuda```
+* Install PyTorch: ```pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121``` (prefix the command with `python -m` if `pip` is not on your PATH)
+* Install the PyG-related packages: ```pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0+cu121.html``` (prefix the command with `python -m` if `pip` is not on your PATH)
+* Install DGL (for SparseDiff): ```conda install -c dglteam/label/th24_cu121 dgl```
+* Make sure the versions of the nvcc drivers, PyTorch, PyG, and DGL are consistent with one another (CUDA 12.1 and PyTorch 2.4 in the commands above).
+* Install the remaining packages: ```pip install -r requirements.txt```
+* Install mini-moses (optional): ```pip install git+https://github.com/igor-krawczuk/mini-moses```
+* Navigate to the directory ```./sparse_diffusion/analysis/orca``` and compile orca.cpp: ```g++ -O2 -std=c++11 -o orca orca.cpp```
+
+## Main execution file usage
+* Use the config files in the folder ```configs/experiment```.
+* Example command for execution: ```CUDA_VISIBLE_DEVICES=0 python main.py +experiment=ego.yaml```
   - All code is currently launched through `python3 main.py`. Check hydra documentation (https://hydra.cc/) for overriding default parameters.
   - To run the debugging code: `python3 main.py +experiment=debug.yaml`. We advise to try to run the debug mode first
     before launching full experiments.
@@ -64,8 +43,5 @@ This code was tested with PyTorch 2.0.1, cuda 11.8 and torch_geometrics 2.3.1
 }
 ```
 
-<!-- If you have retrained a model from scratch for which the samples are not available yet, we would be very happy if you could send them to us! -->
-
 ## Troubleshooting 
-
 `PermissionError: [Errno 13] Permission denied: 'SparseDiff/sparse_diffusion/analysis/orca/orca'`: You probably did not compile orca.
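Review note on the installation steps: the README asks readers to keep the nvcc drivers, PyTorch, PyG, and DGL versions consistent. A minimal sketch of how part of that could be checked from Python follows. The `EXPECTED` table and the `check_versions` helper are hypothetical and not part of this repository; the version numbers are the ones quoted in the README, and the distribution names are assumptions about how the packages are registered in the environment.

```python
from importlib import metadata

# Hypothetical table: distribution name -> version prefix quoted in the README.
EXPECTED = {
    "torch": "2.4.1",
    "torch-geometric": "2.4.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Map each mismatching package to a (wanted, found) pair."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Wheels from the PyTorch index may carry a local suffix such as
        # "+cu121", so only the prefix is compared.
        if found is None or not found.startswith(wanted):
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}*, found {found}")
```

The `lookup` parameter exists only so the comparison can be exercised without the packages installed. The sketch does not cover the CUDA toolkit or DGL, which would need their own checks.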
diff --git a/configs/dataset/sbm.yaml b/configs/dataset/sbm.yaml
@@ -1,5 +1,5 @@
 name: 'sbm'
-datadir: 'data/sbm'
+datadir: 'data/sbm/'
 remove_h: null
 molecules: False
 spectre: True
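Review note on the `datadir` change: whether the added trailing slash matters depends on how the loaders build paths from `datadir`, which this diff does not show. The sketch below only illustrates the general difference between a separator-aware join and plain string concatenation; the `raw` subfolder name and both helper functions are illustrative assumptions, not code from this repository.

```python
import posixpath


def join_path(datadir, name):
    """Separator-aware join: the trailing slash makes no difference."""
    return posixpath.join(datadir, name)


def concat_path(datadir, name):
    """Naive concatenation: the trailing slash changes the result."""
    return datadir + name


for datadir in ("data/sbm", "data/sbm/"):
    print(repr(datadir), "->", join_path(datadir, "raw"), "|", concat_path(datadir, "raw"))
```

If the code already joins paths with `os.path.join` or `pathlib`, the old and new values should behave the same; the slash would only be required where a path is built by concatenation.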

diff --git a/configs/experiment/ego.yaml b/configs/experiment/ego.yaml
@@ -13,8 +13,15 @@ general:
     final_model_samples_to_generate: 151
     final_model_samples_to_save: 30
     final_model_chains_to_save: 10
+dataset:
+    name: 'ego'
+    datadir: 'data/ego/'
+    random_subset: null
+    pin_memory: False
+    molecules: False
+    spectre: False
 train:
-    n_epochs: 100000
+    n_epochs: 100
     batch_size: 32
     save_model: True
     num_workers: 0

diff --git a/configs/experiment/nx_graphs.yaml b/configs/experiment/nx_graphs.yaml
@@ -1,18 +1,18 @@
 # @package _global_
 
 general:
-  # General settings for ggg benchmarks
-  check_val_every_n_epochs: 5
-  sample_every_val: 2
-  samples_to_generate: 16       # since these are benchmarking graphs, we can use a smaller number
-  samples_to_save: 16
-  chains_to_save: 1
-  log_every_steps: 50
-  number_chain_steps: 50        # Number of frames in each gif
+    # General settings for ggg benchmarks
+    check_val_every_n_epochs: 5
+    sample_every_val: 2
+    samples_to_generate: 16       # since these are benchmarking graphs, we can use a smaller number
+    samples_to_save: 16
+    chains_to_save: 1
+    log_every_steps: 50
+    number_chain_steps: 50        # Number of frames in each gif
 
-  final_model_samples_to_generate: 10000
-  final_model_samples_to_save: 100
-  final_model_chains_to_save: 50
-  cpus_per_gpu: 4
-  force_ray: false
-  val_bs_multiplier: 1.0
+    final_model_samples_to_generate: 100
+    final_model_samples_to_save: 100
+    final_model_chains_to_save: 50
+    cpus_per_gpu: 4
+    force_ray: false
+    val_bs_multiplier: 1.0
diff --git a/configs/experiment/planar.yaml b/configs/experiment/planar.yaml
@@ -12,8 +12,14 @@ general:
     final_model_samples_to_generate: 40
     final_model_samples_to_save: 30
     final_model_chains_to_save: 20
+dataset:
+    name: 'planar'
+    datadir: 'data/planar/'
+    remove_h: null
+    molecules: False
+    spectre: True
 train:
-    n_epochs: 300000
+    n_epochs: 30
     batch_size: 64
     save_model: True
 model:

diff --git a/configs/experiment/sbm.yaml b/configs/experiment/sbm.yaml
@@ -13,8 +13,15 @@ general:
     final_model_samples_to_generate: 40
     final_model_samples_to_save: 30
     final_model_chains_to_save: 20
+dataset:
+    name: 'sbm'
+    datadir: 'data/sbm/'
+    remove_h: null
+    molecules: False
+    spectre: True
+    pin_memory: False
 train:
-    n_epochs: 200000
+    n_epochs: 20
     batch_size: 32
     save_model: True
     num_workers: 0
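Review note on the experiment configs: this change adds a `dataset:` block to each experiment file and cuts `n_epochs` by several orders of magnitude (200000 to 20 here). The sketch below approximates how layered configuration could resolve, with later layers winning. It is a plain-dict stand-in, not Hydra or OmegaConf, whose real merge rules are richer; `deep_merge` and the three layer names are hypothetical, and only the values that appear in this diff are used.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where `override` wins, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Layer 1: values visible in configs/dataset/sbm.yaml.
dataset_defaults = {
    "dataset": {"name": "sbm", "datadir": "data/sbm/", "remove_h": None,
                "molecules": False, "spectre": True},
}

# Layer 2: values this diff puts in configs/experiment/sbm.yaml.
experiment = {
    "dataset": {"name": "sbm", "datadir": "data/sbm/", "remove_h": None,
                "molecules": False, "spectre": True, "pin_memory": False},
    "train": {"n_epochs": 20, "batch_size": 32, "save_model": True, "num_workers": 0},
}

# Layer 3: a run-time override restoring the previous training length.
run_override = {"train": {"n_epochs": 200000}}

cfg = deep_merge(deep_merge(dataset_defaults, experiment), run_override)
print(cfg["train"]["n_epochs"], cfg["dataset"]["pin_memory"])
```

Under this layering, a short default schedule in the file can still be lengthened per run without editing the YAML, which may be a lighter alternative to hard-coding either value.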

diff --git a/configs/experiment/wiki-vote.yaml b/configs/experiment/wiki-vote.yaml
@@ -0,0 +1,41 @@
+# @package _global_
+general:
+    name : 'wiki-vote'
+    num_bins: 27500
+    gpus : 1
+    wandb: 'online'
+    resume: null            # If resume, path to ckpt file from outputs directory in main directory
+    test_only: null
+    check_val_every_n_epochs: 1000
+    sample_every_val: 4
+    samples_to_generate: 64
+    samples_to_save: 9
+    chains_to_save: 1
+    final_model_samples_to_generate: 151
+    final_model_samples_to_save: 30
+    final_model_chains_to_save: 10
+dataset:
+    name: 'wiki-vote'
+    datadir: 'data/wiki-vote/'
+    random_subset: null
+    pin_memory: False
+    molecules: False
+    spectre: False
+train:
+    n_epochs: 100
+    batch_size: 32
+    save_model: True
+    num_workers: 0
+model:
+    diffusion_steps: 1000
+    n_layers: 8
+    num_degree: 20
+    lambda_train: [5, 0, 2]
+    extra_features: 'all'
+    edge_fraction: 0.1
+    # Do not set hidden_mlp_E, dim_ffE too high, computing large tensors on the edges is costly
+    # At the moment (03/08), y contains quite little information
+    hidden_mlp_dims: { 'X': 128, 'E': 64, 'y': 128 }
+    # The dimensions should satisfy dx % n_head == 0
+    hidden_dims: { 'dx': 256, 'de': 64, 'dy': 128, 'n_head': 8, 'dim_ffX': 256, 'dim_ffE': 64, 'dim_ffy': 256 }
+    pin_memory: False
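Review note on the new `wiki-vote.yaml`: the config comment states that the dimensions should satisfy `dx % n_head == 0`. A small sketch of checking that constraint before launching a run is shown below. The helper `check_head_divisibility` is hypothetical; reading the quotient as a per-head width assumes `dx` is split evenly across heads, which the diff itself does not confirm.

```python
def check_head_divisibility(hidden_dims):
    """Raise if dx is not divisible by n_head; otherwise return dx // n_head."""
    dx, n_head = hidden_dims["dx"], hidden_dims["n_head"]
    if dx % n_head != 0:
        raise ValueError(f"dx={dx} is not divisible by n_head={n_head}")
    return dx // n_head


# Values copied from the hidden_dims entry in configs/experiment/wiki-vote.yaml.
hidden_dims = {"dx": 256, "de": 64, "dy": 128, "n_head": 8,
               "dim_ffX": 256, "dim_ffE": 64, "dim_ffy": 256}

print(check_head_divisibility(hidden_dims))  # 32
```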
diff --git a/requirements.txt b/requirements.txt
@@ -1,18 +1,18 @@
-dgl
-hydra-core==1.3.2
-imageio==2.31.1
-matplotlib==3.7.1
-networkx==2.8.7
-numpy==1.23
-omegaconf==2.3.0
-overrides==7.3.1
-pandas==1.4
-pyemd==1.0.0
-PyGSP==0.5.1
-scipy==1.11.0
-pytorch_lightning==2.0.4
-setuptools==68.0.0
-torch_geometric==2.3.1
-torchmetrics==0.11.4
-tqdm==4.65.0
-wandb==0.15.4
+hydra-core
+imageio
+matplotlib
+networkx
+numpy
+omegaconf
+overrides
+pandas
+pyemd
+PyGSP
+pytorch_lightning
+scipy
+setuptools
+torchmetrics
+tqdm
+wandb
+networkx-temporal
+torch-geometric==2.4.0
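Review note on `requirements.txt`: this change removes almost every version pin, so a fresh install resolves to whatever releases are current at install time, which may differ from the tested environment. The sketch below lists which entries remain pinned. The `split_requirements` helper is hypothetical, and it only recognises exact `==` pins.

```python
def split_requirements(text):
    """Split requirement lines into (pinned, unpinned) lists."""
    pinned, unpinned = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        (pinned if "==" in line else unpinned).append(line)
    return pinned, unpinned


# The requirements file as it stands after this diff.
NEW_REQUIREMENTS = """\
hydra-core
imageio
matplotlib
networkx
numpy
omegaconf
overrides
pandas
pyemd
PyGSP
pytorch_lightning
scipy
setuptools
torchmetrics
tqdm
wandb
networkx-temporal
torch-geometric==2.4.0
"""

pinned, unpinned = split_requirements(NEW_REQUIREMENTS)
print("pinned:", pinned)
print("unpinned count:", len(unpinned))
```

If reproducibility matters, exporting the versions from a working environment and pinning them again would be one option.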
diff --git a/sparse_diffusion/analysis/visualization.py b/sparse_diffusion/analysis/visualization.py
@@ -1,4 +1,8 @@
-import os
+import os, sys
+import pathlib
+import os.path as osp
+RootPath = pathlib.Path(osp.realpath(__file__)).parents[1]
+sys.path.append(f'{RootPath}')
 
 from rdkit import Chem
 from rdkit.Chem import Draw, AllChem
@@ -10,7 +14,7 @@
 import rdkit.Chem
 import wandb
 import matplotlib.pyplot as plt
-from sparse_diffusion.metrics.molecular_metrics import Molecule, SparseMolecule
+from metrics.molecular_metrics import Molecule, SparseMolecule
 
 
 class Visualizer:
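Review note on the import changes: each touched module now appends `pathlib.Path(osp.realpath(__file__)).parents[1]` to `sys.path` and imports `metrics`, `utils`, `datasets`, and `diffusion` as top-level names. The sketch below shows which directory `parents[1]` selects, using a made-up `/repo` checkout location, plus a guard against appending the same entry repeatedly. Both helpers are hypothetical and not part of this repository.

```python
import pathlib


def package_root(module_file):
    """Directory two levels above the module file, as parents[1] selects."""
    return pathlib.PurePosixPath(module_file).parents[1]


def add_once(root, path_list):
    """Append root to a sys.path-like list only if it is not already there."""
    entry = str(root)
    if entry not in path_list:
        path_list.append(entry)
    return path_list


# Hypothetical checkout location, for illustration only.
example = "/repo/sparse_diffusion/analysis/visualization.py"
print(package_root(example))  # /repo/sparse_diffusion

search_path = []
add_once(package_root(example), search_path)
add_once(package_root(example), search_path)
print(search_path)  # ['/repo/sparse_diffusion']
```

Because the directory is appended rather than placed first, any earlier `sys.path` entry that provides a module with the same top-level name (for example an installed package called `datasets` or `utils`) would be found first, depending on how the script is launched. Keeping the `sparse_diffusion.` prefix with an editable install would avoid that ambiguity.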

diff --git a/sparse_diffusion/datasets/abstract_dataset.py b/sparse_diffusion/datasets/abstract_dataset.py
@@ -1,8 +1,12 @@
 import abc
 import numpy as np
-
-from sparse_diffusion.diffusion.distributions import DistributionNodes
-import sparse_diffusion.utils as utils
+import os, sys
+import pathlib
+import os.path as osp
+RootPath = pathlib.Path(osp.realpath(__file__)).parents[1]
+sys.path.append(f'{RootPath}')
+from diffusion.distributions import DistributionNodes
+import utils
 import torch
 import torch.nn.functional as F
 from torch_geometric.data.lightning import LightningDataset
@@ -26,7 +30,6 @@ def __init__(self, cfg, datasets):
 
     def dataset_stat(self):
         dataset = self.train_dataset + self.val_dataset + self.test_dataset
-
         nodes = []
         edges = []
         sparsity = []
@@ -176,6 +179,7 @@ def complete_infos(self, statistics, node_types):
 
     def compute_input_dims(self, datamodule, extra_features, domain_features):
         data = next(iter(datamodule.train_dataloader()))
+        print(data)
         example_batch = self.to_one_hot(data)
         ex_dense, node_mask = utils.to_dense(
             example_batch.x,

diff --git a/sparse_diffusion/datasets/guacamol_dataset.py b/sparse_diffusion/datasets/guacamol_dataset.py
@@ -1,6 +1,8 @@
-import os
-import os.path as osp
+import os, sys
 import pathlib
+import os.path as osp
+RootPath = pathlib.Path(osp.realpath(__file__)).parents[1]
+sys.path.append(f'{RootPath}')
 
 import hashlib
 import numpy as np
@@ -12,19 +14,19 @@
 import torch.nn.functional as F
 from torch_geometric.data import InMemoryDataset, download_url
 
-from sparse_diffusion.utils import PlaceHolder
-from sparse_diffusion.datasets.abstract_dataset import (
+from utils import PlaceHolder
+from datasets.abstract_dataset import (
     MolecularDataModule,
     AbstractDatasetInfos,
 )
-from sparse_diffusion.datasets.dataset_utils import (
+from datasets.dataset_utils import (
     save_pickle,
     mol_to_torch_geometric,
     load_pickle,
     Statistics,
 )
-from sparse_diffusion.metrics.molecular_metrics import SparseMolecule
-from sparse_diffusion.metrics.metrics_utils import compute_all_statistics
+from metrics.molecular_metrics import SparseMolecule
+from metrics.metrics_utils import compute_all_statistics
 
 
 TRAIN_HASH = "05ad85d871958a05c02ab51a4fde8530"

diff --git a/sparse_diffusion/datasets/moses_dataset.py b/sparse_diffusion/datasets/moses_dataset.py
@@ -1,8 +1,8 @@
-import os
-import os.path as osp
+import os, sys
 import pathlib
-
-
+import os.path as osp
+RootPath = pathlib.Path(osp.realpath(__file__)).parents[1]
+sys.path.append(f'{RootPath}')
 import torch
 import torch.nn.functional as F
 from rdkit import Chem, RDLogger
@@ -12,19 +12,19 @@
 from torch_geometric.data import InMemoryDataset, download_url
 from hydra.utils import get_original_cwd
 
-from sparse_diffusion.utils import PlaceHolder
-from sparse_diffusion.datasets.abstract_dataset import (
+from utils import PlaceHolder
+from datasets.abstract_dataset import (
     MolecularDataModule,
     AbstractDatasetInfos,
 )
-from sparse_diffusion.datasets.dataset_utils import (
+from datasets.dataset_utils import (
     save_pickle,
     mol_to_torch_geometric,
     load_pickle,
     Statistics,
 )
-from sparse_diffusion.metrics.molecular_metrics import SparseMolecule
-from sparse_diffusion.metrics.metrics_utils import compute_all_statistics
+from metrics.molecular_metrics import SparseMolecule
+from metrics.metrics_utils import compute_all_statistics
 
 
 atom_encoder = {"C": 0, "N": 1, "S": 2, "O": 3, "F": 4, "Cl": 5, "Br": 6}