76 changes: 26 additions & 50 deletions README.md
@@ -1,50 +1,29 @@
# `Sparse denoising diffusion for large graph generation`

Official code for the paper, "Sparse Training of Discrete Diffusion Models for Graph Generation," available [here](https://arxiv.org/abs/2311.02142).

Checkpoints to reproduce the results can be found at [this link](https://drive.switch.ch/index.php/s/1hHNVCb0ylbYPoQ). Please refer to the updated version of our paper on arXiv.


## Environment installation
This code was tested with PyTorch 2.0.1, cuda 11.8 and torch_geometrics 2.3.1

- Download anaconda/miniconda if needed
- Create a rdkit environment that directly contains rdkit:

```conda create -c conda-forge -n sparse rdkit=2023.03.2 python=3.9```
- `conda activate sparse`
- Check that this line does not return an error:

``` python3 -c 'from rdkit import Chem' ```
- Install graph-tool (https://graph-tool.skewed.de/):

```conda install -c conda-forge graph-tool=2.45```
- Check that this line does not return an error:

```python3 -c 'import graph_tool as gt' ```
- Install the nvcc drivers for your cuda version. For example:

```conda install -c "nvidia/label/cuda-11.8.0" cuda```
- Install a corresponding version of pytorch, for example:

```pip3 install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118```
- Install other packages using the requirement file:

```pip install -r requirements.txt```
- Install mini-moses:

```pip install git+https://github.com/igor-krawczuk/mini-moses```
- Run:

```pip install -e .```

- Navigate to the ./sparse_diffusion/analysis/orca directory and compile orca.cpp:

```g++ -O2 -std=c++11 -o orca orca.cpp```


## Run the code

# Sparse denoising diffusion for large graph generation
Forked from the official code for the paper, "Sparse Training of Discrete Diffusion Models for Graph Generation," available [here](https://arxiv.org/abs/2311.02142).
Checkpoints to reproduce the results can be found at [this link](https://drive.switch.ch/index.php/s/1hHNVCb0ylbYPoQ).
Please refer to the updated version of the paper on arXiv [here](https://arxiv.org/abs/2311.02142).

## Environment installation (Modified from README.md of [SparseDiff](https://github.com/vincenttsai2015/SparseDiff/blob/main/README.md))
This code was tested with PyTorch 2.4.1, CUDA 12.1 and torch_geometric 2.4.0
* Download anaconda/miniconda if needed
* Create the conda environment: ```conda create -c conda-forge -n digress rdkit=2023.03.2 python=3.9```
* Activate the environment: ```conda activate digress```
* Install graph-tool: ```conda install -c conda-forge graph-tool=2.45```
* Verify the installation:
* ```python3 -c 'from rdkit import Chem'```
* ```python3 -c 'import graph_tool as gt'```
* Install the nvcc drivers: ```conda install -c "nvidia/label/cuda-12.1.0" cuda```
* Install PyTorch: ```pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121``` (prefix the command with ```python -m``` if needed)
* Install PyG related packages: ```pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0+cu121.html``` (prefix the command with ```python -m``` if needed)
* Install DGL (for SparseDiff): ```conda install -c dglteam/label/th24_cu121 dgl```
* Make sure the versions of the *nvcc drivers, PyTorch, PyG, and DGL* are compatible with each other!
* Install the remaining packages: ```pip install -r requirements.txt```
* Install mini-moses (optional): ```pip install git+https://github.com/igor-krawczuk/mini-moses```
* Navigate to the directory ```./sparse_diffusion/analysis/orca``` and compile orca.cpp: ```g++ -O2 -std=c++11 -o orca orca.cpp```
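
The verification steps above can also be scripted. The following is a minimal sketch, not part of the repository: it only checks whether each package can be located, without importing it. The module list is an assumption based on the packages named in the installation steps, and `find_missing` is a hypothetical helper name.

```python
import importlib.util

# Assumed top-level module names for the packages installed above.
REQUIRED_MODULES = ["rdkit", "graph_tool", "torch", "torch_geometric", "dgl"]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat a broken or half-initialised module as missing.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Locating a package does not prove that its compiled extensions match the installed CUDA version, so the import checks listed above are still worth running.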

## Main execution file usage
* Use the config files in the ```configs/experiment``` folder.
* Example command for execution: ```CUDA_VISIBLE_DEVICES=0 python main.py +experiment=ego.yaml```
- All code is currently launched through `python3 main.py`. Check hydra documentation (https://hydra.cc/) for overriding default parameters.
- To run the debugging code: `python3 main.py +experiment=debug.yaml`. We advise running the debug mode first,
before launching full experiments.
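
As an illustration of how the launch command and Hydra overrides fit together, here is a small sketch that assembles the command line shown above. `build_command` is a hypothetical helper, not part of the repository. The override keys `train.n_epochs` and `train.batch_size` are taken from the experiment configs in this change and are assumed to exist in the composed config.

```python
import shlex


def build_command(experiment, overrides=None, gpu=0):
    """Assemble a launch command for main.py with optional Hydra overrides.

    Hypothetical helper for illustration only.
    """
    args = ["python", "main.py", f"+experiment={experiment}"]
    # Hydra overrides use the form key=value, with dots for nested keys.
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")
    quoted = " ".join(shlex.quote(arg) for arg in args)
    return f"CUDA_VISIBLE_DEVICES={gpu} {quoted}"


if __name__ == "__main__":
    print(build_command("ego.yaml", {"train.n_epochs": 100, "train.batch_size": 32}))
```

Overriding on the command line leaves the YAML files untouched, which is convenient for short test runs.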
@@ -64,8 +43,5 @@ This code was tested with PyTorch 2.0.1, cuda 11.8 and torch_geometrics 2.3.1
}
```

<!-- If you have retrained a model from scratch for which the samples are not available yet, we would be very happy if you could send them to us! -->

## Troubleshooting

`PermissionError: [Errno 13] Permission denied: 'SparseDiff/sparse_diffusion/analysis/orca/orca'`: You probably did not compile orca.
2 changes: 1 addition & 1 deletion configs/dataset/sbm.yaml
@@ -1,5 +1,5 @@
name: 'sbm'
datadir: 'data/sbm'
datadir: 'data/sbm/'
remove_h: null
molecules: False
spectre: True
9 changes: 8 additions & 1 deletion configs/experiment/ego.yaml
@@ -13,8 +13,15 @@ general:
final_model_samples_to_generate: 151
final_model_samples_to_save: 30
final_model_chains_to_save: 10
dataset:
name: 'ego'
datadir: 'data/ego/'
random_subset: null
pin_memory: False
molecules: False
spectre: False
train:
n_epochs: 100000
n_epochs: 100
batch_size: 32
save_model: True
num_workers: 0
28 changes: 14 additions & 14 deletions configs/experiment/nx_graphs.yaml
@@ -1,18 +1,18 @@
# @package _global_

general:
# General settings for ggg benchmarks
check_val_every_n_epochs: 5
sample_every_val: 2
samples_to_generate: 16 # since these are benchmarking graphs, we can use a smaller number
samples_to_save: 16
chains_to_save: 1
log_every_steps: 50
number_chain_steps: 50 # Number of frames in each gif
# General settings for ggg benchmarks
check_val_every_n_epochs: 5
sample_every_val: 2
samples_to_generate: 16 # since these are benchmarking graphs, we can use a smaller number
samples_to_save: 16
chains_to_save: 1
log_every_steps: 50
number_chain_steps: 50 # Number of frames in each gif

final_model_samples_to_generate: 10000
final_model_samples_to_save: 100
final_model_chains_to_save: 50
cpus_per_gpu: 4
force_ray: false
val_bs_multiplier: 1.0
final_model_samples_to_generate: 100
final_model_samples_to_save: 100
final_model_chains_to_save: 50
cpus_per_gpu: 4
force_ray: false
val_bs_multiplier: 1.0
8 changes: 7 additions & 1 deletion configs/experiment/planar.yaml
@@ -12,8 +12,14 @@ general:
final_model_samples_to_generate: 40
final_model_samples_to_save: 30
final_model_chains_to_save: 20
dataset:
name: 'planar'
datadir: 'data/planar/'
remove_h: null
molecules: False
spectre: True
train:
n_epochs: 300000
n_epochs: 30
batch_size: 64
save_model: True
model:
9 changes: 8 additions & 1 deletion configs/experiment/sbm.yaml
@@ -13,8 +13,15 @@ general:
final_model_samples_to_generate: 40
final_model_samples_to_save: 30
final_model_chains_to_save: 20
dataset:
name: 'sbm'
datadir: 'data/sbm/'
remove_h: null
molecules: False
spectre: True
pin_memory: False
train:
n_epochs: 200000
n_epochs: 20
batch_size: 32
save_model: True
num_workers: 0
41 changes: 41 additions & 0 deletions configs/experiment/wiki-vote.yaml
@@ -0,0 +1,41 @@
# @package _global_
general:
name : 'wiki-vote'
num_bins: 27500
gpus : 1
wandb: 'online'
resume: null # If resume, path to ckpt file from outputs directory in main directory
test_only: null
check_val_every_n_epochs: 1000
sample_every_val: 4
samples_to_generate: 64
samples_to_save: 9
chains_to_save: 1
final_model_samples_to_generate: 151
final_model_samples_to_save: 30
final_model_chains_to_save: 10
dataset:
name: 'wiki-vote'
datadir: 'data/wiki-vote/'
random_subset: null
pin_memory: False
molecules: False
spectre: False
train:
n_epochs: 100
batch_size: 32
save_model: True
num_workers: 0
model:
diffusion_steps: 1000
n_layers: 8
num_degree: 20
lambda_train: [5, 0, 2]
extra_features: 'all'
edge_fraction: 0.1
# Do not set hidden_mlp_E, dim_ffE too high, computing large tensors on the edges is costly
# At the moment (03/08), y contains quite little information
hidden_mlp_dims: { 'X': 128, 'E': 64, 'y': 128 }
# The dimensions should satisfy dx % n_head == 0
hidden_dims: { 'dx': 256, 'de': 64, 'dy': 128, 'n_head': 8, 'dim_ffX': 256, 'dim_ffE': 64, 'dim_ffy': 256 }
pin_memory: False
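
The constraint stated in the config comment, `dx % n_head == 0`, can be checked before launching a run. This is a minimal sketch under the assumption that `hidden_dims` is available as a plain dictionary. `head_dim` is a hypothetical helper and not a function from the repository.

```python
def head_dim(hidden_dims):
    """Return the per-head dimension, or raise if dx is not divisible by n_head."""
    dx = hidden_dims["dx"]
    n_head = hidden_dims["n_head"]
    if dx % n_head != 0:
        raise ValueError(f"dx={dx} must be divisible by n_head={n_head}")
    return dx // n_head


if __name__ == "__main__":
    # Values copied from the hidden_dims entry of the wiki-vote config above.
    wiki_vote_dims = {
        "dx": 256, "de": 64, "dy": 128, "n_head": 8,
        "dim_ffX": 256, "dim_ffE": 64, "dim_ffy": 256,
    }
    print(head_dim(wiki_vote_dims))
```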
36 changes: 18 additions & 18 deletions requirements.txt
@@ -1,18 +1,18 @@
dgl
hydra-core==1.3.2
imageio==2.31.1
matplotlib==3.7.1
networkx==2.8.7
numpy==1.23
omegaconf==2.3.0
overrides==7.3.1
pandas==1.4
pyemd==1.0.0
PyGSP==0.5.1
scipy==1.11.0
pytorch_lightning==2.0.4
setuptools==68.0.0
torch_geometric==2.3.1
torchmetrics==0.11.4
tqdm==4.65.0
wandb==0.15.4
hydra-core
imageio
matplotlib
networkx
numpy
omegaconf
overrides
pandas
pyemd
PyGSP
pytorch_lightning
scipy
setuptools
torchmetrics
tqdm
wandb
networkx-temporal
torch-geometric==2.4.0
8 changes: 6 additions & 2 deletions sparse_diffusion/analysis/visualization.py
@@ -1,4 +1,8 @@
import os
import os, sys
import pathlib
import os.path as osp
RootPath = pathlib.Path(osp.realpath(__file__)).parents[1]
sys.path.append(f'{RootPath}')

from rdkit import Chem
from rdkit.Chem import Draw, AllChem
@@ -10,7 +14,7 @@
import rdkit.Chem
import wandb
import matplotlib.pyplot as plt
from sparse_diffusion.metrics.molecular_metrics import Molecule, SparseMolecule
from metrics.molecular_metrics import Molecule, SparseMolecule


class Visualizer:
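
The `RootPath` lines added at the top of this file explain why the import changes from `sparse_diffusion.metrics...` to `metrics...`. The sketch below shows what `parents[1]` resolves to. It uses a made-up file location (`/repo/...`) and a hypothetical helper `package_root`; the real code applies the same indexing to `osp.realpath(__file__)`.

```python
from pathlib import PurePosixPath


def package_root(file_path, levels_up=1):
    """Return the ancestor directory `levels_up` levels above the file's own directory."""
    return PurePosixPath(file_path).parents[levels_up]


if __name__ == "__main__":
    # Hypothetical checkout location, used only for illustration.
    example = "/repo/sparse_diffusion/analysis/visualization.py"
    # parents[0] is the file's directory, parents[1] is the package directory.
    print(package_root(example, levels_up=0))
    print(package_root(example, levels_up=1))
```

With the package directory on `sys.path`, its subpackages (`metrics`, `datasets`, `utils`, ...) become importable as top-level names. Because `sys.path.append` adds the entry at the end, an installed third-party package with the same name that sits earlier on `sys.path` would presumably take precedence, so this is worth keeping in mind.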
12 changes: 8 additions & 4 deletions sparse_diffusion/datasets/abstract_dataset.py
@@ -1,8 +1,12 @@
import abc
import numpy as np

from sparse_diffusion.diffusion.distributions import DistributionNodes
import sparse_diffusion.utils as utils
import os, sys
import pathlib
import os.path as osp
RootPath = pathlib.Path(osp.realpath(__file__)).parents[1]
sys.path.append(f'{RootPath}')
from diffusion.distributions import DistributionNodes
import utils
import torch
import torch.nn.functional as F
from torch_geometric.data.lightning import LightningDataset
@@ -26,7 +30,6 @@ def __init__(self, cfg, datasets):

def dataset_stat(self):
dataset = self.train_dataset + self.val_dataset + self.test_dataset

nodes = []
edges = []
sparsity = []
@@ -176,6 +179,7 @@ def complete_infos(self, statistics, node_types):

def compute_input_dims(self, datamodule, extra_features, domain_features):
data = next(iter(datamodule.train_dataloader()))
print(data)
example_batch = self.to_one_hot(data)
ex_dense, node_mask = utils.to_dense(
example_batch.x,
16 changes: 9 additions & 7 deletions sparse_diffusion/datasets/guacamol_dataset.py
@@ -1,6 +1,8 @@
import os
import os.path as osp
import os, sys
import pathlib
import os.path as osp
RootPath = pathlib.Path(osp.realpath(__file__)).parents[1]
sys.path.append(f'{RootPath}')

import hashlib
import numpy as np
@@ -12,19 +14,19 @@
import torch.nn.functional as F
from torch_geometric.data import InMemoryDataset, download_url

from sparse_diffusion.utils import PlaceHolder
from sparse_diffusion.datasets.abstract_dataset import (
from utils import PlaceHolder
from datasets.abstract_dataset import (
MolecularDataModule,
AbstractDatasetInfos,
)
from sparse_diffusion.datasets.dataset_utils import (
from datasets.dataset_utils import (
save_pickle,
mol_to_torch_geometric,
load_pickle,
Statistics,
)
from sparse_diffusion.metrics.molecular_metrics import SparseMolecule
from sparse_diffusion.metrics.metrics_utils import compute_all_statistics
from metrics.molecular_metrics import SparseMolecule
from metrics.metrics_utils import compute_all_statistics


TRAIN_HASH = "05ad85d871958a05c02ab51a4fde8530"
18 changes: 9 additions & 9 deletions sparse_diffusion/datasets/moses_dataset.py
@@ -1,8 +1,8 @@
import os
import os.path as osp
import os, sys
import pathlib


import os.path as osp
RootPath = pathlib.Path(osp.realpath(__file__)).parents[1]
sys.path.append(f'{RootPath}')
import torch
import torch.nn.functional as F
from rdkit import Chem, RDLogger
@@ -12,19 +12,19 @@
from torch_geometric.data import InMemoryDataset, download_url
from hydra.utils import get_original_cwd

from sparse_diffusion.utils import PlaceHolder
from sparse_diffusion.datasets.abstract_dataset import (
from utils import PlaceHolder
from datasets.abstract_dataset import (
MolecularDataModule,
AbstractDatasetInfos,
)
from sparse_diffusion.datasets.dataset_utils import (
from datasets.dataset_utils import (
save_pickle,
mol_to_torch_geometric,
load_pickle,
Statistics,
)
from sparse_diffusion.metrics.molecular_metrics import SparseMolecule
from sparse_diffusion.metrics.metrics_utils import compute_all_statistics
from metrics.molecular_metrics import SparseMolecule
from metrics.metrics_utils import compute_all_statistics


atom_encoder = {"C": 0, "N": 1, "S": 2, "O": 3, "F": 4, "Cl": 5, "Br": 6}