Official Implementation
Zihan Yu, Jingtao Ding, Yong Li
[Read the Paper (Free View)] | [Official Page] | [Author Page]
This repository contains the code and data for the experiments in the paper "Discovering network dynamics with neural symbolic regression", published in Nature Computational Science (2025).
Abstract: Network dynamics are fundamental to analyzing the properties of high-dimensional complex systems and understanding their behavior. Despite the accumulation of observational data across many domains, mathematical models exist in only a few areas with clear underlying principles. Here we show that a neural symbolic regression approach can bridge this gap by automatically deriving formulas from data. Our method reduces searches on high-dimensional networks to equivalent one-dimensional systems and uses pretrained neural networks to guide accurate formula discovery. Applied to ten benchmark systems, it recovers the correct forms and parameters of underlying dynamics. In two empirical natural systems, it corrects existing models of gene regulation and microbial communities, reducing prediction error by 59.98% and 55.94%, respectively. In epidemic transmission across human mobility networks of various scales, it discovers dynamics that exhibit the same power-law distribution of node correlations across scales and reveal country-level differences in intervention effects. These results demonstrate that machine-driven discovery of network dynamics can enhance understandings of complex systems and advance the development of complexity science.
The code is written in Python and PyTorch. It has been tested on Ubuntu 18.04.6 LTS with the following specifications:
- Python: 3.12.2
- PyTorch: 2.4.0+cu121
- GPU: Tested on NVIDIA RTX 4070 (Code supports CUDA acceleration)
Dependencies: The code requires the following Python packages:
```
torch==2.4.0+cu121
numpy==1.26.4
sympy==1.12
pandas==2.2.2
scipy==1.13.0
scikit-learn==1.5.0
matplotlib==3.8.4
networkx==3.3
tqdm==4.66.4
setproctitle==1.3.3
PyYAML==6.0.1
ipython==8.22.2
lxml==5.3.0
geopandas==1.0.1
```
To avoid conflicts, we recommend using a virtual environment:
Using venv:

```bash
python3 -m venv env && source env/bin/activate
```

Or using Conda:

```bash
conda create -p ./env -y python=3.12.2 && conda activate ./env
```

Install the required packages in editable mode:

```bash
pip install -e .
```

We provide pre-trained neural network model weights. Download them from the Releases Page and save them to `./weights/checkpoint.pth`.
Linux:

```bash
wget -O weights/checkpoint.pth https://github.com/yuzhTHU/ND2/releases/download/checkpoint.pth/checkpoint.pth
```

macOS:

```bash
curl -L -o weights/checkpoint.pth https://github.com/yuzhTHU/ND2/releases/download/checkpoint.pth/checkpoint.pth
```

Under normal network conditions, the installation will take approximately 4 minutes.
We provide a demo script to run the pre-trained model on a synthetic dataset (Kuramoto dynamics). To run the demo, simply execute:
```bash
python ./search.py
```

This script will load the pre-trained model and search for the symbolic expression that best fits the synthetic dataset `./data/synthetic/KUR.json`.
Results: The search will take approximately 1 minute on a single NVIDIA RTX 4070 GPU. After the search is complete, the script will output the best symbolic expression found and its error:
As shown in the figure, the discovered formula is

$$\frac{dx_i}{dt} = \omega_{0,i} + 1.0 \sum_{j} A_{ij} \sin(x_j - x_i).$$

This corresponds exactly to the Kuramoto dynamics

$$\frac{d\theta_i}{dt} = \omega_i + K \sum_{j} A_{ij} \sin(\theta_j - \theta_i)$$

by mapping $x_i \to \theta_i$, $\omega_{0,i} \to \omega_i$, and $K = 1.0$.
Q: How do I use my own data?

A: You should save your data in a JSON file with the following structure:
```jsonc
{
    // A_ij is the link indicator (1 = link, 0 = no link) between node i and node j.
    "A": [[A_11, A_12, ..., A_1n], ..., [A_n1, A_n2, ..., A_nn]],
    // G_in_k and G_out_k are the indices (starting from 0) of the nodes connected by the k-th edge.
    "G": [[G_in_1, G_out_1], [G_in_2, G_out_2], ..., [G_in_m, G_out_m]],
    // x_tn is the input data (e.g., state) on the n-th node at time step t.
    "X": [[x_11, x_12, ..., x_1n], ..., [x_t1, x_t2, ..., x_tn]],
    // y_tn is the output data (e.g., state derivative) on the n-th node at time step t.
    "Y": [[y_11, y_12, ..., y_1n], ..., [y_t1, y_t2, ..., y_tn]]
}
```
- `A`: Adjacency matrix.
- `G`: Edge list.
- `X`: Input data (node states).
- `Y`: Output data (state derivatives).
Notes:
- If you have multiple input data, save them as `"X1"`, `"X2"`, ... (max 5).
- If you have edge-level input data (such as edge weights), save them as `"E"` (or `"E1"`, `"E2"`, ..., max 5):
```jsonc
{
    // ...
    // e_tm is the input data (e.g., weight) on the m-th edge at time step t.
    // Note: The order of edges in "E" must be consistent with "G".
    "E": [[e_11, e_12, ..., e_1m], ..., [e_t1, e_t2, ..., e_tm]]
    // ...
}
```

Then, follow the demo in `search.py` to load and feed your data into the `Rewarder`, `NDformer`, and `MCTS`, and simply run `MCTS.fit()` to search for the symbolic expression.
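As a minimal sketch of preparing such a file, the snippet below builds the four required fields with NumPy and writes them to a hypothetical `my_data.json`; the graph, states, and derivatives here are random placeholders, not meaningful dynamics:

```python
import json
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_steps = 5, 100

# Adjacency matrix of a small random undirected graph (entries in {0, 1}).
A = rng.integers(0, 2, size=(n_nodes, n_nodes))
A = np.triu(A, k=1)
A = A + A.T

# Edge list derived from A: each row is [source, target], 0-indexed.
G = np.argwhere(A == 1).tolist()

# Node states over time (random placeholders) and a stand-in derivative.
X = rng.uniform(-5, 5, size=(n_steps, n_nodes))
Y = np.gradient(X, axis=0)  # replace with your true output data

data = {
    "A": A.tolist(),
    "G": G,
    "X": X.tolist(),
    "Y": Y.tolist(),
}
with open("my_data.json", "w") as f:
    json.dump(data, f)
```

For real data, replace `X` and `Y` with your observed node states and their measured (or finite-difference) derivatives.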
Q: How do I generate synthetic datasets?

A: You can use the `simulate.py` script, which generates a synthetic dataset based on a YAML configuration file. It saves both the generated data and the plotted figure to your specified paths.
Specifically, you can run
```bash
python simulate.py --conf ./config/synthetic.yaml --save ./data/synthetic/ --plot ./plot/synthetic/
```

It will traverse each `{key}: {config}` pair in the `./config/synthetic.yaml` configuration file, simulate the node dynamics of the `{key}` system according to `{config}`, and save the results to `./data/synthetic/{key}.json` and `./plot/synthetic/{key}.png`.
The configuration file specifies the network used for simulation, simulation duration and step size, as well as the initialization and update process of independent and dependent variables in the system. Specifically:
- `network`: The network to use, which can be:
  - `{V: 10, E: 40, direction: False}`: an undirected ER network with `{V}` nodes and `{E}` edges
  - `{use: ./data/networks/NorthernEuropeanGrid.csv}`: an existing network loaded from file
- `N`: Number of steps to simulate
- `dt`: Simulation step size
- `variables`: The independent and dependent variables in the system
  - `independent`: Variables initialized before the first simulation step and updated at the end of each step. For example:
    - `x: {type: node, initialize: "np.random.uniform(-5, 5, (1, V))", update: "x + dx * dt"}`: `x` is a node-level variable, initialized from a uniform distribution on [-5, 5] and updated by adding `dx * dt`. Here `V` and `dt` are the number of nodes and the time step; `dx` is a dependent variable defined below.
    - `omega0: {type: node, initialize: "np.random.uniform(-5, 5, (1, V))"}`: `omega0` is a node-level parameter, which is initialized but never updated.
    - `w: {type: edge, initialize: "np.random.uniform(-5, 5, (1, E))"}`: `w` is an edge-level parameter (e.g., edge weights), initialized but never updated, where `E` is the number of edges.
  - `dependent`: Variables recalculated at each simulation step from the independent variables. For example, `dx: {type: node, GD_expr: "omega0 + 1.0*aggr(sin(sour(x, G, A)-targ(x, G, A)), G, A)"}` is the time derivative of the Kuramoto dynamics, computed from the node-level parameter `omega0`, the node-level variable `x`, and the network structure `G`, `A`. `GD_expr` (or `GDExpr`) stands for Graph Dynamics Expression.
- `plot`: Which variables to plot, for example:
  - `[x]`: plot the `t`-`x(t)` figure
  - `[x, y]`: plot the `x(t)`-`y(t)` figure
  - `[x, y, z]`: plot the `x(t)`-`y(t)`-`z(t)` 3D figure
  - `[sin(x)]`: plot the `t`-`sin(x(t))` figure, which is useful when plotting the Kuramoto dynamics
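Putting these pieces together, a Kuramoto entry in the configuration file might look like the following sketch. It is assembled from the examples above; the step count is illustrative, and the exact schema of the shipped `./config/synthetic.yaml` may differ:

```yaml
KUR:
  network: {V: 10, E: 40, direction: False}
  N: 1000      # number of simulation steps (illustrative value)
  dt: 0.01     # simulation step size (illustrative value)
  variables:
    independent:
      x:      {type: node, initialize: "np.random.uniform(-5, 5, (1, V))", update: "x + dx * dt"}
      omega0: {type: node, initialize: "np.random.uniform(-5, 5, (1, V))"}
    dependent:
      dx: {type: node, GD_expr: "omega0 + 1.0*aggr(sin(sour(x, G, A)-targ(x, G, A)), G, A)"}
  plot: [sin(x)]
```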
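The operators appearing in `GD_expr` can be read as edge/node primitives. The NumPy sketch below shows one plausible semantics, assumed for illustration: `sour`/`targ` pick node values at each edge's endpoints, and `aggr` scatter-sums edge values onto target nodes. The authoritative definitions live in the ND2 source code, so treat these as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 10

# Ring network: symmetric adjacency matrix A and edge list G with one
# row [source, target] per directed edge (each link appears both ways).
A = np.zeros((V, V), dtype=int)
for i in range(V):
    A[i, (i + 1) % V] = A[(i + 1) % V, i] = 1
G = np.argwhere(A == 1)

# Assumed semantics of the GD_expr primitives (illustrative only):
def sour(x, G):
    return x[G[:, 0]]            # node value at each edge's source

def targ(x, G):
    return x[G[:, 1]]            # node value at each edge's target

def aggr(e, G, V):
    out = np.zeros(V)
    np.add.at(out, G[:, 1], e)   # scatter-sum edge values onto target nodes
    return out

x = rng.uniform(-5, 5, V)        # node states (phases)
omega0 = rng.uniform(-5, 5, V)   # natural frequencies

# GD_expr: "omega0 + 1.0*aggr(sin(sour(x, G, A)-targ(x, G, A)), G, A)"
dx = omega0 + 1.0 * aggr(np.sin(sour(x, G) - targ(x, G)), G, V)

# Cross-check against the usual matrix form of the Kuramoto coupling:
dx_ref = omega0 + (A * np.sin(x[None, :] - x[:, None])).sum(axis=1)
assert np.allclose(dx, dx_ref)
```

Under these assumed semantics, listing each undirected link in both directions is what makes the edge-wise aggregation reproduce the symmetric coupling sum.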
If you find this code useful in your research, please consider citing our paper:
BibTeX:
```bibtex
@article{Yu2025Discovering,
  title={Discovering network dynamics with neural symbolic regression},
  author={Yu, Zihan and Ding, Jingtao and Li, Yong},
  journal={Nature Computational Science},
  year={2025},
  month={oct},
  day={23},
  doi={10.1038/s43588-025-00893-8},
  url={https://doi.org/10.1038/s43588-025-00893-8}
}
```
Text:
Yu, Z., Ding, J. & Li, Y. Discovering network dynamics with neural symbolic regression. Nat Comput Sci (2025). https://doi.org/10.1038/s43588-025-00893-8
