We are ByteDance Seed team.
You can get to know us better through the following channels👇
This is the open-source code repository for the paper It Takes Two to Entangle (ASPLOS'26). The project provides a tool that inductively infers relations between tensors in sequential and distributed computation graphs, allowing their equivalence to be checked scalably and efficiently. It supports verification of graphs dumped with our wrapped version of TorchDynamo, and thus covers a wide range of PyTorch programs.
NOTE for Artifact Evaluation Reviewers: This repository also serves as the artifact for ASPLOS's Artifact Evaluation, targeting the Available and Evaluated badges. Please follow the instructions in Artifact Evaluation to reproduce the results in the paper.
If you are using CloudLab, we provide a CloudLab profile and a single script to automatically set up the environment.
You can find the profile here. After starting an experiment with this profile, log in to the shell and run the following command to set up the environment (you don't need to clone the repository in advance):
wget https://raw.githubusercontent.com/nyu-systems/Entangle/refs/heads/main/setup.sh -O $HOME/setup.sh && source $HOME/setup.sh
The script will
- install Rust (cargo) and uv
- clone the repository to /opt/tiger/Entangle
- set up the Python environment with uv
- build and install Entangle
For best reproducibility, we recommend using the same environment we used for the experiments:
- Ubuntu-22.04
- ✅️ Linux: should run (not tested)
- ✅️ MacOS: can run, but significantly slower (tested)
- ❌ Windows: cannot run due to usage of bash commands
- ✅️ WSL: can run the tool (tested)
- Python 3.12
- Rust supporting edition 2018
Other software recommendations are listed in pyproject.toml and Cargo.toml, and the dependencies can be installed following the Install Dependencies instructions below.
Assuming you have installed Python >= 3.12 with pip available.
# 1. Install Rust (for Linux, check the link for other systems)
# You can refer to https://rust-lang.org/learn/get-started/ to install Rust, or run the command below:
# WARNING: this command may update your shell profile(~/.bashrc or ~/.zshrc) to add Rust to PATH.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# NOTE: After installing Rust, you usually need to restart your terminal to activate it.
# 2. Install Entangle
# Assume you are starting under the root directory of this repository.
# This command installs the `entangle` and `torchgraph` (deprecated) Python packages and the `tg` and `tginfer` commands.
pip install -e .
# 3. Build egger and then go back.
cd egger && cargo build --release && cd ..
Please refer to the examples/README.md for instructions on running examples.
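As an optional sanity check that the install succeeded, you can try importing the installed packages (the package names are the ones installed in step 2 above):

```python
# Sanity check: the editable install should make both packages importable.
import entangle
import torchgraph  # deprecated, but still installed alongside entangle

print("entangle loaded from:", entangle.__file__)
```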
Thank you for your reviews.
This artifact is our implementation of the inductive relation refinement method described in the paper. We also provide input computation graphs for all the open-source models/frameworks we used. Unfortunately, we cannot publish the graphs of the company's proprietary models.
With this repository, the performance results for verifying graphs from open-source frameworks can be reproduced, including
- (Figure 3) End-to-end verification time across different models
- (Figure 4) Scalability on verifying parallelized models.
- (Figure 5) Lemma complexity statistics (which require manual counting; the counts are embedded in the visualization script).
- (Figure 6) Heatmap showing the number of times each lemma is used for different models.
We first introduce the Code Structure, then provide Evaluation Instructions to reproduce the open-source portion of the results in the paper.
(Note: only key components are introduced here.)
The roles of the code files/directories are described in the comments below.
egger
├── extractor # helper to extract output relations
├── src
│ ├── load.rs # load egraph dumped from Python side
│ ├── main.rs # entry point
│ ├── metadata.rs # internal types
│ ├── model.rs # operators
│ ├── rewrite # lemmas
│ ├── special # operators and lemmas for new frameworks
│ ├── symval # symbolic value manager
entangle # the key Python package
├── convert # converts graphs between different formats
├── graph # graph dumping tools wrapping TorchDynamo
├── ops # defines ATen and customized operators
├── pgraph # graph format dumped through our wrapping.
├── sgraph # core graph format used during verification
│ ├── egraph.py # egraph format
│ ├── infer # contains different exploration strategies.
│ ├── sexpr.py # for expressions
│ ├── sgraph.py # for the whole computational graph
│ ├── transform.py # lower distributed operators
examples # examples, see examples/README.md
Set up your environment following the instructions in the previous Setup section.
The example input computation graphs are in examples/data. Run the commands below to run refinement inference and verification.
These three commands run all verification tasks for each model sequentially. All the verified graphs are bug-free.
# Assume you are in directory `examples`
# gpt starts 16 runs sequentially (TP degrees are 2, 4, 6, 8, and numbers of layers are 1, 2, 4, 8)
python run_all.py gpt --all
# qwen2 starts 2 runs sequentially (TP degrees are 2, 4, and number of layers is 1)
python run_all.py qwen2 --all
# aws_llama starts 12 runs sequentially (TP degrees are 2, 4, 8, and numbers of layers are 1, 2, 4, 8)
python run_all.py aws_llama --all
If you see Refinement verification succeeded for ... messages after each run, then the verification is successful.
A note on the OUTPUT directory: by default, these commands print the underlying commands that are actually run; those commands set the $OUTPUT environment variable to the directory for output logs. The output directory includes:
- $OUTPUT/group*: logs for each step of inference
- $OUTPUT/check_impl: logs for checking user expectations
- $OUTPUT/index.html: visualization of the graphs
- $OUTPUT/output.log: overall log for the whole verification
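If you run many configurations, a small helper like the sketch below can summarize which runs succeeded by scanning each output.log for the success message mentioned above (the results root path here is hypothetical; point it at wherever you collect your $OUTPUT directories):

```python
import pathlib

# Hypothetical root containing one $OUTPUT directory per run.
results_root = pathlib.Path("examples/output")

for log in sorted(results_root.glob("*/output.log")):
    text = log.read_text(errors="ignore")
    status = "OK  " if "Refinement verification succeeded" in text else "FAIL"
    print(f"{status} {log.parent.name}")
```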
The four commands below run the verification for all four pairs of graphs with the bugs introduced in the paper.
The verification is expected to raise errors. If you get any of the errors below, then it means the bug is successfully detected:
- entangle.sgraph.egraph.CannotFindPostconditions: Failed, check the conditions above.
- entangle.tools.egg.FailedImplyingEquivalence: User expectation violated.
# Assume you are in directory `examples`
python ./run_all.py grad_accumulation # (Bug 6 in paper)
python ./run_all.py missing_allreduce_under_wrong_config # (Bug 7 in paper)
python ./run_all.py missing_switchmlp_allreduce # (Bug 8 in paper)
python ./run_all.py missing_layernorm_allreduce # (Bug 9 in paper)
To compare easily with the results in the paper, run the visualization script to generate figures after step 2 (Model Verification Part).
python visualization.py
The figures will be saved to examples/figures, including
- one_layer_time.pdf: the end-to-end verification time results (Figure 3). The performance results can vary on different hardware.
- GPT_scalability.pdf, Llama-3_scalability.pdf: the scalability results (Figure 4), where you should see slightly super-linear growth in time cost with the parallel degree and linear growth with the model size.
- number_of_ops_and_lemmas.pdf: the number of operators and lemmas used (Figure 5.a), where you should see the same numbers as in the paper (except that the data of the company's proprietary models are removed).
- lemma_loc.pdf: the CDF of the LOC of the lemmas (Figure 5.b), where you should see a similar trend to the one in the paper (with some differences, since we only include open-source models/frameworks here).
- lemma_applied_count_heatmap.pdf: the heatmap of lemma application counts (Figure 6), where you should see something similar to the one in the paper.
For your convenience, we also provide reference figures in examples/reference_figures for comparison.
End of Artifact Evaluation Instructions
Our implementation supports the most commonly used operators defined in ATen. Thus, as long as your PyTorch model can be captured by TorchDynamo completely, without graph breaks, our tool can verify your model directly.
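One quick way to check for graph breaks, using only standard PyTorch APIs and independent of this tool, is to compile the model with fullgraph=True, which makes TorchDynamo raise an error instead of silently splitting the graph:

```python
import torch

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyModel()
# fullgraph=True raises an error if the model cannot be captured as one graph.
compiled = torch.compile(model, fullgraph=True)
compiled(torch.randn(4, 16))
```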
We wrapped TorchDynamo to generate computation graphs in the format of PickleableGraph (or pgraph, see definition) for convenience. We recommend using it to capture your model.
An example can be found in capture_example.py. Simply run the command below to run this example:
# Assume you are in directory `examples`
python capture_example.py
The output files will be printed on the terminal for you to check.
Since our implementation doesn't include some rarely used operators, and your model may use custom operators, you might need to do the following (see the sketch after this list):
- Define the new operators. Examples can be found here.
- Define a converter from PickleableGraph (or pgraph) to sexpr. Examples can be found here.
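The concrete registration APIs live in entangle/ops and entangle/convert; the sketch below is purely illustrative, with hypothetical helper names, and only shows the shape of the two steps:

```python
# Purely illustrative sketch: none of these names are taken from the real
# entangle API. See entangle/ops and entangle/convert for actual examples.

# Step 1: describe the new operator (how its output relates to its inputs).
def my_fused_rmsnorm(x, weight, eps):
    # e.g., the output keeps the shape and dtype of `x`
    return {"shape": x.shape, "dtype": x.dtype}

# Step 2: convert the pgraph node for this operator into an s-expression
# (sexpr) that the verifier understands.
def convert_my_fused_rmsnorm(node):
    x, weight, eps = node.inputs
    return ("rmsnorm", x, weight, eps)
```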
For frameworks other than PyTorch (like TensorFlow, JAX, etc.), you must implement a converter from your framework's computation graph IR to Entangle's pgraph representation.
In this repository, we provide a simple, incomplete example for HLO IR, where only the few operators required by our evaluation are implemented. You can refer to this example to implement your own converter.
Besides, new frameworks usually introduce new operators, so you should also follow New Operators to define the operators used in your framework.
To run the tool on your own model, you must provide clean input relations as mentioned in the paper. Examples can be found in verify_gpt.py, verify_aws_llama.py, and verify_qwen2.py. The tool automatically finds the first Python class that inherits from ExplorativeConfig.
The command-line arguments are passed to its initializer, so you can define your own arguments (such as the TP size).
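As a rough sketch of what a config module can look like (only the ExplorativeConfig base class name is taken from the repository; the import path and field names below are assumptions, and the verify_*.py files above are the authoritative examples):

```python
# Hypothetical config module; field names such as `tp_size` are illustrative only.
from entangle.sgraph.infer import ExplorativeConfig  # import path is an assumption

class MyModelConfig(ExplorativeConfig):
    # The tool picks up the first class inheriting from ExplorativeConfig
    # and forwards the command-line arguments to its initializer.
    def __init__(self, tp_size: int = 2, num_layers: int = 1):
        super().__init__()
        self.tp_size = tp_size
        self.num_layers = num_layers
        # The clean input relations for the sequential/distributed graphs
        # would be defined here, parameterized by tp_size, etc.
```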
With the captured graphs, you can run the tool using the command below:
export OUTDIR=<output directory>
mkdir -p $OUTDIR
tg infer \
--infer_manager greedy \
--stats \
-g <path to your graphs> \
--origin <directory name of sequential graph> \
--target <directory name of distributed graph> \
--graph_prefix <name of the graph break> \
--config_module <Python config file path> \
-o $OUTDIR \
[other options you defined inside your config module]
A note on performance:
- We use the Python rich package for fancy terminal output. However, we found that it significantly slows down overall performance. You may disable it with --disable_rich for better performance.
If your model uses new operators, you will also need to
- Define new operators. Examples can be found here.
- Define new lemmas regarding new operators if necessary. Examples can be found here.
If you discover a potential security issue in this project, or think you may have discovered a security issue, we ask that you notify Bytedance Security via our security center or vulnerability reporting email.
Please do not create a public GitHub issue.
This project is licensed under the Apache-2.0 License.
About ByteDance Seed Team
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
