Description
Summary
When running the test test_solver_accumulated_random_walks_test_npy from qctrlresearchgpu/tests/test_isomorphism.py in the delta motif repository using Legate with 2 GPUs, the test fails because two different methods find different numbers of subgraph isomorphisms. The test passes when run with only 1 GPU. I am still trying to narrow down the bug, but I wanted to share the test that currently reproduces the problem.
Steps to Reproduce
- Set up the environment as specified in the attached environment.yml.
- Run the following command with 2 GPUs:
  legate --gpus=2 pytest qctrlresearchgpu/tests/test_isomorphism.py::test_solver_accumulated_random_walks_test_npy
- Observe the assertion error.
- Run the same test with 1 GPU:
  legate --gpus=1 pytest qctrlresearchgpu/tests/test_isomorphism.py::test_solver_accumulated_random_walks_test_npy
- The test passes.
Expected Behavior
The test should pass regardless of the number of GPUs used.
Actual Behavior
With 2 GPUs, the following assertion fails:
AssertionError: missmatch
Additional output: I am tracking the size of the Logical Table with 1 and 2 GPUs. With 2 GPUs (the rows that differ from the 1-GPU run are in bold):
current_mappings_df size: 642 x 4 mask size: 642
current_mappings_df size: 818 x 5 mask size: 818
current_mappings_df size: 1058 x 6 mask size: 1058
current_mappings_df size: 2366 x 9 mask size: 2366
current_mappings_df size: 1800 x 8 mask size: 1800
current_mappings_df size: 956 x 10 mask size: 956
current_mappings_df size: 1588 x 16 mask size: 1588
current_mappings_df size: 3536 x 24 mask size: 3536
current_mappings_df size: 6048 x 32 mask size: 6048
**current_mappings_df size: 87 x 37 mask size: 87
current_mappings_df size: 28 x 42 mask size: 28
current_mappings_df size: 10 x 45 mask size: 10**
With 1 GPU, the test passes. Compare the sizes above with the size of the Logical Table on a single GPU:
current_mappings_df size: 642 x 4 mask size: 642
current_mappings_df size: 818 x 5 mask size: 818
current_mappings_df size: 1058 x 6 mask size: 1058
current_mappings_df size: 2366 x 9 mask size: 2366
current_mappings_df size: 1800 x 8 mask size: 1800
current_mappings_df size: 956 x 10 mask size: 956
current_mappings_df size: 1588 x 16 mask size: 1588
current_mappings_df size: 3536 x 24 mask size: 3536
current_mappings_df size: 6048 x 32 mask size: 6048
**current_mappings_df size: 208 x 37 mask size: 208
current_mappings_df size: 88 x 42 mask size: 88
current_mappings_df size: 32 x 45 mask size: 32**
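To locate the first step at which the two runs diverge, the logs can be compared mechanically. The following is a minimal sketch, not part of the test suite: the helper names `parse_sizes` and `first_divergence` are hypothetical, and the log excerpts are the last four lines of each run copied from the output above.

```python
import re

# Matches one logged line, e.g. "current_mappings_df size: 87 x 37 mask size: 87".
LINE_RE = re.compile(r"current_mappings_df size: (\d+) x (\d+) mask size: (\d+)")


def parse_sizes(log_text):
    """Return one (rows, cols, mask_size) tuple per logged line."""
    return [tuple(int(g) for g in m.groups()) for m in LINE_RE.finditer(log_text)]


def first_divergence(log_a, log_b):
    """Return (index, entry_a, entry_b) for the first differing line, or None."""
    for i, (a, b) in enumerate(zip(parse_sizes(log_a), parse_sizes(log_b))):
        if a != b:
            return i, a, b
    return None


# Last four logged lines of each run, copied from the output above.
two_gpu_log = """
current_mappings_df size: 6048 x 32 mask size: 6048
current_mappings_df size: 87 x 37 mask size: 87
current_mappings_df size: 28 x 42 mask size: 28
current_mappings_df size: 10 x 45 mask size: 10
"""

one_gpu_log = """
current_mappings_df size: 6048 x 32 mask size: 6048
current_mappings_df size: 208 x 37 mask size: 208
current_mappings_df size: 88 x 42 mask size: 88
current_mappings_df size: 32 x 45 mask size: 32
"""

print(first_divergence(two_gpu_log, one_gpu_log))
# -> (1, (87, 37, 87), (208, 37, 208))
```

Both runs agree up to the 6048 x 32 table. The first difference is the next table (87 rows with 2 GPUs versus 208 rows with 1 GPU), so whatever operation produces the 37-column table from the 32-column one is the first place to look.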
Environment
- OS: Linux (Ubuntu 24.04.3 LTS, release 24.04, codename noble)
- Python: 3.11.13
- Legate: 25.10.0.dev42+gb4fd4eaf8
- CUDA: 12.9
Additional Information
- The test compares the results of a motif-based subgraph isomorphism solver against VF2++ (Rustworkx).
- The discrepancy only appears when using more than one GPU.
- Legate itself raises no runtime error; the only failure is the test assertion, which trips because the two methods find different numbers of solutions.
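For very small inputs, a third, independent count can help decide which of the two methods is wrong. The sketch below is a brute-force counter written only for illustration. It is not the method used by the test or by Rustworkx, the function name `count_subgraph_isomorphisms` is hypothetical, and it assumes undirected simple graphs with nodes labelled 0..n-1. Whether the test counts induced or non-induced matches is not stated here, so the `induced` flag covers both.

```python
from itertools import permutations


def count_subgraph_isomorphisms(pattern_edges, n_pattern, target_edges, n_target, induced=True):
    """Count injective node maps pattern -> target that preserve pattern edges.

    With induced=True, pattern non-edges must also map to target non-edges.
    Brute force over all injective maps: only usable for tiny graphs.
    """
    p_edges = {frozenset(e) for e in pattern_edges}
    t_edges = {frozenset(e) for e in target_edges}
    count = 0
    for image in permutations(range(n_target), n_pattern):
        ok = True
        for u in range(n_pattern):
            for v in range(u + 1, n_pattern):
                in_pattern = frozenset((u, v)) in p_edges
                in_target = frozenset((image[u], image[v])) in t_edges
                if in_pattern and not in_target:
                    ok = False  # a pattern edge is missing in the target
                if induced and in_target and not in_pattern:
                    ok = False  # the target has an extra edge between mapped nodes
        if ok:
            count += 1
    return count


triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
path3 = [(0, 1), (1, 2)]

print(count_subgraph_isomorphisms(triangle, 3, k4, 4))                   # 24
print(count_subgraph_isomorphisms(path3, 3, triangle, 3, induced=False))  # 6
print(count_subgraph_isomorphisms(path3, 3, triangle, 3, induced=True))   # 0
```

The cost grows factorially with graph size, so this is only practical on a reduced reproducer, not on the graphs used in the failing test.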