Description
I am using mctx to implement MuZero, where actions are selected at each node via a forward pass of a large neural network.
Initially, the main bottleneck is the model forward pass, which is expected and matches what I see with C++-based MCTS implementations.
However, after a few training steps, as the model starts to learn certain biases, the search tree becomes deeper. At that point, the bottleneck shifts to the loop_fn used in the selection and backpropagation phases.
Here's the trace of one search. You can see that most of the time is spent in search.py:291 and search.py:181 (search.py was slightly adjusted to fit my codebase, so the line numbers don't match upstream), which correspond to `tree, _, _ = jax.lax.while_loop(cond_fun, body_fun, loop_state)` in `backward` and `end_state = jax.lax.while_loop(cond_fun, body_fun, initial_state)` in `simulate`.
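To make the cost structure concrete, here is a minimal sketch of the kind of selection loop involved. The names (`simulate_sketch`, `cond_fun`, `body_fun`) and the toy child-selection rule are illustrative assumptions, not mctx's actual internals; the point is that each iteration descends one tree level, so the `while_loop` runs depth-many dependent steps that cannot be parallelized:

```python
import jax
import jax.numpy as jnp

# Illustrative sketch (not mctx's real code) of a sequential selection loop:
# the carry is (current depth, current node index), and each body step
# depends on the previous one, so iterations cannot be batched.
def simulate_sketch(tree_depth):
    def cond_fun(state):
        depth, node = state
        return depth < tree_depth  # real code would also stop at unexpanded nodes

    def body_fun(state):
        depth, node = state
        # Placeholder child selection; MuZero would instead take the argmax of
        # a PUCT-style score over children stored in the tree arrays.
        child = node * 2 + 1
        return depth + 1, child

    return jax.lax.while_loop(cond_fun, body_fun, (jnp.int32(0), jnp.int32(0)))

depth, node = jax.jit(simulate_sketch, static_argnums=0)(20)
```

Each of those 20 iterations launches work on the accelerator that depends on the previous iteration's result, which is where the per-step overhead accumulates.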
My question is: is there any way to improve the time efficiency of this part?
In practice, even with a tree depth of only a few tens, performance is heavily affected by the huge efficiency gap between `jax.lax.fori_loop` on GPU and on CPU (with the GPU version being thousands of times slower).
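The gap is easy to reproduce with a rough microbenchmark sketch: a trivial sequential `while_loop` whose body does almost no work, so the measured time is dominated by per-iteration overhead rather than compute. Running it with JAX configured for CPU versus GPU (the numbers depend entirely on your hardware) shows the effect:

```python
import time
import jax
import jax.numpy as jnp

# Trivial sequential loop: the body is a single integer increment, so nearly
# all of the measured time is per-iteration loop overhead, not arithmetic.
@jax.jit
def count_to(n):
    def cond_fun(i):
        return i < n

    def body_fun(i):
        return i + 1

    return jax.lax.while_loop(cond_fun, body_fun, jnp.int32(0))

n = jnp.int32(100_000)
count_to(n).block_until_ready()           # warm up: compile once
start = time.perf_counter()
result = count_to(n).block_until_ready()  # measure steady-state time
elapsed = time.perf_counter() - start
print(f"{int(result)} iterations in {elapsed * 1e3:.2f} ms "
      f"on {jax.default_backend()}")
```

Comparing the printed time between `JAX_PLATFORMS=cpu` and a GPU run gives a rough per-iteration cost for each backend.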