Unstable Training

After switching to AReaL v0.3.3 and running the asearcher_local experiment, training became very unstable and could not continue because the rollout time grew too high.

Below are the hyperparameters I used for the experiment, followed by the logs. The blue curves are relatively stable; the green and grey curves are unstable and come from runs started after I switched to AReaL v0.3.3.

Hyperparameters: epochs=10, 4p1t1+4p1t1, batch_size=128, max_concurrent_rollouts=48, mem_fraction_static=0.80

<img width="2140" height="1330" alt="Image" src="https://github.com/user-attachments/assets/582d7479-dcf1-4705-9150-ac7e1848601b" />

<img width="1024" height="1198" alt="Image" src="https://github.com/user-attachments/assets/a8fd4e0e-6bc0-4ae8-b7e5-3a570d694fe3" />

<img width="1038" height="553" alt="Image" src="https://github.com/user-attachments/assets/13dac184-f7a1-4725-8250-836efbd7e627" />

Unstable Training #29
