You can run a wide range of experiments using the spinup.run command wrapper. Here are the main commands and some of the most useful arguments to control your experiments.
Training an Agent
This is the main command for training an agent with a specific algorithm.
python -m spinup.run [ALGORITHM] [ARGUMENTS...]
[ALGORITHM]: The name of the reinforcement learning algorithm you want to use. Your modernized setup supports the PyTorch implementations of the following algorithms:
ppo
vpg
ddpg
td3
sac
[ARGUMENTS...]: You can provide arguments to configure the training run. The most common ones are:
--env [ENV_NAME]: Specifies the environment to train on (e.g., LunarLander-v3, Ant-v4, Humanoid-v4).
--exp_name [NAME]: Gives your experiment a custom name.
--epochs [NUMBER]: Sets the number of training epochs.
--steps [NUMBER]: Sets the number of steps per epoch.
--seed [NUMBER]: Sets the random seed for reproducibility.
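For example, a single run combining several of these flags (a minimal sketch; LunarLander-v3 assumes Gymnasium with the Box2D extras installed):

python -m spinup.run ppo --env LunarLander-v3 --exp_name lunar_ppo --epochs 50 --steps 4000 --seed 0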
Train a Soft Actor-Critic (SAC) agent on the Ant environment for 100 epochs:
python -m spinup.run sac --env Ant-v4 --epochs 100 --exp_name my_ant_sac
Or train on the newer Ant-v5 for 200 epochs with a dated experiment name:
python -m spinup.run sac --env Ant-v5 --epochs 200 --exp_name exp_sac_antv5_july_20_2025
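Beyond the common flags, the upstream run wrapper also lets you set any keyword argument of the algorithm function from the command line, with --hid as a shorthand for the actor-critic hidden sizes. Assuming your fork keeps that upstream behavior, a sketch:

# gamma is SAC's discount factor; --hid sets the network hidden sizes upstream
python -m spinup.run sac --env Ant-v4 --epochs 100 --gamma 0.98 --hid "[256,256]" --exp_name my_ant_sac_wide

By default, results land in data/[exp_name]/[exp_name]_s[seed] (experiment name plus a seed suffix), which is the path you will pass to test_policy and plot below.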
Testing a Trained Agent
This command loads a saved policy and runs it in the environment so you can watch it perform.
python -m spinup.run test_policy [PATH_TO_EXPERIMENT_DATA]
[PATH_TO_EXPERIMENT_DATA]: The full path to the directory where your model was saved (e.g., data/my_ant_sac/my_ant_sac_s0).
Watch the Ant agent you just trained for 5 episodes:
python -m spinup.run test_policy data/my_ant_sac/my_ant_sac_s0 --episodes 5
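The upstream test_policy script also accepts --len to cap episode length, --norender to skip rendering, --itr to pick a specific saved checkpoint, and --deterministic to use SAC's deterministic action; check your fork before relying on them. If you would rather drive a saved policy from your own script, the upstream repo exposes helpers in spinup/utils/test_policy.py. A minimal sketch, assuming your fork keeps the same module path and save format:

# Load a saved policy and roll it out programmatically.
# load_policy_and_env and run_policy come from the upstream
# spinup/utils/test_policy.py; a modernized fork may differ.
from spinup.utils.test_policy import load_policy_and_env, run_policy

# Returns the saved environment and a get_action(obs) -> action function.
env, get_action = load_policy_and_env('data/my_ant_sac/my_ant_sac_s0')

# Watch 5 episodes with rendering enabled.
run_policy(env, get_action, num_episodes=5, render=True)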
Plotting Results
This command reads the progress.txt file from one or more experiments and generates performance graphs.
Plot a single experiment:
python -m spinup.run plot [PATH_TO_EXPERIMENT_DATA]
Plot and compare multiple experiments on the same graph:
python -m spinup.run plot [PATH_1] [PATH_2] ... [PATH_N]
Compare the performance of two different experiments on the same plot:
python -m spinup.run plot data/my_ppo_run/my_ppo_run_s0 data/my_sac_run/my_sac_run_s0
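The upstream plotter takes a few useful options on top of the paths: --legend to label each curve, --xaxis to choose the x-axis column (TotalEnvInteracts by default), --value for the y-axis metric (Performance by default), and --smooth for a moving-average window. Assuming your fork keeps these flags, a sketch:

python -m spinup.run plot data/my_ppo_run/my_ppo_run_s0 data/my_sac_run/my_sac_run_s0 --legend PPO SAC --value Performance --smooth 5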
Running Multiple Experiments (Grid Search)
This is a powerful feature for hyperparameter tuning. You can provide a list of values for any argument, and Spinning Up will automatically run an experiment for each combination.
You can simply list multiple values after an argument flag.
Let's test PPO on LunarLander-v3 with three different learning rates:
python -m spinup.run ppo --env LunarLander-v3 --pi_lr 0.001 0.0003 0.00001 --exp_name lunar_lr_search
This single command launches three full experiments, one per learning rate. Upstream, each variant is saved in its own directory under data/, with a shorthand for the varied hyperparameter appended to the experiment name; your fork's layout may differ slightly. You can then pass those directories to the plot command to see which learning rate performed best.
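Grids can span several arguments at once, including --seed, and every combination gets its own run; the upstream wrapper also accepts --dt to stamp experiment names with the date and time. Assuming those upstream behaviors, a sketch that launches 2 learning rates x 3 seeds = 6 runs:

python -m spinup.run ppo --env LunarLander-v3 --pi_lr 0.001 0.0003 --seed 0 10 20 --exp_name lunar_grid --dt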