Tune stage 1 balance rewards and hyperparams to fix training collapse by kuds · Pull Request #63 · kuds/mesozoic-labs

kuds · 2026-02-21T19:35:22Z

This pull request updates the reinforcement learning configuration for the Velociraptor's stage 1 balance task. The main focus is on tuning reward weights and learning parameters to improve the agent's ability to stand and balance. The most important changes are grouped below:

Reward and environment parameter adjustments:

- Increased posture_weight from 0.2 to 2.0 to emphasize maintaining proper posture.
- Set energy_penalty_weight to 0.0 (from 0.0005) to remove the penalty for energy use.
- Introduced a small smoothness_weight of 0.1 (was 0.0) to encourage smoother movements.
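To show the relative scale of these terms, here is a minimal sketch of how a per-step balance reward might combine them. Only the three weight values come from this PR. The reward structure, the `alive_bonus` value of 1.0, the `exp(-error²)` posture shaping, and the names `BalanceRewardWeights` and `balance_reward` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BalanceRewardWeights:
    alive_bonus: float = 1.0             # hypothetical; not stated in this PR
    posture_weight: float = 2.0          # was 0.2
    energy_penalty_weight: float = 0.0   # was 0.0005
    smoothness_weight: float = 0.1       # was 0.0


def balance_reward(w, posture_error, torques, action, prev_action):
    """Hypothetical per-step reward built from the terms this PR tunes."""
    # Posture term: 1.0 when perfectly upright, decaying toward 0 as the error grows.
    posture_term = math.exp(-posture_error ** 2)
    # Energy cost: sum of squared actuator torques (ignored when the weight is 0.0).
    energy_cost = sum(t * t for t in torques)
    # Smoothness cost: squared change in each action component since the last step.
    smoothness_cost = sum((a - p) ** 2 for a, p in zip(action, prev_action))
    return (
        w.alive_bonus
        + w.posture_weight * posture_term
        - w.energy_penalty_weight * energy_cost
        - w.smoothness_weight * smoothness_cost
    )


# Weights before and after this PR (alive_bonus is the same hypothetical value in both).
old = BalanceRewardWeights(posture_weight=0.2, energy_penalty_weight=0.0005, smoothness_weight=0.0)
new = BalanceRewardWeights()
```

Under these assumptions, with zero torques and no action change, a perfectly upright step scores 3.0 with the new weights versus 1.2 with the old ones. A badly tilted step scores about 1.0 under both. The gap between balancing well and merely staying alive therefore grows roughly tenfold.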

Learning algorithm parameter tuning:

- Reduced PPO learning_rate from 3e-4 to 1e-4 to prevent sharp policy degradation.
- Increased ent_coef from 0.05 to 0.1 to maintain exploration and avoid the policy settling into a freeze-and-fall behavior.
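The two PPO knobs act in different places. `learning_rate` only scales the optimizer step, while `ent_coef` weights an entropy bonus inside the loss itself. The sketch below assumes a Stable-Baselines3-style loss, `policy_loss + ent_coef * entropy_loss + vf_coef * value_loss` with `entropy_loss = -mean(entropy)`. The function name `ppo_loss`, the plain-list inputs, and the `clip_range=0.2` and `vf_coef=0.5` defaults are illustrative assumptions, not taken from this repository.

```python
def ppo_loss(ratios, advantages, entropies, value_errors,
             clip_range=0.2, ent_coef=0.1, vf_coef=0.5):
    """Scalar PPO loss in the common clipped-surrogate form (illustrative only)."""
    # Clipped surrogate: take the pessimistic minimum of the unclipped and
    # clipped objectives so large probability ratios cannot inflate the update.
    surrogate = [
        min(r * a, max(min(r, 1 + clip_range), 1 - clip_range) * a)
        for r, a in zip(ratios, advantages)
    ]
    policy_loss = -sum(surrogate) / len(surrogate)
    # Entropy enters with a negative sign: a larger ent_coef lowers the loss
    # more for high-entropy (more exploratory) action distributions.
    entropy_loss = -sum(entropies) / len(entropies)
    # Value loss: mean squared error between predicted and target returns.
    value_loss = sum(e * e for e in value_errors) / len(value_errors)
    return policy_loss + ent_coef * entropy_loss + vf_coef * value_loss
```

Going from `ent_coef=0.05` to `0.1` doubles the weight on the entropy bonus relative to the policy and value terms. This keeps action distributions broader for longer instead of letting them collapse early.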

Curriculum criteria update:

- Lowered min_avg_episode_length from 400 to 300 to allow curriculum progression with shorter average episodes.
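The PR does not show how the curriculum computes its average, so the gate below is a hypothetical sketch. It assumes a rolling mean over the last `window` episodes, with 100 chosen arbitrarily. The class name `CurriculumGate` is illustrative.

```python
from collections import deque


class CurriculumGate:
    """Hypothetical gate: advance once the rolling mean episode length reaches a threshold."""

    def __init__(self, min_avg_episode_length=300, window=100):
        self.min_avg_episode_length = min_avg_episode_length
        # deque with maxlen keeps only the most recent `window` episode lengths.
        self.lengths = deque(maxlen=window)

    def record(self, episode_length):
        self.lengths.append(episode_length)

    def ready(self):
        # Require a full window so a few lucky early episodes cannot trigger promotion.
        if len(self.lengths) < self.lengths.maxlen:
            return False
        return sum(self.lengths) / len(self.lengths) >= self.min_avg_episode_length
```

In this sketch, an agent averaging 350 steps per episode clears the new 300-step gate. It would have stalled at the old threshold of 400.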

- posture_weight 0.2 → 2.0: make posture matter vs alive bonus
- energy_penalty_weight 0.0005 → 0.0: remove distraction for balance stage
- smoothness_weight 0.0 → 0.1: encourage smooth corrective movements
- learning_rate 3e-4 → 1e-4: prevent sharp policy degradation
- ent_coef 0.05 → 0.1: maintain exploration, avoid freeze-and-fall
- min_avg_episode_length 400 → 300: more achievable curriculum gate

https://claude.ai/code/session_019KsnCT9nvDg5cpTMquZTzQ

kuds merged commit 81a9df9 into main Feb 21, 2026
2 checks passed

kuds merged 1 commit into main from claude/review-velociraptor-training-6MGUD

2 participants