Tune stage 1 balance rewards and hyperparams to fix training collapse#63

Merged
kuds merged 1 commit into main from claude/review-velociraptor-training-6MGUD
Feb 21, 2026

@kuds (Owner) commented Feb 21, 2026

This pull request updates the reinforcement learning configuration for the Velociraptor's stage 1 balance task. The main focus is on tuning reward weights and learning parameters to improve the agent's ability to stand and balance. The most important changes are grouped below:

Reward and environment parameter adjustments:

  • Increased posture_weight from 0.2 to 2.0 to emphasize maintaining proper posture.
  • Set energy_penalty_weight to 0.0 (from 0.0005) to remove the penalty for energy use.
  • Introduced a small smoothness_weight of 0.1 (was 0.0) to encourage smoother movements.
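To see why the posture change matters relative to the alive bonus, here is a minimal sketch of a weighted reward sum. Only the three weight names and their old/new values come from this PR. The function `balance_reward`, its inputs, the `alive_bonus` value of 1.0, and the choice to treat smoothness as a penalty on action change are all assumptions for illustration, not the repository's actual reward code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    posture_weight: float
    energy_penalty_weight: float
    smoothness_weight: float


# Values before and after this PR.
OLD_WEIGHTS = RewardWeights(posture_weight=0.2, energy_penalty_weight=0.0005, smoothness_weight=0.0)
NEW_WEIGHTS = RewardWeights(posture_weight=2.0, energy_penalty_weight=0.0, smoothness_weight=0.1)


def balance_reward(w, posture_score, energy_used, action_delta, alive_bonus=1.0):
    """Hypothetical per-step reward: alive bonus plus weighted shaping terms.

    posture_score: assumed to lie in [0, 1], higher is better.
    energy_used:   assumed non-negative energy measure for the step.
    action_delta:  assumed non-negative size of the change in action.
    """
    return (
        alive_bonus
        + w.posture_weight * posture_score
        - w.energy_penalty_weight * energy_used
        - w.smoothness_weight * action_delta
    )


def posture_gap(w):
    """Reward difference between perfect and worst posture, all else equal."""
    return balance_reward(w, 1.0, 0.0, 0.0) - balance_reward(w, 0.0, 0.0, 0.0)
```

Under these assumptions, perfect posture is worth only 0.2 per step on top of an alive bonus of 1.0 with the old weights, and 2.0 with the new ones, so posture rather than mere survival dominates the signal.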

Learning algorithm parameter tuning:

  • Reduced PPO learning_rate from 3e-4 to 1e-4 to prevent sharp policy degradation between updates.
  • Increased ent_coef from 0.05 to 0.1 to keep the policy exploring and avoid the freeze-and-fall failure mode.
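As a rough illustration of what ent_coef does, the sketch below uses the common textbook form of the PPO loss, in which the entropy term is subtracted from the total. The trainer used in this repository is not shown in the PR, so `ppo_total_loss` and the `vf_coef` default of 0.5 are assumptions; only the learning_rate and ent_coef values are taken from the PR.

```python
# Hyperparameter values before and after this PR.
OLD_PPO = {"learning_rate": 3e-4, "ent_coef": 0.05}
NEW_PPO = {"learning_rate": 1e-4, "ent_coef": 0.1}


def ppo_total_loss(policy_loss, value_loss, entropy, ent_coef, vf_coef=0.5):
    """Common form of the combined PPO loss (minimized by the optimizer).

    Subtracting ent_coef * entropy rewards higher-entropy (more exploratory)
    policies; a larger ent_coef makes that bonus stronger.
    """
    return policy_loss + vf_coef * value_loss - ent_coef * entropy


def entropy_bonus(entropy, ent_coef):
    """Amount by which the entropy term lowers the loss."""
    return ent_coef * entropy
```

With the same policy entropy, doubling ent_coef doubles the entropy bonus, which pushes back harder against the policy collapsing to a near-deterministic action. The lower learning_rate is independent of this term: it simply makes each gradient step smaller.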

Curriculum criteria update:

  • Lowered min_avg_episode_length from 400 to 300 to allow curriculum progression with shorter average episodes.
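A curriculum gate of this kind can be sketched as a rolling average compared against a threshold. The class `CurriculumGate`, its `window` size, and the rule that the window must be full before the gate opens are hypothetical; only the min_avg_episode_length values of 400 and 300 come from this PR.

```python
from collections import deque


class CurriculumGate:
    """Hypothetical gate: advance when the rolling mean episode length is high enough."""

    def __init__(self, min_avg_episode_length=300, window=100):
        self.min_avg_episode_length = min_avg_episode_length
        # Keep only the most recent `window` episode lengths.
        self._lengths = deque(maxlen=window)

    def record(self, episode_length):
        self._lengths.append(episode_length)

    def average(self):
        if not self._lengths:
            return 0.0
        return sum(self._lengths) / len(self._lengths)

    def ready(self):
        # Require a full window so a few lucky episodes cannot open the gate.
        window_full = len(self._lengths) == self._lengths.maxlen
        return window_full and self.average() >= self.min_avg_episode_length


def gate_opens(lengths, threshold, window):
    """Feed a list of episode lengths and report whether the gate is open."""
    gate = CurriculumGate(min_avg_episode_length=threshold, window=window)
    for n in lengths:
        gate.record(n)
    return gate.ready()
```

For example, episodes of length 350, 320, 290 and 310 average 317.5 steps, which would pass a gate at 300 but not one at 400.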

- posture_weight 0.2 → 2.0: make posture matter vs alive bonus
- energy_penalty_weight 0.0005 → 0.0: remove distraction for balance stage
- smoothness_weight 0.0 → 0.1: encourage smooth corrective movements
- learning_rate 3e-4 → 1e-4: prevent sharp policy degradation
- ent_coef 0.05 → 0.1: maintain exploration, avoid freeze-and-fall
- min_avg_episode_length 400 → 300: more achievable curriculum gate

https://claude.ai/code/session_019KsnCT9nvDg5cpTMquZTzQ
@kuds kuds merged commit 81a9df9 into main Feb 21, 2026
2 checks passed
2 participants