Hi! Thank you for this great project. I have a question about the RL implementation.
The trainer is named GRPOTrainer (in navr1/rl/grpo.py), but the implementation seems to use standard PPO components: it trains a value network (navr1/models/policy.py), computes advantages using GAE, and collects different episodes rather than multiple attempts at the same input.
From my understanding, GRPO's key feature is that it avoids training a value function by computing advantages from group-relative reward comparisons (reward - group_mean over multiple completions of the same input). Could you clarify whether this is a GRPO variant adapted for navigation tasks, or PPO with GRPO-inspired techniques?
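For reference, here is a minimal sketch of what I mean by group-relative advantages (this is illustrative, not code from the repo; the function name and the std-normalization variant are my own):

```python
import numpy as np

def grpo_advantages(group_rewards):
    """GRPO-style advantage: normalize each sample's reward within its
    group of completions for the same input, so no value network is
    needed. Commonly: (reward - group_mean) / (group_std + eps)."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: 4 attempts at the same input, scored by a reward function.
adv = grpo_advantages([1.0, 0.0, 0.5, 0.5])
# Advantages are relative to the group: above-mean rewards become
# positive, below-mean negative, and they sum to ~0.
```

In contrast, GAE requires per-timestep value estimates from a trained critic, which is what I see in navr1/models/policy.py.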
Thanks!