Add VecNormalize support to evaluate_with_forward_vel function #73
Merged
The evaluate_with_forward_vel and evaluate_trained_policy functions were creating raw RaptorEnv instances without VecNormalize, so the trained model received unnormalized observations and produced garbage actions. This caused the curriculum gate check to always fail despite the training curves showing the model had learned successfully. Add vecnorm_path parameter to both functions and normalize observations using the saved VecNormalize stats, matching what record_stage_video already does correctly. https://claude.ai/code/session_011tUK1ofRqTfnLTTNXLvQzR
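Raw observations break the policy because VecNormalize standardizes each observation during training. It uses the running mean and variance collected over training, then clips the result. The following is a minimal NumPy sketch of that transform, assuming the notebooks use stable-baselines3's VecNormalize. The formula mirrors its observation normalization, with its default clip_obs=10.0 and epsilon=1e-8. The helper name normalize_obs is illustrative, and the statistics are made-up numbers, not values from the RaptorEnv runs.

```python
import numpy as np


def normalize_obs(
    obs: np.ndarray,
    mean: np.ndarray,
    var: np.ndarray,
    clip_obs: float = 10.0,
    epsilon: float = 1e-8,
) -> np.ndarray:
    """Standardize an observation with saved running statistics, then clip.

    Illustrative helper; same formula as stable-baselines3's VecNormalize,
    defaults chosen to match its clip_obs/epsilon defaults.
    """
    return np.clip((obs - mean) / np.sqrt(var + epsilon), -clip_obs, clip_obs)


# Made-up training-time statistics for a 3-D observation.
mean = np.array([1.5, -0.2, 40.0])
var = np.array([0.25, 0.01, 100.0])

raw = np.array([2.0, -0.1, 60.0])
print(normalize_obs(raw, mean, var))  # approximately [1.0, 1.0, 2.0]

# An out-of-distribution value is clipped to +/- clip_obs.
print(normalize_obs(np.array([0.0, 0.0, 600.0]), mean, var)[2])  # 10.0
```

A policy trained on inputs of roughly unit scale never sees a raw value like 60.0 in the third dimension. Feeding it unnormalized observations is consistent with the garbage actions described above.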
Apply the same fix from the velociraptor notebook to the brachiosaurus and trex training notebooks. Both had evaluate_with_forward_vel and evaluate_trained_policy using raw unnormalized observations, causing the curriculum gate and final evaluation to produce garbage results. https://claude.ai/code/session_011tUK1ofRqTfnLTTNXLvQzR
This pull request updates the evaluation logic for trained policies in the notebooks/brachiosaurus_training.ipynb notebook so that observations are normalized during evaluation, matching the input distribution used during training. This makes evaluation results more reliable and consistent.

Evaluation improvements:
- The evaluate_trained_policy function now accepts an optional vecnorm_path parameter. If provided, it loads the saved VecNormalize statistics and uses them to normalize observations during evaluation, ensuring consistency with training-time normalization.
- The evaluation call for the corresponding training stage now passes the vecnorm_path argument, activating observation normalization for that stage.

Resource management:
- The VecNormalize object is closed after evaluation to free up resources.
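The following is a minimal sketch of an evaluation loop with an optional observation-normalization step, in the spirit of the vecnorm_path change. It rests on several hypothetical assumptions:
- The environment follows the Gymnasium reset/step API.
- The model exposes an SB3-style predict(obs, deterministic=True).
- The helper name evaluate_policy_sketch and its obs_normalizer parameter are illustrative, not the notebooks' actual signature.

```python
from typing import Any, Callable, Optional

import numpy as np


def evaluate_policy_sketch(
    env: Any,
    model: Any,
    n_episodes: int = 5,
    obs_normalizer: Optional[Callable[[np.ndarray], np.ndarray]] = None,
) -> float:
    """Return the mean undiscounted episode return over n_episodes.

    Hypothetical helper: assumes a Gymnasium-style env (reset -> (obs, info),
    step -> 5-tuple) and an SB3-style model.predict(obs, deterministic=...).
    """
    returns = []
    for _ in range(n_episodes):
        obs, _info = env.reset()
        done, total = False, 0.0
        while not done:
            # Give the policy the same input distribution it saw in training;
            # without a normalizer the raw observation is passed through.
            policy_obs = obs_normalizer(obs) if obs_normalizer is not None else obs
            action, _state = model.predict(policy_obs, deterministic=True)
            obs, reward, terminated, truncated, _info = env.step(action)
            total += float(reward)  # accumulate the env's raw reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns))
```

With stable-baselines3, obs_normalizer could plausibly be the normalize_obs method of a VecNormalize object loaded via VecNormalize.load(vecnorm_path, venv). Before use, set training = False and norm_reward = False so that evaluation neither updates the running statistics nor scales rewards. Calling close() on that object afterwards corresponds to the resource cleanup described in this PR.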