@pimpale commented Dec 23, 2025

Note

CLI and config updates

  • Replace full with all in config (hud/cli/eval.py) and add --all flag; --full now sets --all, --auto-respond, and --max-steps 100.
  • auto_respond is now an explicit boolean flag (no implicit default via --full); max_steps defaults to 10.
  • Task selection uses all for multi-task runs; display updated to show all and explicit max_steps.
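
The flag behaviour described above can be sketched roughly as follows. This is an illustrative stand-in built on the standard-library argparse, not the actual code in hud/cli/eval.py: the parser, the `resolve` helper, and the `all_tasks` attribute name are hypothetical, and it assumes that --full overrides an explicitly passed --max-steps, which the summary does not state.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in the summary."""
    parser = argparse.ArgumentParser(prog="eval-sketch")
    # --all replaces the old "full" config key for multi-task runs.
    parser.add_argument("--all", action="store_true", dest="all_tasks")
    # auto_respond is an explicit boolean flag, off unless requested.
    parser.add_argument("--auto-respond", action="store_true")
    # max_steps defaults to 10.
    parser.add_argument("--max-steps", type=int, default=10)
    # --full is a shorthand that expands to the three settings above.
    parser.add_argument("--full", action="store_true")
    return parser


def resolve(argv: list[str]) -> dict:
    """Parse argv and expand --full into its component settings."""
    args = build_parser().parse_args(argv)
    if args.full:
        # Assumption: --full wins over any explicit --max-steps value.
        args.all_tasks = True
        args.auto_respond = True
        args.max_steps = 100
    return {
        "all": args.all_tasks,
        "auto_respond": args.auto_respond,
        "max_steps": args.max_steps,
    }


if __name__ == "__main__":
    print(resolve([]))
    print(resolve(["--full"]))
```

Keeping the expansion in one place after parsing means the rest of the run only ever reads `all`, `auto_respond`, and `max_steps`, and never needs to know whether --full was passed.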

Reward handling changes

  • Reward is computed in EvalContext.__aexit__ from evaluate tools; runners stop manually setting ctx.reward (hud/datasets/runner.py).
  • Telemetry exit uses self.reward directly (hud/eval/context.py).
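
A minimal sketch of the pattern described above, where the context computes the reward on exit and the runner no longer assigns it. `SketchEvalContext`, its `evaluate` callable, and the `reported` attribute are hypothetical stand-ins, not the real EvalContext API; skipping evaluation when the body raised is also an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SketchEvalContext:
    """Hypothetical stand-in for an eval context that owns its reward."""

    def __init__(self, evaluate: Callable[[], Awaitable[float]]) -> None:
        self._evaluate = evaluate
        self.reward: Optional[float] = None
        # Stand-in for whatever the telemetry exit call would receive.
        self.reported: Optional[float] = None

    async def __aenter__(self) -> "SketchEvalContext":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Assumption: only evaluate when the body finished without error.
        if exc_type is None:
            self.reward = await self._evaluate()
        # The exit path reads self.reward directly instead of a value
        # passed in by the runner.
        self.reported = self.reward
        return False  # never swallow exceptions from the body


async def run_task() -> SketchEvalContext:
    async def evaluate() -> float:
        return 1.0  # placeholder for the result of the evaluate tools

    # The runner does not set ctx.reward; exiting the block does.
    async with SketchEvalContext(evaluate) as ctx:
        pass  # the agent would run here
    return ctx


if __name__ == "__main__":
    finished = asyncio.run(run_task())
    print(finished.reward)
```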

Tool result structure

  • Environment returns MCPToolResult with structuredContent for both local and remote tool calls (hud/environment/environment.py).
  • Improve reward parse logging to print result.structuredContent (hud/agents/base.py).
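
To illustrate why logging `structuredContent` helps when reward parsing fails, here is a small sketch. `SketchToolResult` is a hypothetical stand-in for MCPToolResult that keeps only the fields used here, and both the `"reward"` key and the fallback value of 0.0 are assumptions rather than documented behaviour of hud/agents/base.py.

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

logger = logging.getLogger("reward_sketch")


@dataclass
class SketchToolResult:
    """Hypothetical stand-in for a tool result with structured content."""

    content: list = field(default_factory=list)
    structuredContent: Optional[dict] = None
    isError: bool = False


def parse_reward(result: SketchToolResult) -> float:
    """Read a reward from structuredContent, logging the payload on failure."""
    data = result.structuredContent or {}
    try:
        # Assumption: the evaluate tool reports its score under "reward".
        return float(data["reward"])
    except (KeyError, TypeError, ValueError):
        # Printing the structured payload shows exactly what the tool returned.
        logger.warning(
            "Could not parse reward; structuredContent=%r",
            result.structuredContent,
        )
        return 0.0  # assumed fallback when no reward can be parsed


if __name__ == "__main__":
    print(parse_reward(SketchToolResult(structuredContent={"reward": 0.5})))
```

Because local and remote tool calls return the same result shape, a parser like this needs only one code path.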

Written by Cursor Bugbot for commit 2b60eb4.

@pimpale merged commit 016db63 into main on Dec 23, 2025
9 checks passed