-
Notifications
You must be signed in to change notification settings - Fork 455
Implement monitor rubrics #653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class MultiTurnMonitorRubric(MonitorRubric): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nit -- what's the need for a distinct MultiTurnMonitorRubric class? All env classes currently extend MultiTurnEnv already, and build up the trajectory (even if single-turn)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Risks other extensions (e.g. SandboxMonitorRubric) not being multi-turn compatible by default, which seems undesirable.
| command_execution_times: list[float] | ||
|
|
||
|
|
||
| class SandboxMonitorRubric(vf.MonitorRubric): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this serving a different role than MultiTurnMonitorRubric? is the idea to have many monitor rubrics which track new things introduced at different hierarchies?
|
Semi-related -- #645 adds an option to bypass scoring, which currently would skip all Rubric classes. Do we want to treat MonitorRubric instances as fundamentally separate in the case of a RubricGroup? |
| if self.rubric is None: | ||
| self.rubric = rubric | ||
| else: | ||
| self.rubric = vf.RubricGroup(rubrics=[self.rubric, rubric]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.rubric is already a RubricGroup, maybe we should do self.rubric.add_rubric ?
|
|
||
| class MultiTurnMonitorRubric(MonitorRubric): | ||
| def __init__(self): | ||
| super().__init__(state_keys=[("trajectory", "num_turns", len)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels a bit restrictive/unintuitive + not totally seeing the value of the state_keys approach.
we can already do:
async def num_turns(state) -> float:
return len(state['trajectory'])
which is fairly concise + avoids the need for a new pattern
could also be worth adding add_metric to Rubric which behaves just like add_reward_func but with default weight 0?
Description
This PR implements the
MonitorRubrica streamlined abstraction to export information from the state for logging purposes using declarative style. This is done by allowing to specify state keys (+arbitrary transforms) which will get registered as (zero-weight) reward functions. For example, one may count the length of thetrajectoryfield to lognum_turns(implemented inMultiTurnMonitorRubric) or average the command execution timeouts in sandboxes from thesandbox_state.command_execution_timesasavg_command_execution_time(implemented inSandboxMonitorRubric).In addition, implements the
add_rubricmethod onEnvironmentwhich composes the currently registered rubrics with an additional one usingRubricGroup. This can be used to register "default" monitor rubrics (e.g. always exportnum_turnsinMultiTurnEnv) across environment hierarchies.Example
We see the following registered monitors from the respective environments:
MultiTurnEnv:num_turnsToolEnv:total_tool_callsSandboxEnv:sandbox_ready_wait_time,sandbox_command_execution_timePythonEnv:python_ready_wait_timeType of Change
Testing
uv run pytestlocally.Checklist
Additional Notes