Open
Labels: enhancement (new feature or request)
Description
Overview
Implement a difficulty ladder system inspired by CORE-Bench, where the same task can be presented at different difficulty levels based on how much scaffolding/context is provided.
Difficulty Levels
- Easy: More context provided (e.g., relevant files identified, hints given)
- Medium: Standard context (e.g., just the task description)
- Hard: Minimal context (e.g., agent must discover relevant files, figure out approach)
This differs from simple difficulty categorization: each level presents the same underlying task, and only the amount of assistance changes.
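A minimal sketch of how a case could carry one task with per-level scaffolding. The names `Difficulty`, `Case`, `Variant`, and `scaffolding` are hypothetical and not part of any existing case format. The bracketed strings are placeholders. How Medium and Hard differ for a given task is left open here, so the sketch only fills in Easy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Difficulty(str, Enum):
    EASY = "easy"
    MEDIUM = "medium"
    HARD = "hard"


@dataclass(frozen=True)
class Variant:
    """What the agent actually sees at one difficulty level."""
    difficulty: Difficulty
    task: str
    scaffolding: tuple[str, ...] = ()


@dataclass
class Case:
    """One underlying task; only the scaffolding changes per level."""
    task: str
    scaffolding: dict[Difficulty, list[str]] = field(default_factory=dict)

    def variant(self, level: Difficulty) -> Variant:
        # Levels without an entry get no extra context.
        return Variant(level, self.task, tuple(self.scaffolding.get(level, [])))


case = Case(
    task="Add error handling to the API routes",
    scaffolding={
        Difficulty.EASY: [
            "Relevant files: <paths go here>",
            "Functions needing handling: <names go here>",
        ],
    },
)
```

Because every variant is built from the single `task` field, the "same underlying task" property holds by construction rather than by convention.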
Examples
Task: "Add error handling to the API routes"
- Easy: Points to specific files, shows which functions need handling
- Medium: Just the task description
- Hard: Agent must find API routes, identify unhandled cases, implement solution
Task: "Fix the failing tests"
- Easy: Test output provided, failing test identified
- Medium: Must run tests to see failures
- Hard: Must figure out how to run tests, interpret failures, fix issues
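The "Fix the failing tests" example suggests a cumulative ladder: each easier level sees everything the harder level sees, plus more. One possible way to generate the variants from a base case is to tag each piece of context with the hardest level at which it is still shown. This is only a sketch; the dict keys (`context`, `up_to`, `prompt`) are assumptions, and the bracketed strings are placeholders.

```python
LEVELS = ["easy", "medium", "hard"]  # ordered from most to least scaffolding

base_case = {
    "id": "fix-failing-tests",
    "task": "Fix the failing tests",
    # Each entry names the hardest level at which it is still shown.
    "context": [
        {"text": "Test output: <captured output goes here>", "up_to": "easy"},
        {"text": "Failing test: <test name goes here>", "up_to": "easy"},
        {"text": "Run the suite with: <test command goes here>", "up_to": "medium"},
    ],
}


def expand_variants(case: dict) -> list[dict]:
    """Return one variant per level, all sharing the same task text."""
    variants = []
    for rank, level in enumerate(LEVELS):
        # Keep context whose cutoff level is at or beyond the current level.
        shown = [
            c["text"] for c in case["context"]
            if LEVELS.index(c["up_to"]) >= rank
        ]
        variants.append({
            "id": f"{case['id']}-{level}",
            "difficulty": level,
            "prompt": "\n".join([case["task"], *shown]),
        })
    return variants


variants = expand_variants(base_case)
```

Here the Easy variant includes the test output, the failing test, and the run command; Medium includes only the run command; Hard gets the bare task.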
Tasks
- Design difficulty ladder schema in case format
- Implement context/scaffolding levels
- Create generator for difficulty variants from base case
- Add difficulty selection to sniff run (e.g., --difficulty hard)
- Track performance across difficulty levels
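For the CLI item, the flag could look roughly like this with `argparse`. This does not reflect how sniff actually parses arguments; only the `run` subcommand and the `--difficulty` option come from this issue, and the `medium` default is an assumption.

```python
import argparse

DIFFICULTIES = ("easy", "medium", "hard")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: not the real sniff CLI definition.
    parser = argparse.ArgumentParser(prog="sniff")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="run cases")
    run.add_argument(
        "--difficulty",
        choices=DIFFICULTIES,
        default="medium",  # assumed default; the issue does not specify one
        help="which variant of each case to run",
    )
    return parser


args = build_parser().parse_args(["run", "--difficulty", "hard"])
print(args.command, args.difficulty)  # prints "run hard"
```

Using `choices` means an unknown level is rejected by the parser before any case is loaded.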
Acceptance Criteria
- Cases can define Easy/Medium/Hard variants
- Same underlying task, different scaffolding
- Metrics track performance per difficulty level
- Generated cases automatically create difficulty variants
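For the per-level metrics criterion, a minimal sketch of the aggregation, assuming each run yields a (difficulty, passed) pair. The result shape and the function name are hypothetical, and the sample results are made up for illustration.

```python
from collections import defaultdict


def pass_rate_by_difficulty(results):
    """Compute the fraction of passing runs for each difficulty level.

    results: iterable of (difficulty, passed) pairs.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for difficulty, passed in results:
        totals[difficulty] += 1
        passes[difficulty] += int(passed)
    return {d: passes[d] / totals[d] for d in totals}


# Illustrative data only, not real benchmark results.
results = [
    ("easy", True),
    ("easy", True),
    ("medium", True),
    ("medium", False),
    ("hard", False),
]
rates = pass_rate_by_difficulty(results)
print(rates)  # {'easy': 1.0, 'medium': 0.5, 'hard': 0.0}
```

Comparing the rates for the same base case across levels shows how much an agent depends on the scaffolding.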