
Difficulty Ladder System #22

@jharris1679


Overview

Implement a difficulty ladder system inspired by CORE-Bench, where the same task can be presented at different difficulty levels based on how much scaffolding/context is provided.

Difficulty Levels

  • Easy: More context provided (e.g., relevant files identified, hints given)
  • Medium: Standard context (e.g., just the task description)
  • Hard: Minimal context (e.g., agent must discover relevant files, figure out approach)

This differs from simple difficulty categorization: each level presents the same underlying task, and only the amount of assistance changes.

Examples

Task: "Add error handling to the API routes"

  • Easy: Points to specific files, shows which functions need handling
  • Medium: Just the task description
  • Hard: Agent must find API routes, identify unhandled cases, implement solution

Task: "Fix the failing tests"

  • Easy: Test output provided, failing test identified
  • Medium: Must run tests to see failures
  • Hard: Must figure out how to run tests, interpret failures, fix issues
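One way to encode examples like these is to tag each piece of scaffolding with the hardest level at which it is still shown, then derive all three variants from a single base case. The sketch below is only an illustration under assumptions: `Difficulty`, `ContextItem`, `BaseCase`, `generate_variants`, and the `max_level` field are hypothetical names, not the project's case format, and the placeholder context strings stand in for real test output. Treating "how to run the tests" as Medium-level context is one reading of the second example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Difficulty(IntEnum):
    # Ordered so that a higher value means less scaffolding.
    EASY = 1
    MEDIUM = 2
    HARD = 3


@dataclass(frozen=True)
class ContextItem:
    text: str
    # Hardest level at which this item is still shown (hypothetical field).
    max_level: Difficulty


@dataclass
class BaseCase:
    task: str
    context: list[ContextItem] = field(default_factory=list)


def generate_variants(case: BaseCase) -> dict[Difficulty, str]:
    """Build one prompt per level from the same underlying task."""
    variants = {}
    for level in Difficulty:
        # Keep only the context items that are allowed at this level.
        shown = [c.text for c in case.context if level <= c.max_level]
        variants[level] = "\n".join([case.task, *shown])
    return variants


case = BaseCase(
    task="Fix the failing tests",
    context=[
        ContextItem("Failing test: <name of failing test>", Difficulty.EASY),
        ContextItem("Test output: <captured output>", Difficulty.EASY),
        ContextItem("Run the tests with: <test command>", Difficulty.MEDIUM),
    ],
)
variants = generate_variants(case)
```

With this layout the Hard variant contains only the task text, Medium adds the run instruction, and Easy adds all three items, so the task itself never changes between levels.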

Tasks

  • Design difficulty ladder schema in case format
  • Implement context/scaffolding levels
  • Create generator for difficulty variants from base case
  • Add difficulty selection to `sniff run` (e.g., `--difficulty hard`)
  • Track performance across difficulty levels
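As a rough sketch of the difficulty selection flag, the snippet below uses Python's standard `argparse` as a stand-in for the real CLI. The actual structure of `sniff` is not shown in this issue, so `build_parser`, the subcommand wiring, and the `medium` default are all assumptions.

```python
import argparse

LEVELS = ("easy", "medium", "hard")


def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser; the real `sniff` CLI may be organized differently.
    parser = argparse.ArgumentParser(prog="sniff")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="run cases")
    run.add_argument(
        "--difficulty",
        choices=LEVELS,
        default="medium",  # assumed default; the issue does not specify one
        help="which difficulty variant of each case to run",
    )
    return parser


args = build_parser().parse_args(["run", "--difficulty", "hard"])
print(args.difficulty)  # prints "hard"
```

Restricting the flag with `choices` means an unknown level is rejected at parse time rather than surfacing later as a missing variant.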

Acceptance Criteria

  • Cases can define Easy/Medium/Hard variants
  • Same underlying task, different scaffolding
  • Metrics track performance per difficulty level
  • Generated cases automatically create difficulty variants
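For the per-level metrics criterion, a minimal aggregation could group run results by difficulty and compute a pass rate for each. The `(case_id, difficulty, passed)` record shape and the `pass_rate_by_difficulty` helper are hypothetical, and the sample records are made up purely to exercise the function.

```python
from collections import defaultdict


def pass_rate_by_difficulty(results):
    """results: iterable of (case_id, difficulty, passed) tuples (assumed shape)."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for _case_id, difficulty, passed in results:
        totals[difficulty] += 1
        passes[difficulty] += int(passed)
    # Only levels that actually appear in the results are reported.
    return {d: passes[d] / totals[d] for d in totals}


results = [  # made-up illustrative records, not real measurements
    ("api-errors", "easy", True),
    ("api-errors", "medium", True),
    ("api-errors", "hard", False),
    ("failing-tests", "easy", True),
    ("failing-tests", "medium", False),
    ("failing-tests", "hard", False),
]
rates = pass_rate_by_difficulty(results)
```

Because every level of a case shares the same underlying task, comparing these rates across levels isolates the effect of scaffolding rather than task choice.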


Labels: enhancement (New feature or request)
