Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 53 additions & 0 deletions cases/bootstrap/boot-001-lint-fix.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
id: boot-001
title: "Fix Linting Errors in a File"
prompt: |
Fix all linting errors in the file `${FILE_PATH}`.

The file currently has linting errors. Your task is to:
1. Run the linter on the file to see the errors
2. Fix each error to make the linter pass
3. Ensure the file still works correctly

Run: ${LINTER_COMMAND} ${FILE_PATH}

The linter should pass with exit code 0.
source: bootstrap
category: codefix
language: javascript
difficulty: easy

tags:
- linting
- code-quality
- javascript

files:
- path: ${FILE_PATH}
content: |
// This file has linting errors that need to be fixed
function add(a, b) {
return a + b
}

// TODO: Fix the linting errors above
module.exports = { add };

rubric:
extends: default
criteria:
linting:
weight: 50
description: "All linting errors are fixed"
evaluators:
- type: command
name: "Check linting passes"
run: ${LINTER_COMMAND} ${FILE_PATH}
passThreshold: 0.9
tests:
weight: 50
description: "Tests still pass"
evaluators:
- type: command
name: "Run tests"
run: ${TEST_COMMAND}
passThreshold: 0.9
56 changes: 56 additions & 0 deletions cases/bootstrap/boot-002-rename-symbol.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
id: boot-002
title: "Rename a Symbol Across the Codebase"
prompt: |
Rename the symbol ${OLD_NAME} to ${NEW_NAME} throughout the codebase.

This is a refactoring task. You need to:
1. Find all occurrences of ${OLD_NAME} in the codebase
2. Rename them to ${NEW_NAME} consistently
3. Update any references, imports, or exports
4. Ensure the code still compiles and works

Run: ${SEARCH_COMMAND} "${OLD_NAME}"
After renaming, verify with: ${SEARCH_COMMAND} "${NEW_NAME}"

The search should find all occurrences of the new name.
source: bootstrap
category: refactoring
language: javascript
difficulty: medium

tags:
- refactoring
- symbol-rename
- code-quality

files:
- path: ${FILE_PATH}
content: |
// This file uses the old name ${OLD_NAME}
function ${OLD_NAME}(x) {
return x * 2;
}

const ${OLD_NAME}_instance = new ${OLD_NAME}();

module.exports = { ${OLD_NAME} };

rubric:
extends: default
criteria:
rename-completeness:
weight: 50
description: "All occurrences of ${OLD_NAME} are renamed to ${NEW_NAME}"
evaluators:
- type: command
name: "Verify new name exists"
run: ${SEARCH_COMMAND} "${NEW_NAME}"
passThreshold: 0.9
Comment on lines +38 to +48
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing evaluator to verify the old symbol is removed.

The rubric only checks that ${NEW_NAME} exists but never verifies that ${OLD_NAME} is absent. An agent could pass this criterion by simply adding the new name without removing the old one. The sibling cases (boot-003, boot-005) use a search + low passThreshold (0.1) to assert pattern absence — this case should do the same.

Proposed fix: add an evaluator for old-name absence
 rubric:
   extends: default
   criteria:
     rename-completeness:
       weight: 50
       description: "All occurrences of ${OLD_NAME} are renamed to ${NEW_NAME}"
       evaluators:
         - type: command
           name: "Verify new name exists"
           run: ${SEARCH_COMMAND} "${NEW_NAME}"
           passThreshold: 0.9
+        - type: command
+          name: "Verify old name is gone"
+          run: ${SEARCH_COMMAND} "${OLD_NAME}"
+          passThreshold: 0.1
     tests:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rubric:
extends: default
criteria:
rename-completeness:
weight: 50
description: "All occurrences of ${OLD_NAME} are renamed to ${NEW_NAME}"
evaluators:
- type: command
name: "Verify new name exists"
run: ${SEARCH_COMMAND} "${NEW_NAME}"
passThreshold: 0.9
rubric:
extends: default
criteria:
rename-completeness:
weight: 50
description: "All occurrences of ${OLD_NAME} are renamed to ${NEW_NAME}"
evaluators:
- type: command
name: "Verify new name exists"
run: ${SEARCH_COMMAND} "${NEW_NAME}"
passThreshold: 0.9
- type: command
name: "Verify old name is gone"
run: ${SEARCH_COMMAND} "${OLD_NAME}"
passThreshold: 0.1
🤖 Prompt for AI Agents
In `@cases/bootstrap/boot-002-rename-symbol.yaml` around lines 38 - 48, Add an
evaluator to the rename-completeness criterion that verifies the old symbol is
removed by running ${SEARCH_COMMAND} against "${OLD_NAME}" with a low
passThreshold (e.g. 0.1); update criteria.rename-completeness.evaluators
(alongside the existing verifier for ${NEW_NAME}) to include a new evaluator
entry named like "Verify old name removed" that runs ${SEARCH_COMMAND}
"${OLD_NAME}" and uses passThreshold: 0.1 so the rubric asserts absence of
${OLD_NAME}.

tests:
weight: 50
description: "Tests still pass"
evaluators:
- type: command
name: "Run tests"
run: ${TEST_COMMAND}
passThreshold: 0.9
68 changes: 68 additions & 0 deletions cases/bootstrap/boot-003-extract-duplication.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
id: boot-003
title: "Extract Duplicated Code into Shared Function"
prompt: |
Extract duplicated code into a shared function.

The codebase has duplicated logic in multiple places. Your task is to:
1. Identify the duplicated code pattern
2. Create a shared function that encapsulates this logic
3. Replace all occurrences with calls to the shared function
4. Ensure the code still works correctly

Run: ${SEARCH_COMMAND} "${DUPLICATION_PATTERN}"
After refactoring, the pattern should be reduced to function calls.

The duplicated code should be eliminated.
source: bootstrap
category: refactoring
language: javascript
difficulty: medium

tags:
- refactoring
- code-duplication
- code-quality

files:
- path: ${FILE_PATH}
content: |
// This file has duplicated code that needs to be extracted
function processData1(data) {
// Duplicated logic
const result = data.map(item => item.value * 2);
return result.filter(item => item > 10);
}

function processData2(data) {
// Same duplicated logic
const result = data.map(item => item.value * 2);
return result.filter(item => item > 10);
}

function processData3(data) {
// Same duplicated logic again
const result = data.map(item => item.value * 2);
return result.filter(item => item > 10);
}

module.exports = { processData1, processData2, processData3 };

rubric:
extends: default
criteria:
duplication-eliminated:
weight: 50
description: "Duplicated code pattern is eliminated"
evaluators:
- type: command
name: "Check duplication reduced"
run: ${SEARCH_COMMAND} "${DUPLICATION_PATTERN}"
passThreshold: 0.1
tests:
weight: 50
description: "Tests still pass"
evaluators:
- type: command
name: "Run tests"
run: ${TEST_COMMAND}
passThreshold: 0.9
65 changes: 65 additions & 0 deletions cases/bootstrap/boot-004-add-types.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
id: boot-004
title: "Add Type Annotations to a Module"
prompt: |
Add type annotations to the module `${MODULE_NAME}`.

The module currently lacks type annotations. Your task is to:
1. Add TypeScript type annotations to all functions and variables
2. Define appropriate interfaces/types for the data structures
3. Ensure the code type-checks without errors
4. Keep the code working as before

Run: ${TYPECHECK_COMMAND} ${FILE_PATH}
The type-checker should pass with exit code 0.

All functions and variables should have proper type annotations.
source: bootstrap
category: type-safety
language: javascript
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

language should be typescript, not javascript.

The task is explicitly about adding TypeScript type annotations, the tags include typescript, and the evaluator runs a type-checker. The metadata field should match.

Fix
-language: javascript
+language: typescript
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
language: javascript
language: typescript
🤖 Prompt for AI Agents
In `@cases/bootstrap/boot-004-add-types.yaml` at line 18, The metadata field
'language' currently set to 'javascript' must be changed to 'typescript' to
match the task and evaluator; locate the YAML entry named language (value
'javascript') in the boot-004-add-types.yaml test case and replace its value
with 'typescript' so the task metadata, tags, and type-checker align.

difficulty: medium

tags:
- typescript
- type-annotations
- type-safety

files:
- path: ${FILE_PATH}
content: |
// This module needs type annotations
function calculate(a, b) {
return a + b;
}

function formatName(first, last) {
return `${first} ${last}`;
}

function processUser(user) {
return {
name: formatName(user.firstName, user.lastName),
age: user.age
};
}

module.exports = { calculate, formatName, processUser };

rubric:
extends: default
criteria:
type-annotations:
weight: 50
description: "All functions and variables have type annotations"
evaluators:
- type: command
name: "Type-check passes"
run: ${TYPECHECK_COMMAND} ${FILE_PATH}
passThreshold: 0.9
tests:
weight: 50
description: "Tests still pass"
evaluators:
- type: command
name: "Run tests"
run: ${TEST_COMMAND}
passThreshold: 0.9
60 changes: 60 additions & 0 deletions cases/bootstrap/boot-005-update-deprecated.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
id: boot-005
title: "Update Deprecated API Usage"
prompt: |
Update deprecated API usage in the codebase.

The codebase uses deprecated APIs that need to be updated. Your task is to:
1. Find all occurrences of deprecated APIs
2. Replace them with the current recommended APIs
3. Update any related code that depends on the old API
4. Ensure the code still works correctly

Run: ${SEARCH_COMMAND} "${DEPRECATED_PATTERN}"
After updating, the deprecated pattern should be gone.

The deprecated API should be completely replaced.
source: bootstrap
category: maintenance
language: javascript
difficulty: medium

tags:
- deprecation
- api-updates
- code-quality

files:
- path: ${FILE_PATH}
content: |
// This file uses deprecated APIs
const { deprecatedMethod } = require('old-library');

function oldFunction() {
// Using deprecated API
const result = deprecatedMethod({
oldOption: 'value'
});
return result;
}

module.exports = { oldFunction };

rubric:
extends: default
criteria:
deprecation-removed:
weight: 50
description: "Deprecated API usage is completely removed"
evaluators:
- type: command
name: "Check deprecated pattern removed"
run: ${SEARCH_COMMAND} "${DEPRECATED_PATTERN}"
passThreshold: 0.1
tests:
weight: 50
description: "Tests still pass"
evaluators:
- type: command
name: "Run tests"
run: ${TEST_COMMAND}
passThreshold: 0.9
Loading