62 changes: 62 additions & 0 deletions cases/bootstrap/api-documentation.yaml
@@ -0,0 +1,62 @@
id: bootstrap-020
title: "Add API Documentation"
prompt: |
The following code lacks API documentation. Add comprehensive documentation by:
- Adding docstrings to all public functions and classes
- Documenting parameters and return values
- Adding usage examples
- Documenting exceptions and edge cases
- Following documentation best practices

Run: python api_documentation.test.py
Make all tests pass.

source: bootstrap
category: documentation
language: python
difficulty: easy

tags:
- python
- documentation
- api-docs
- docstrings

files:
- path: api_documentation.py
content: |
def calculate_discount(price, discount_percent):
"""Calculate discounted price."""
return price * (1 - discount_percent / 100)

class PaymentProcessor:
def process(self, amount, currency):
"""Process payment."""
return {"status": "success", "amount": amount}

def refund(self, transaction_id):
"""Refund transaction."""
return {"status": "success"}
- path: api_documentation.test.py
content: |
import unittest
from api_documentation import calculate_discount, PaymentProcessor

class TestAPIDocumentation(unittest.TestCase):

def test_calculate_discount(self):
result = calculate_discount(100, 20)
self.assertEqual(result, 80)

def test_payment_processor(self):
processor = PaymentProcessor()
result = processor.process(50, "USD")
self.assertEqual(result["status"], "success")

def test_refund(self):
processor = PaymentProcessor()
result = processor.refund("tx123")
self.assertEqual(result["status"], "success")

if __name__ == '__main__':
unittest.main()
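The case leaves the docstring style open; as one possible shape a solver might converge on, here is `calculate_discount` with a Google-style docstring covering parameters, return value, an example, and an edge case (the style choice and the wording are assumptions, not part of the case):

```python
def calculate_discount(price, discount_percent):
    """Calculate the price after applying a percentage discount.

    Args:
        price: Original price of the item.
        discount_percent: Discount as a percentage in the range 0-100.

    Returns:
        The discounted price as a float.

    Example:
        >>> calculate_discount(100, 20)
        80.0

    Note:
        No validation is performed; a non-numeric ``price`` will
        surface as a ``TypeError`` from the arithmetic.
    """
    return price * (1 - discount_percent / 100)
```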
66 changes: 66 additions & 0 deletions cases/bootstrap/authentication.yaml
@@ -0,0 +1,66 @@
id: bootstrap-019
title: "Implement Proper Authentication Checks"
prompt: |
The following code lacks proper authentication. Add authentication by:
- Implementing proper authentication checks
- Using secure password storage (hashing, not plain text)
- Adding role-based access control
- Ensuring sensitive operations are protected

Run: python authentication.test.py
Make all tests pass.

source: bootstrap
category: security
language: python
difficulty: medium

tags:
- python
- security
- authentication
- authorization

files:
- path: authentication.py
content: |
def get_user(username, password):
"""Get user by credentials - no authentication."""
if username == "admin" and password == "password":
return {"username": "admin", "role": "admin"}
return None

def delete_user(user_id):
"""Delete user - no authentication."""
# Delete user from database
return True

def get_admin_data():
"""Get admin data - no authentication."""
return {"sensitive": "data"}
- path: authentication.test.py
content: |
import unittest
from authentication import get_user, delete_user, get_admin_data

class TestAuthentication(unittest.TestCase):

def test_get_user_success(self):
user = get_user("admin", "password")
self.assertIsNotNone(user)
self.assertEqual(user["role"], "admin")

def test_get_user_failure(self):
user = get_user("admin", "wrong")
self.assertIsNone(user)

def test_delete_user(self):
result = delete_user(1)
self.assertTrue(result)

def test_get_admin_data(self):
data = get_admin_data()
self.assertEqual(data["sensitive"], "data")
Comment on lines +57 to +63
⚠️ Potential issue | 🟠 Major

Tests for delete_user and get_admin_data validate the insecure behavior — they'll break after the solver adds authentication.

The prompt instructs the solver to "ensure sensitive operations are protected," but:

  • test_delete_user calls delete_user(1) with no auth context and asserts True.
  • test_get_admin_data calls get_admin_data() with no auth context and asserts the data is returned.

After adding authentication, these unauthenticated calls should be rejected, causing both tests to fail. The tests need to:

  1. Supply valid credentials/tokens for authorized-access tests.
  2. Assert that calls without auth raise an error or return a denial.
🤖 Prompt for AI Agents
In `@cases/bootstrap/authentication.yaml` around lines 57-63, update the tests
for delete_user and get_admin_data to reflect added authentication: in the
test_delete_user and test_get_admin_data cases, call the functions twice — once
with a valid auth context/token/credentials (e.g., passing an authenticated user
object or auth header) and assert successful behavior, and once without any auth
and assert that the call is rejected (raises an exception or returns an
unauthorized/forbidden result). Specifically modify the tests that call
delete_user(1) and get_admin_data() to include an authenticated variant that
succeeds and an unauthenticated variant that asserts denial.


if __name__ == '__main__':
unittest.main()
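A minimal sketch of the test shape the review asks for, assuming a reworked `delete_user` that takes an authenticated-user argument and raises `PermissionError` when it is missing (the parameter name, the role check, and the exception type are all assumptions about the solver's eventual API, not part of the case):

```python
import unittest

def delete_user(user_id, auth_user=None):
    """Hypothetical reworked delete_user requiring an admin caller."""
    if not auth_user or auth_user.get("role") != "admin":
        raise PermissionError("admin role required")
    # Delete user from database
    return True

class TestAuthenticatedDelete(unittest.TestCase):
    def test_delete_with_valid_auth(self):
        # Authorized variant: an admin context succeeds.
        admin = {"username": "admin", "role": "admin"}
        self.assertTrue(delete_user(1, auth_user=admin))

    def test_delete_without_auth_is_rejected(self):
        # Unauthenticated variant: the call must be denied.
        with self.assertRaises(PermissionError):
            delete_user(1)
```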
65 changes: 65 additions & 0 deletions cases/bootstrap/code-readability.yaml
@@ -0,0 +1,65 @@
id: bootstrap-016
title: "Improve Code Readability"
prompt: |
The following code is hard to read. Improve it by:
- Renaming variables and functions to be more descriptive
- Breaking down long functions into smaller ones
- Adding comments where helpful
- Improving code structure and organization
- Following naming conventions

Run: python code_readability.test.py
Make all tests pass.

source: bootstrap
category: refactoring
language: python
difficulty: easy

tags:
- python
- readability
- code-quality
- naming

files:
- path: code_readability.py
content: |
def calc(a, b, c):
return (a * b) + c

def f(x, y):
if x > 0:
if y > 0:
return x + y
else:
return x - y
else:
return 0

def g(d, e, f):
return d * e + f
- path: code_readability.test.py
content: |
import unittest
from code_readability import calc, f, g

class TestCodeReadability(unittest.TestCase):

def test_calc(self):
self.assertEqual(calc(2, 3, 4), 10)

def test_f_positive(self):
self.assertEqual(f(5, 3), 8)

def test_f_negative(self):
self.assertEqual(f(5, -3), 8)

def test_f_zero(self):
self.assertEqual(f(0, 5), 0)

def test_g(self):
self.assertEqual(g(2, 3, 4), 10)
Comment on lines +42 to +62
⚠️ Potential issue | 🟠 Major

Tests hard-code the old function names — renaming (the primary task) will break them.

The prompt asks the solver to rename calc, f, and g to descriptive names, but the test file imports and calls them by those exact names:

from code_readability import calc, f, g

After renaming, every test will fail with ImportError. This creates a contradiction: the solver can't follow the prompt without breaking the tests, and can't pass the tests without ignoring the prompt.

Consider either:

  • Updating the tests to use descriptive names that the solver is expected to converge on (and documenting the expected names in the prompt).
  • Restructuring so the tests validate behavior through a stable interface, and the renaming task applies to internal helpers.
🤖 Prompt for AI Agents
In `@cases/bootstrap/code-readability.yaml` around lines 42-62, the tests import and
call the old names calc, f, and g, which will break when you rename them; either
update the tests to import and assert the new descriptive function names
(replace uses of calc/f/g with the new names you chose) or, to preserve backward
compatibility, add aliases in the code_readability module that map the new
function names back to calc, f, and g (e.g., new_name = original_function; calc
= new_name) so tests continue to import the old symbols while internals use
descriptive names.


if __name__ == '__main__':
unittest.main()
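The alias approach from the AI prompt can be sketched as follows; the descriptive names here are illustrative guesses at what a solver might choose, not names mandated by the case:

```python
def multiply_then_add(factor_a, factor_b, addend):
    """Return factor_a * factor_b + addend."""
    return (factor_a * factor_b) + addend

def sum_or_difference_if_positive(x, y):
    """Return x + y if both are positive, x - y if only x is, else 0."""
    if x > 0:
        return x + y if y > 0 else x - y
    return 0

# Backward-compatible aliases so the existing test imports keep working
# while the module's internals use descriptive names.
calc = multiply_then_add
f = sum_or_difference_if_positive
```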
71 changes: 71 additions & 0 deletions cases/bootstrap/component-extraction.yaml
@@ -0,0 +1,71 @@
id: bootstrap-014
title: "Extract Reusable Components from Duplicated Code"
prompt: |
The following code has significant duplication. Refactor it by:
- Identifying common patterns and logic
- Extracting reusable functions or classes
- Reducing code duplication
- Improving maintainability

Run: python component_extraction.test.py
Make all tests pass.

source: bootstrap
category: refactoring
language: python
difficulty: medium

tags:
- python
- refactoring
- code-reuse
- dry

files:
- path: component_extraction.py
content: |
def process_user_data(user):
"""Process user data - duplicated logic."""
if user.get('age') < 18:
return "Minor"
elif user.get('age') >= 18 and user.get('age') < 65:
return "Adult"
else:
return "Senior"

def process_customer_data(customer):
"""Process customer data - duplicated logic."""
if customer.get('age') < 18:
return "Minor"
elif customer.get('age') >= 18 and customer.get('age') < 65:
return "Adult"
else:
return "Senior"

def process_employee_data(employee):
"""Process employee data - duplicated logic."""
if employee.get('age') < 18:
return "Minor"
elif employee.get('age') >= 18 and employee.get('age') < 65:
return "Adult"
else:
return "Senior"

- path: component_extraction.test.py
content: |
import unittest
from component_extraction import process_user_data, process_customer_data, process_employee_data

class TestComponentExtraction(unittest.TestCase):

def test_user_adult(self):
self.assertEqual(process_user_data({'age': 30}), "Adult")

def test_customer_minor(self):
self.assertEqual(process_customer_data({'age': 15}), "Minor")

def test_employee_senior(self):
self.assertEqual(process_employee_data({'age': 70}), "Senior")

if __name__ == '__main__':
unittest.main()
Comment on lines +54 to +71
⚠️ Potential issue | 🟡 Minor

Tests lack boundary-value coverage — off-by-one errors at 18 and 65 would go undetected.

Each age category is tested once, but the boundaries (18 and 65) are never asserted. A solver could introduce < 18 vs <= 18 or < 65 vs <= 65 regressions and still pass all tests. For a benchmark case, this significantly weakens the acceptance criteria.

Consider adding at least:

def test_boundary_18(self):
    self.assertEqual(process_user_data({'age': 18}), "Adult")

def test_boundary_65(self):
    self.assertEqual(process_user_data({'age': 65}), "Senior")
🤖 Prompt for AI Agents
In `@cases/bootstrap/component-extraction.yaml` around lines 54-71, add
boundary-value tests to cover off-by-one cases: inside the
TestComponentExtraction test class in component_extraction.test.py, add a test
method that calls process_user_data({'age': 18}) and asserts "Adult"
(complements test_user_adult) and another test method that calls
process_employee_data({'age': 65}) and asserts "Senior" (complements
test_employee_senior); name them e.g. test_boundary_18 and test_boundary_65 so
they run with the existing suite.
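For reference, the extraction the case is driving at can be sketched as a single shared helper (the helper name `categorize_age` is an assumption; the category boundaries are taken from the original duplicated code):

```python
def categorize_age(record):
    """Map a record's age to a life-stage category (shared logic)."""
    age = record.get('age')
    if age < 18:
        return "Minor"
    elif age < 65:
        return "Adult"
    return "Senior"

# The three public functions become thin wrappers over the helper.
def process_user_data(user):
    return categorize_age(user)

def process_customer_data(customer):
    return categorize_age(customer)

def process_employee_data(employee):
    return categorize_age(employee)
```

Note that the boundary tests suggested above now exercise a single code path, so an off-by-one fix only has to be made once.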

64 changes: 64 additions & 0 deletions cases/bootstrap/error-handling.yaml
@@ -0,0 +1,64 @@
id: bootstrap-008
title: "Add Error Handling to Functions"
prompt: |
The following functions lack proper error handling. Add try-catch blocks
to handle potential errors gracefully. The functions should:
- Catch common exceptions (ValueError, TypeError, IOError)
- Return meaningful error messages
- Log errors appropriately
- Not crash the application

Run: python error_handling.test.py
Make all tests pass.
Comment on lines +3 to +12
⚠️ Potential issue | 🔴 Critical

Prompt and tests contradict each other — exercise is unsolvable as written.

The prompt instructs:

  • "Add try-catch blocks to handle potential errors gracefully"
  • "Return meaningful error messages"
  • "Not crash the application"

But every test asserts that the raw exceptions still propagate (assertRaises(ZeroDivisionError), assertRaises(TypeError), etc.). If a user follows the prompt and wraps the functions in try/except to return error messages, all four tests will fail. If they make the tests pass by leaving the code as-is, they haven't followed the prompt.

You need to align the two. Either:

  1. Change the tests to assert that error messages (strings/dicts) are returned instead of exceptions being raised, matching the prompt's intent.
  2. Change the prompt to say "ensure proper exceptions are raised with clear messages" instead of "catch and return."

Also applies to: 45-61

🤖 Prompt for AI Agents
In `@cases/bootstrap/error-handling.yaml` around lines 3-12, the prompt and
tests conflict: the prompt asks functions to catch exceptions and return error
messages, but error_handling.test.py asserts raw exceptions (e.g.,
assertRaises(ZeroDivisionError), assertRaises(TypeError)); align them by either
(A) updating the tests to expect returned error objects/strings (change
assertions that use assertRaises(...) to assertEqual/contains matching error
messages for the functions under test) so they validate graceful return values,
or (B) updating the prompt YAML to require that functions raise clear,
well-documented exceptions (e.g., "raise ValueError('...')") instead of catching
them; pick one approach and update all related occurrences (also referenced in
the same file region lines 45-61) so prompt and tests are consistent.


source: bootstrap
category: error-handling
language: python
difficulty: easy

tags:
- python
- error-handling
- exception-handling

files:
- path: error_handling.py
content: |
def divide_numbers(a, b):
"""Divide two numbers."""
return a / b

def read_file(filepath):
"""Read content from a file."""
with open(filepath, 'r') as f:
return f.read()

def process_data(data):
"""Process some data."""
return data.upper()

- path: error_handling.test.py
content: |
import unittest
from error_handling import divide_numbers, read_file, process_data

class TestErrorHandling(unittest.TestCase):

def test_divide_by_zero(self):
with self.assertRaises(ZeroDivisionError):
divide_numbers(10, 0)

def test_invalid_division(self):
with self.assertRaises(TypeError):
divide_numbers("10", 2)

def test_file_not_found(self):
with self.assertRaises(FileNotFoundError):
read_file("nonexistent.txt")

def test_invalid_data(self):
with self.assertRaises(AttributeError):
process_data(None)

if __name__ == '__main__':
unittest.main()
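Option A from the review (tests assert returned error values rather than raised exceptions) could look like the sketch below; the `{"ok": ..., "result"/"error": ...}` result shape is an assumption, chosen only to make the prompt's "return meaningful error messages" and "log errors appropriately" requirements concrete:

```python
import logging

logger = logging.getLogger(__name__)

def divide_numbers(a, b):
    """Divide two numbers, returning a result dict instead of raising."""
    try:
        return {"ok": True, "result": a / b}
    except (ZeroDivisionError, TypeError) as exc:
        # Log the failure, then return a meaningful error message
        # rather than letting the exception crash the caller.
        logger.error("divide_numbers failed: %s", exc)
        return {"ok": False, "error": str(exc)}
```

A matching test would then assert on the dict, e.g. `self.assertFalse(divide_numbers(10, 0)["ok"])`, instead of using `assertRaises`.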
64 changes: 64 additions & 0 deletions cases/bootstrap/input-validation.yaml
@@ -0,0 +1,64 @@
id: bootstrap-010
title: "Add Input Validation to User-Facing Functions"
prompt: |
The following functions accept user input without proper validation.
Add validation to:
- Check input types (e.g., ensure numbers are actually numbers)
- Validate input ranges and constraints
- Return appropriate error messages for invalid input
- Handle edge cases (None, empty strings, etc.)

Run: python input_validation.test.py
Make all tests pass.

source: bootstrap
category: validation
language: python
difficulty: easy

tags:
- python
- validation
- input-validation
- type-checking

files:
- path: input_validation.py
content: |
def calculate_age(birth_year, current_year):
"""Calculate age from birth year."""
return current_year - birth_year

def get_user_email(email):
"""Validate and return email."""
return email

def process_order(quantity, price):
"""Process an order."""
return quantity * price

- path: input_validation.test.py
content: |
import unittest
from input_validation import calculate_age, get_user_email, process_order

class TestInputValidation(unittest.TestCase):

def test_invalid_birth_year(self):
with self.assertRaises(ValueError):
calculate_age("1990", 2024)

def test_invalid_email(self):
with self.assertRaises(ValueError):
get_user_email("invalid-email")

def test_negative_quantity(self):
with self.assertRaises(ValueError):
process_order(-5, 10)

def test_zero_quantity(self):
with self.assertRaises(ValueError):
process_order(0, 10)

if __name__ == '__main__':
unittest.main()
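A sketch of validation for `calculate_age` that satisfies the case's tests (the specific checks and messages are assumptions; the tests only require that a string birth year raises `ValueError`):

```python
def calculate_age(birth_year, current_year):
    """Calculate age from birth year, validating both inputs."""
    for name, value in (("birth_year", birth_year),
                        ("current_year", current_year)):
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{name} must be an integer, got {value!r}")
    if birth_year > current_year:
        raise ValueError("birth_year cannot be after current_year")
    return current_year - birth_year
```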