Add unit tests for MiniSWE agent #12

Open
joshgreaves wants to merge 2 commits into main from add-miniswe-tests

joshgreaves (Contributor) commented Jan 13, 2026

User description

Summary

  • Comprehensive unit tests for mini_swe_agent.py covering all core functionality
  • Tests for action parsing logic (single blocks, multiple blocks, format validation)
  • Tests for execution flow including timeouts, format errors, and final output detection
  • Tests for template rendering functions (system, instance, action observation, errors)
  • Mock-based testing to avoid external dependencies on minisweagent package

Test Coverage

Action Parsing (TestParseAction)

  • Single bash code block extraction with whitespace handling
  • Multiline command parsing
  • Error handling for multiple blocks and missing blocks
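The parsing cases listed above can be sketched roughly as follows. This is a minimal illustration, not the code in mini_swe_agent.py: `parse_action` and `FormatError` are hypothetical stand-ins, and the rule that a response must contain exactly one fenced bash block is an assumption drawn from the bullet points.

```python
import re

# The fence is built programmatically so this example can itself live inside a fenced block.
FENCE = "`" * 3


class FormatError(Exception):
    """Hypothetical error: the response did not contain exactly one bash block."""


# Matches a fenced bash block and captures its body (possibly multiline).
_BASH_BLOCK = re.compile(FENCE + r"bash\s*\n(.*?)\n?" + FENCE, re.DOTALL)


def parse_action(response: str) -> str:
    """Return the single bash command in `response`, stripped of outer whitespace."""
    blocks = _BASH_BLOCK.findall(response)
    if len(blocks) != 1:
        raise FormatError(f"expected exactly one bash block, found {len(blocks)}")
    return blocks[0].strip()


single = parse_action("Listing files.\n" + FENCE + "bash\n  ls -la  \n" + FENCE)
multiline = parse_action(FENCE + "bash\ncd /tmp\nls\n" + FENCE)
```

A test class in the style of TestParseAction would then assert on `single` and `multiline`, and check that zero or two blocks raise the error.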

Execution Monitoring (TestRaiseIfFinished)

  • Detection of MINI_SWE_AGENT_FINAL_OUTPUT and COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT markers
  • Proper handling of normal output vs. final output
  • Edge cases: empty output, markers not at start, leading whitespace
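The marker check described above might look like the sketch below. The names `raise_if_finished` and `Submitted` are hypothetical, and two behaviors are assumptions made for illustration: leading whitespace is ignored, and a marker counts only when it is the first non-blank line of the output.

```python
FINAL_MARKERS = (
    "MINI_SWE_AGENT_FINAL_OUTPUT",
    "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT",
)


class Submitted(Exception):
    """Hypothetical signal carrying everything that follows the marker line."""


def raise_if_finished(output: str) -> None:
    """Raise Submitted if the first non-blank line of `output` is a final marker."""
    # Assumption: leading whitespace (including blank lines) is skipped before checking.
    lines = output.lstrip().splitlines(keepends=True)
    if lines and lines[0].strip() in FINAL_MARKERS:
        raise Submitted("".join(lines[1:]))


# Normal output, empty output, and a marker that is not at the start all pass through.
raise_if_finished("")
raise_if_finished("building...\nMINI_SWE_AGENT_FINAL_OUTPUT\n")

try:
    raise_if_finished("  \nCOMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT\ndiff --git a b\n")
    final_output = None
except Submitted as done:
    final_output = str(done)
```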

Action Execution (TestExecuteAction)

  • Successful command execution with container mocking
  • Timeout handling and error propagation
  • Format error detection
  • Non-zero exit code handling
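To illustrate the container-mocking pattern behind these cases, here is a small sketch using `unittest.mock`. Everything in it is hypothetical: the `execute_action` function, the `container.run` method, and the result dictionary are invented for the example, and the real agent may propagate timeouts as exceptions rather than returning a status.

```python
from unittest import mock


def execute_action(container, command: str, timeout_s: float) -> dict:
    """Hypothetical executor: run `command` and normalize the outcome."""
    try:
        exit_code, output = container.run(command, timeout_s=timeout_s)
    except TimeoutError:
        return {"status": "timeout", "exit_code": None, "output": ""}
    status = "ok" if exit_code == 0 else "nonzero_exit"
    return {"status": status, "exit_code": exit_code, "output": output}


# Successful command: the mock returns an (exit_code, output) pair.
ok_container = mock.Mock()
ok_container.run.return_value = (0, "hello\n")
ok_result = execute_action(ok_container, "echo hello", timeout_s=5.0)

# Timeout: the mock raises instead of returning.
slow_container = mock.Mock()
slow_container.run.side_effect = TimeoutError("command timed out")
timeout_result = execute_action(slow_container, "sleep 100", timeout_s=0.1)

# Non-zero exit code: output is still returned alongside the failure status.
failing_container = mock.Mock()
failing_container.run.return_value = (2, "no such file\n")
failed_result = execute_action(failing_container, "ls missing", timeout_s=5.0)
```

Because the container is a mock, the tests can also assert on how it was called, for example with `ok_container.run.assert_called_once_with("echo hello", timeout_s=5.0)`.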

Template Rendering (TestHelperFunctions)

  • System template rendering
  • Instance template with platform variables
  • Action observation with output formatting
  • Format error and timeout message rendering

Message Management (TestMessageManagement)

  • Message addition to history
  • Empty message handling

🤖 Generated with Claude Code


Generated description

Adds comprehensive unit tests for the MiniSWECodeAgent within the ares module, ensuring the reliability of its action parsing, execution, and templating logic. Introduces new tests for the LLM cost accounting functions, verifying accurate model pricing retrieval and usage cost calculations.
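For the cost accounting part, the kind of calculation under test can be sketched with `decimal.Decimal`. This is an assumption-laden illustration rather than the accounting module's code: `ModelPricing` and `llm_cost` are hypothetical names, prices are assumed to be per token, and `None` pricing fields are assumed to count as zero.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical per-model pricing; any field may be None if not published."""
    prompt: Optional[Decimal]      # price per prompt token
    completion: Optional[Decimal]  # price per completion token
    request: Optional[Decimal]     # flat charge per request


def llm_cost(
    pricing: Dict[str, ModelPricing],
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
) -> Decimal:
    """Compute the cost of one request, treating None prices as zero (an assumption)."""
    if model not in pricing:
        raise ValueError(f"no pricing available for model {model!r}")
    p = pricing[model]
    zero = Decimal("0")
    return (
        (p.prompt or zero) * prompt_tokens
        + (p.completion or zero) * completion_tokens
        + (p.request or zero)
    )


prices = {
    "example-model": ModelPricing(Decimal("0.000003"), Decimal("0.000015"), None),
}
cost = llm_cost(prices, "example-model", prompt_tokens=1000, completion_tokens=500)
```

Using `Decimal` built from strings avoids the rounding drift that binary floats introduce when very small per-token prices are multiplied by large token counts.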

Topic: LLM Cost Accounting
Details: Introduces unit tests for the martian_cost_list function, ensuring correct fetching and caching of LLM model pricing, and for get_llm_cost, verifying accurate calculation of LLM usage costs with various scenarios.
Modified files (1)
  • src/ares/llms/accounting_test.py

Topic: MiniSWE Agent Tests
Details: Adds comprehensive unit tests for the MiniSWECodeAgent's core functionalities, including action parsing logic, execution flow with timeout and error handling, template rendering functions, and message management.
Modified files (1)
  • src/ares/code_agents/mini_swe_agent_test.py
This pull request is reviewed by Baz.

Rowan and others added 2 commits January 13, 2026 01:15
Implemented unit tests for the accounting module covering:
- `martian_cost_list` function: Tests for successful fetching, caching behavior,
  error handling (HTTP errors, network errors, invalid JSON), and default client creation
- `get_llm_cost` function: Tests for basic cost calculation, request charges,
  decimal precision, error handling (missing model, missing usage), None pricing fields,
  zero tokens, and large token counts

The tests use mocking to avoid external API calls and verify correct behavior
across various scenarios including edge cases and error conditions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Comprehensive test coverage for mini_swe_agent.py including:
- Action parsing (single/multiple blocks, whitespace, multiline commands)
- Execution flow (success, timeout, format errors, exit codes)
- Final output detection and submission markers
- Template rendering for system, instance, action observation, and errors
- Message management

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>