Add unit tests for MiniSWE agent by joshgreaves · Pull Request #12 · withmartian/ares

joshgreaves · 2026-01-13T01:20:23Z

User description

Summary

Comprehensive unit tests for mini_swe_agent.py covering all core functionality
Tests for action parsing logic (single blocks, multiple blocks, format validation)
Tests for execution flow including timeouts, format errors, and final output detection
Tests for template rendering functions (system, instance, action observation, errors)
Mock-based testing to avoid external dependencies on the minisweagent package

Test Coverage

Action Parsing (`TestParseAction`)

Single bash code block extraction with whitespace handling
Multiline command parsing
Error handling for multiple blocks and missing blocks
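
As a rough illustration of the behavior these tests target, here is a minimal sketch of single-block extraction. The names `parse_action` and `FormatError` and the regular expression are assumptions made for this example, not the code in mini_swe_agent.py.

```python
import re

FENCE = "`" * 3  # built at runtime so the example does not break its own fence


class FormatError(Exception):
    """Hypothetical error for responses without exactly one bash block."""


# Matches a fenced bash block and captures its body (may span several lines).
_BASH_BLOCK = re.compile(FENCE + r"bash\s*\n(.*?)\n?" + FENCE, re.DOTALL)


def parse_action(response: str) -> str:
    """Return the command from the single bash block in `response`."""
    blocks = _BASH_BLOCK.findall(response)
    if len(blocks) != 1:
        raise FormatError(f"expected exactly one bash block, found {len(blocks)}")
    # Surrounding whitespace is dropped; inner newlines are preserved.
    return blocks[0].strip()
```

A response containing zero or two blocks raises `FormatError` in this sketch, which is the shape of the "multiple blocks and missing blocks" cases listed above.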

Execution Monitoring (`TestRaiseIfFinished`)

Detection of MINI_SWE_AGENT_FINAL_OUTPUT and COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT markers
Proper handling of normal output vs. final output
Edge cases: empty output, markers not at start, leading whitespace
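
The marker check could look roughly like the sketch below. The two marker strings come from this description; the `Submitted` exception, the function signature, and the choice to ignore leading whitespace are assumptions for illustration.

```python
FINAL_OUTPUT_MARKERS = (
    "MINI_SWE_AGENT_FINAL_OUTPUT",
    "COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT",
)


class Submitted(Exception):
    """Hypothetical signal that the agent has submitted its final output."""


def raise_if_finished(output: str) -> None:
    # Assumption: leading whitespace is ignored and the marker must be the
    # first non-blank line; a marker later in the output is ordinary text.
    lines = output.lstrip().splitlines(keepends=True)
    if lines and lines[0].strip() in FINAL_OUTPUT_MARKERS:
        # Everything after the marker line is carried as the final output.
        raise Submitted("".join(lines[1:]))
```

Under these assumptions, empty output and output with a marker on a later line both return normally, while a marker preceded only by whitespace raises `Submitted`.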

Action Execution (`TestExecuteAction`)

Successful command execution with container mocking
Timeout handling and error propagation
Format error detection
Non-zero exit code handling
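
Container mocking of this kind can be sketched with `unittest.mock`. The `execute_action` function, the `container.run` method, and `ExecutionTimeoutError` below are hypothetical stand-ins, written synchronously for brevity; the real agent's interface may differ.

```python
from unittest import mock


class ExecutionTimeoutError(Exception):
    """Hypothetical error raised when a command exceeds its time limit."""


def execute_action(container, command: str, timeout: float = 30.0) -> dict:
    # `container.run` is a hypothetical method returning (exit_code, output).
    try:
        exit_code, output = container.run(command, timeout=timeout)
    except TimeoutError as exc:
        raise ExecutionTimeoutError(command) from exc
    # A non-zero exit code is reported to the caller rather than raised.
    return {"returncode": exit_code, "output": output}


# Successful command: the mock stands in for a real container.
container = mock.Mock()
container.run.return_value = (0, "hello\n")
result = execute_action(container, "echo hello")
container.run.assert_called_once_with("echo hello", timeout=30.0)

# Timeout: the mock raises, and the wrapper converts the error.
slow_container = mock.Mock()
slow_container.run.side_effect = TimeoutError
try:
    execute_action(slow_container, "sleep 999", timeout=1.0)
    timed_out = False
except ExecutionTimeoutError:
    timed_out = True
```

Because the container is a mock, no process is started, and each scenario (success, timeout, non-zero exit) is selected by setting `return_value` or `side_effect`.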

Template Rendering (`TestHelperFunctions`)

System template rendering
Instance template with platform variables
Action observation with output formatting
Format error and timeout message rendering
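
For a sense of what rendering tests like these assert, here is a small sketch using the standard library's `string.Template` as a stand-in. The template text, function names, and choice of engine are assumptions; the description does not say which templating engine mini_swe_agent.py uses.

```python
import platform
from string import Template

# Hypothetical templates; the real prompts are not shown in this description.
INSTANCE_TEMPLATE = Template("Task: $task\nSystem: $system $machine")
OBSERVATION_TEMPLATE = Template(
    "<returncode>$returncode</returncode>\n<output>\n$output</output>"
)


def render_instance(task: str) -> str:
    # Platform variables are filled in from the running interpreter.
    return INSTANCE_TEMPLATE.substitute(
        task=task, system=platform.system(), machine=platform.machine()
    )


def render_observation(returncode: int, output: str) -> str:
    return OBSERVATION_TEMPLATE.substitute(returncode=returncode, output=output)
```

A test can then check that the task text and platform values appear in the rendered instance message, and that the observation wraps the exit code and output as expected.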

Message Management (`TestMessageManagement`)

Message addition to history
Empty message handling

🤖 Generated with Claude Code

Generated description

Adds comprehensive unit tests for the MiniSWECodeAgent within the ares module, ensuring the reliability of its action parsing, execution, and templating logic. Introduces new tests for the LLM cost accounting functions, verifying accurate model pricing retrieval and usage cost calculations.

Topic Details

LLM Cost Accounting

Introduces unit tests for the martian_cost_list function, ensuring correct fetching and caching of LLM model pricing, and for get_llm_cost, verifying accurate calculation of LLM usage costs with various scenarios.
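
A cost calculation of the kind these tests exercise might be sketched as follows. The `ModelPricing` dataclass, the field names, and the signature of `get_llm_cost` shown here are illustrative assumptions, not the definitions in the accounting module.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical per-model pricing; a field left as None is treated as free."""

    prompt: Optional[Decimal] = None      # cost per input token
    completion: Optional[Decimal] = None  # cost per output token
    request: Optional[Decimal] = None     # flat charge per request


def get_llm_cost(
    model: str,
    pricing: Mapping[str, ModelPricing],
    prompt_tokens: int,
    completion_tokens: int,
) -> Decimal:
    if model not in pricing:
        raise ValueError(f"no pricing for model {model!r}")
    p = pricing[model]
    zero = Decimal("0")
    # Decimal arithmetic avoids float rounding on very small per-token prices.
    return (
        (p.prompt or zero) * prompt_tokens
        + (p.completion or zero) * completion_tokens
        + (p.request or zero)
    )
```

With a prompt price of 0.000001, a completion price of 0.000002, and a request charge of 0.01, a call with 1000 prompt tokens and 500 completion tokens costs 0.012 in this sketch.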

Modified files (1)

src/ares/llms/accounting_test.py

MiniSWE Agent Tests

Adds comprehensive unit tests for the MiniSWECodeAgent's core functionalities, including action parsing logic, execution flow with timeout and error handling, template rendering functions, and message management.

Modified files (1)

src/ares/code_agents/mini_swe_agent_test.py

This pull request is reviewed by Baz.

Implemented unit tests for the accounting module covering:
- `martian_cost_list` function: Tests for successful fetching, caching behavior, error handling (HTTP errors, network errors, invalid JSON), and default client creation
- `get_llm_cost` function: Tests for basic cost calculation, request charges, decimal precision, error handling (missing model, missing usage), None pricing fields, zero tokens, and large token counts

The tests use mocking to avoid external API calls and verify correct behavior across various scenarios including edge cases and error conditions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Comprehensive test coverage for mini_swe_agent.py including:
- Action parsing (single/multiple blocks, whitespace, multiline commands)
- Execution flow (success, timeout, format errors, exit codes)
- Final output detection and submission markers
- Template rendering for system, instance, action observation, and errors
- Message management

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Rowan and others added 2 commits January 13, 2026 01:15

joshgreaves mentioned this pull request Jan 13, 2026

Add unit tests for MiniSWE agent #13

Closed

baz-reviewer bot approved these changes Jan 13, 2026

joshgreaves wants to merge 2 commits into main from add-miniswe-tests
