
feat: Add minimal SeedRL-style example for SWE-bench #10

Open · joshgreaves wants to merge 4 commits into main from feat/add-seedrl-example

Conversation

@joshgreaves (Contributor) commented Jan 12, 2026

Generated description

graph LR
main_("main"):::added
Learner_start_("Learner.start"):::added
Actor_run_episode_("Actor.run_episode"):::added
Learner_request_inference_("Learner.request_inference"):::added
Learner_run_batched_inference_("Learner._run_batched_inference"):::added
Learner_run_inference_("Learner._run_inference"):::added
build_openai_compatible_llm_response_("build_openai_compatible_llm_response"):::added
C51ValueHead_forward_("C51ValueHead.forward"):::added
main_ -- "Now starts Learner inference loop before spawning actors." --> Learner_start_
main_ -- "Main now gathers Actor.run_episode tasks for parallel episodes." --> Actor_run_episode_
Actor_run_episode_ -- "Actor requests LLMResponse each step via Learner.request_inference." --> Learner_request_inference_
Learner_run_batched_inference_ -- "Falls back to single-request path when batch size equals one." --> Learner_run_inference_
Learner_run_batched_inference_ -- "Uses helper to construct OpenAI-compatible responses per batch element." --> build_openai_compatible_llm_response_
Learner_run_batched_inference_ -- "Invokes value head on batched hidden states for diagnostics." --> C51ValueHead_forward_
Learner_run_inference_ -- "Wraps decoded generation into OpenAI-style LLMResponse via helper." --> build_openai_compatible_llm_response_
Learner_run_inference_ -- "Runs value head on last-token hidden states to estimate value." --> C51ValueHead_forward_
classDef added stroke:#15AA7A
classDef removed stroke:#CD5270
classDef modified stroke:#EDAC4C
linkStyle default stroke:#CBD5E1,font-size:13px

Adds a minimal SeedRL-style architecture example for the ares module's LLM integration, demonstrating a centralized Learner for batched inference and distributed Actors for running SWE-bench environment episodes. Introduces a new helper function, build_openai_compatible_llm_response, to standardize LLM response creation across examples and components.
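
As a rough illustration of the pattern described above, here is a minimal asyncio sketch of the Learner/Actor split, assuming a queue-based design. The class and method names (Learner.start, Learner.request_inference, Learner._run_batched_inference, Actor.run_episode) mirror the diagram, but the batching logic, the placeholder generation call, and the actor loop are illustrative assumptions rather than the example's actual code.

```python
# Minimal SeedRL-style sketch: actors enqueue requests, a central learner batches them.
import asyncio
import contextlib


class Learner:
    """Centralizes inference so requests from many actors can be batched."""

    def __init__(self, max_batch_size: int = 8) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._max_batch_size = max_batch_size

    async def start(self) -> None:
        # Inference loop: wait for one request, drain anything else already
        # queued (up to max_batch_size), run one batched pass, resolve futures.
        while True:
            item = await self._queue.get()
            batch = [item]
            while not self._queue.empty() and len(batch) < self._max_batch_size:
                batch.append(self._queue.get_nowait())
            outputs = self._run_batched_inference([prompt for prompt, _ in batch])
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)

    async def request_inference(self, prompt: str) -> str:
        # Called by actors; resolves once the batch containing this request runs.
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, future))
        return await future

    def _run_batched_inference(self, prompts: list[str]) -> list[str]:
        # Placeholder for a real batched model.generate(...) call.
        return [f"completion for: {p}" for p in prompts]


class Actor:
    """Runs one environment episode, querying the Learner at every step."""

    def __init__(self, actor_id: int, learner: Learner) -> None:
        self._id = actor_id
        self._learner = learner

    async def run_episode(self, num_steps: int = 3) -> None:
        for step in range(num_steps):
            reply = await self._learner.request_inference(f"actor {self._id}, step {step}")
            print(reply)


async def main() -> None:
    learner = Learner()
    inference_task = asyncio.create_task(learner.start())  # start the inference loop first
    actors = [Actor(i, learner) for i in range(4)]
    await asyncio.gather(*(actor.run_episode() for actor in actors))  # parallel episodes
    inference_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await inference_task


if __name__ == "__main__":
    asyncio.run(main())
```

The key design choice is that actors never touch the model directly; each one awaits a future that the learner resolves once its request has been included in a batched forward pass.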

Topic: SeedRL Example
Details: Implements a minimal SeedRL-style architecture for the SWE-bench environment, featuring a centralized Learner for batched LLM inference and distributed Actors to run environment episodes. This includes a new helper function, build_openai_compatible_llm_response, to standardize LLM responses, which is also retrofitted into an existing example.
Modified files (3):
  • examples/02_local_llm.py
  • examples/03_seedrl_swebench.py
  • src/ares/llms/llm_clients.py
Latest contributors (1):
  • joshua.greaves@gmail.com · Add-example-2-local-LL... · January 09, 2026

Topic: Coding Guidelines
Details: Adds a new Claude.md document outlining coding guidelines for the ARES codebase, focusing on commenting philosophy, general coding practices, and project-specific patterns.
Modified files (1):
  • Claude.md
Latest contributors (0): none
This pull request is reviewed by Baz.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Comment on lines 324 to 330:

    response = llm_clients.LLMResponse(
        chat_completion_response=openai.types.chat.chat_completion.ChatCompletion(
            id=str(uuid.uuid4()),
            choices=[
                openai.types.chat.chat_completion.Choice(
                    message=openai.types.chat.chat_completion_message.ChatCompletionMessage(
                        content=output_text,
Contributor:

The LLMResponse construction (lines 324-347) is repeated later (465-488). Could we pull it into a helper to keep the inference logic DRY?

Prompt for AI Agents:

In examples/03_seedrl_swebench.py around lines 324-347 and 465-488, the LLMResponse
construction is duplicated inside the _run_inference and _run_batched_inference logic.
Refactor by adding a new private helper method on Learner (e.g., def
_build_llm_response(self, output_text: str, num_input_tokens: int, num_output_tokens:
int) -> llm_clients.LLMResponse) that builds and returns the llm_clients.LLMResponse
using self.model_name and uuid/time as currently done. Replace the inline response
construction at lines 324-347 and 465-488 with calls to this helper, passing the
calculated output_text and token counts, and update any variable names accordingly to
remove duplication.


Finding type: Conciseness

Contributor (author):

Let’s make a helper function inside ARES to construct LLM responses easily, and use it here and in example 2.
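
A hedged sketch of what such a shared helper could look like, following the parameter names suggested in the AI-agent prompt above (output_text plus token counts, with the model name passed in). The finish_reason, the usage fields, and the exact signature are assumptions; the function added in the PR, build_openai_compatible_llm_response in src/ares/llms/llm_clients.py, may differ, and in the examples its result would presumably be wrapped in llm_clients.LLMResponse(chat_completion_response=...) as in the snippet quoted earlier.

```python
# Sketch only: builds an OpenAI-compatible ChatCompletion from raw generation output.
import time
import uuid

import openai.types.chat.chat_completion as chat_completion
import openai.types.chat.chat_completion_message as chat_completion_message
from openai.types.completion_usage import CompletionUsage


def build_openai_compatible_llm_response(
    output_text: str,
    model_name: str,
    num_input_tokens: int,
    num_output_tokens: int,
) -> chat_completion.ChatCompletion:
    """Wraps decoded generation output in an OpenAI-style ChatCompletion."""
    return chat_completion.ChatCompletion(
        id=str(uuid.uuid4()),
        created=int(time.time()),
        model=model_name,
        object="chat.completion",
        choices=[
            chat_completion.Choice(
                index=0,
                finish_reason="stop",  # assumption: report a normal stop
                logprobs=None,
                message=chat_completion_message.ChatCompletionMessage(
                    role="assistant",
                    content=output_text,
                ),
            )
        ],
        usage=CompletionUsage(
            prompt_tokens=num_input_tokens,
            completion_tokens=num_output_tokens,
            total_tokens=num_input_tokens + num_output_tokens,
        ),
    )
```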

Contributor:
Got it—that’s a helpful pattern; I’ll remember to suggest a shared ARES helper for constructing LLM responses in similar situations after this PR is merged.

Comment on lines +413 to +425:

    # Get generation parameters from requests (use first request's params)
    temperature = batch[0].request.temperature or 1.0
    do_sample = True if batch[0].request.temperature else False

    # Generate completions for the entire batch
    with torch.no_grad():
        outputs = self.model.generate(
            input_ids=batched_input_ids,
            attention_mask=batched_attention_mask,
            max_new_tokens=2048,
            temperature=temperature,
            do_sample=do_sample,
            pad_token_id=self.tokenizer.pad_token_id,
Contributor:

The batched inference path always pulls temperature/do_sample from batch[0].request, so every subsequent request in the same batch silently inherits the first actor's generation settings even though LLMRequest exposes per-request temperatures. This breaks the public contract that callers can specify their own sampling parameters: requests with different temperatures should either be split into separate batches or handled per element before responding.


Finding type: Breaking Changes
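
One hedged sketch of how the batched path could honor per-request parameters, in line with the fix the follow-up commit below describes (group by generation parameters, process each group separately, preserve original order). PendingRequest, generate_batch, and the greedy-when-unset convention are illustrative stand-ins, not the PR's actual code.

```python
# Sketch: bucket queued requests by (temperature, do_sample), generate per group,
# and return completions in the original request order.
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingRequest:
    """Illustrative stand-in for a queued inference request."""
    prompt: str
    temperature: Optional[float]  # None means "not specified" in this sketch


def generate_batch(prompts: list[str], *, temperature: float, do_sample: bool) -> list[str]:
    # Stand-in for a real batched model.generate(...) call.
    mode = "sampled" if do_sample else "greedy"
    return [f"[{mode}, T={temperature}] {prompt}" for prompt in prompts]


def run_grouped_inference(batch: list[PendingRequest]) -> list[str]:
    """Generate per parameter group so each request keeps its own settings."""
    # Bucket request positions by their effective generation parameters.
    groups: dict[tuple[float, bool], list[int]] = defaultdict(list)
    for position, req in enumerate(batch):
        # Treat an unset or zero temperature as greedy decoding in this sketch.
        do_sample = req.temperature is not None and req.temperature > 0.0
        temperature = req.temperature if do_sample else 1.0
        groups[(temperature, do_sample)].append(position)

    # Run one generation call per group, then write results back in the
    # original request order so callers see responses in the order they asked.
    outputs = [""] * len(batch)
    for (temperature, do_sample), positions in groups.items():
        prompts = [batch[pos].prompt for pos in positions]
        completions = generate_batch(prompts, temperature=temperature, do_sample=do_sample)
        for pos, completion in zip(positions, completions):
            outputs[pos] = completion
    return outputs


if __name__ == "__main__":
    requests = [
        PendingRequest("fix the failing test", 0.7),
        PendingRequest("summarize the diff", None),
        PendingRequest("explore alternatives", 0.7),
    ]
    print(run_grouped_inference(requests))
```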

Rowan and others added 2 commits January 13, 2026 17:28
Addresses PR #10 review feedback:

- Add build_openai_compatible_llm_response() helper in llm_clients.py
  with full type hints and docstring for constructing LLMResponse from
  raw generation outputs
- Refactor examples/02_local_llm.py to use the new helper, removing
  duplicate OpenAI object construction boilerplate
- Refactor examples/03_seedrl_swebench.py to use the new helper in both
  single and batched inference paths
- Fix batched inference contract: group requests by generation parameters
  (temperature, do_sample) and process each group separately to ensure
  parameters are respected per request while preserving original order
- Pass compileall and ruff linter checks

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Comment on lines +286 to +287:

    temperature=req.request.temperature or 1.0,
    do_sample=True if req.request.temperature else False,
Contributor:

Because the temperature is derived as req.request.temperature or 1.0, any request that explicitly asks for temperature=0.0 (e.g., a deterministic greedy policy) is silently converted to 1.0 even though do_sample remains False, so the caller never gets the hyperparameter they requested. The same or 1.0 pattern is reused when batching/grouping, so every 0.0-temperature request will run at 1.0 instead of the behavior the caller asked for.

Suggested change:

    - temperature=req.request.temperature or 1.0,
    - do_sample=True if req.request.temperature else False,
    + temperature=1.0 if req.request.temperature is None else req.request.temperature,
    + do_sample=False if req.request.temperature == 0.0 else True,

Finding type: Logical Bugs
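
For context, the root cause is Python truthiness: 0.0 is falsy, so the or fallback cannot tell "explicitly zero" apart from "unset", which is exactly what the suggested change addresses with an explicit None check.

```python
temperature = 0.0  # caller explicitly asks for greedy decoding

print(temperature or 1.0)                           # 1.0 -> the explicit 0.0 is silently replaced
print(1.0 if temperature is None else temperature)  # 0.0 -> preserved by checking for None instead
```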

