
feat: Add minimal SeedRL-style example for SWE-bench #10

Open · joshgreaves wants to merge 4 commits into main from feat/add-seedrl-example

Conversation

@joshgreaves (Contributor) commented Jan 12, 2026

Generated description

graph LR
main_("main"):::added
Learner_start_("Learner.start"):::added
Actor_run_episode_("Actor.run_episode"):::added
Learner_request_inference_("Learner.request_inference"):::added
Learner_run_batched_inference_("Learner._run_batched_inference"):::added
Learner_run_inference_("Learner._run_inference"):::added
build_openai_compatible_llm_response_("build_openai_compatible_llm_response"):::added
C51ValueHead_forward_("C51ValueHead.forward"):::added
main_ -- "Now starts Learner inference loop before spawning actors." --> Learner_start_
main_ -- "Main now gathers Actor.run_episode tasks for parallel episodes." --> Actor_run_episode_
Actor_run_episode_ -- "Actor requests LLMResponse each step via Learner.request_inference." --> Learner_request_inference_
Learner_run_batched_inference_ -- "Falls back to single-request path when batch size equals one." --> Learner_run_inference_
Learner_run_batched_inference_ -- "Uses helper to construct OpenAI-compatible responses per batch element." --> build_openai_compatible_llm_response_
Learner_run_batched_inference_ -- "Invokes value head on batched hidden states for diagnostics." --> C51ValueHead_forward_
Learner_run_inference_ -- "Wraps decoded generation into OpenAI-style LLMResponse via helper." --> build_openai_compatible_llm_response_
Learner_run_inference_ -- "Runs value head on last-token hidden states to estimate value." --> C51ValueHead_forward_
classDef added stroke:#15AA7A
classDef removed stroke:#CD5270
classDef modified stroke:#EDAC4C
linkStyle default stroke:#CBD5E1,font-size:13px

Adds a minimal SeedRL-style architecture example for the ares module's LLM integration, demonstrating a centralized Learner for batched inference and distributed Actors for running SWE-bench environment episodes. Introduces a new helper function, build_openai_compatible_llm_response, to standardize LLM response creation across examples and components.
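
As a rough illustration of the pattern described above, here is a minimal asyncio sketch of the Learner/Actor split, assuming a queue-based design. The class and method names (Learner.start, Learner.request_inference, Learner._run_batched_inference, Actor.run_episode) mirror the diagram, but the batching logic, the placeholder generation call, and the actor loop are illustrative assumptions rather than the example's actual code.

```python
# Minimal SeedRL-style sketch: actors enqueue requests, a central learner batches them.
import asyncio
import contextlib


class Learner:
    """Centralizes inference so requests from many actors can be batched."""

    def __init__(self, max_batch_size: int = 8) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._max_batch_size = max_batch_size

    async def start(self) -> None:
        # Inference loop: wait for one request, drain anything else already
        # queued (up to max_batch_size), run one batched pass, resolve futures.
        while True:
            item = await self._queue.get()
            batch = [item]
            while not self._queue.empty() and len(batch) < self._max_batch_size:
                batch.append(self._queue.get_nowait())
            outputs = self._run_batched_inference([prompt for prompt, _ in batch])
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)

    async def request_inference(self, prompt: str) -> str:
        # Called by actors; resolves once the batch containing this request runs.
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, future))
        return await future

    def _run_batched_inference(self, prompts: list[str]) -> list[str]:
        # Placeholder for a real batched model.generate(...) call.
        return [f"completion for: {p}" for p in prompts]


class Actor:
    """Runs one environment episode, querying the Learner at every step."""

    def __init__(self, actor_id: int, learner: Learner) -> None:
        self._id = actor_id
        self._learner = learner

    async def run_episode(self, num_steps: int = 3) -> None:
        for step in range(num_steps):
            reply = await self._learner.request_inference(f"actor {self._id}, step {step}")
            print(reply)


async def main() -> None:
    learner = Learner()
    inference_task = asyncio.create_task(learner.start())  # start the inference loop first
    actors = [Actor(i, learner) for i in range(4)]
    await asyncio.gather(*(actor.run_episode() for actor in actors))  # parallel episodes
    inference_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await inference_task


if __name__ == "__main__":
    asyncio.run(main())
```

The key design choice is that actors never touch the model directly; each one awaits a future that the learner resolves once its request has been included in a batched forward pass.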

Topic: SeedRL Example
Details: Implements a minimal SeedRL-style architecture for the SWE-bench environment, featuring a centralized Learner for batched LLM inference and distributed Actors to run environment episodes. This includes a new helper function, build_openai_compatible_llm_response, to standardize LLM responses, which is also retrofitted into an existing example.
Modified files (3):
  • examples/02_local_llm.py
  • examples/03_seedrl_swebench.py
  • src/ares/llms/llm_clients.py
Latest contributors (1):
  • joshua.greaves@gmail.com · Add-example-2-local-LL... · January 09, 2026

Topic: Coding Guidelines
Details: Adds a new Claude.md document outlining coding guidelines for the ARES codebase, focusing on commenting philosophy, general coding practices, and project-specific patterns.
Modified files (1):
  • Claude.md
Latest contributors (0): none
This pull request is reviewed by Baz.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Comment on lines 324 to 330:

    response = llm_clients.LLMResponse(
        chat_completion_response=openai.types.chat.chat_completion.ChatCompletion(
            id=str(uuid.uuid4()),
            choices=[
                openai.types.chat.chat_completion.Choice(
                    message=openai.types.chat.chat_completion_message.ChatCompletionMessage(
                        content=output_text,
Contributor:

The LLMResponse construction (lines 324-347) is repeated later (465-488). Could we pull it into a helper to keep the inference logic DRY?

Prompt for AI Agents:

In examples/03_seedrl_swebench.py around lines 324-347 and 465-488, the LLMResponse
construction is duplicated inside the _run_inference and _run_batched_inference logic.
Refactor by adding a new private helper method on Learner (e.g., def
_build_llm_response(self, output_text: str, num_input_tokens: int, num_output_tokens:
int) -> llm_clients.LLMResponse) that builds and returns the llm_clients.LLMResponse
using self.model_name and uuid/time as currently done. Replace the inline response
construction at lines 324-347 and 465-488 with calls to this helper, passing the
calculated output_text and token counts, and update any variable names accordingly to
remove duplication.


Finding type: Conciseness

Contributor (author):

Let’s make a helper function inside ARES to construct LLM responses easily, and use it here and in example 2.
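
A hedged sketch of what such a shared helper could look like, following the parameter names suggested in the AI-agent prompt above (output_text plus token counts, with the model name passed in). The finish_reason, the usage fields, and the exact signature are assumptions; the function added in the PR, build_openai_compatible_llm_response in src/ares/llms/llm_clients.py, may differ, and in the examples its result would presumably be wrapped in llm_clients.LLMResponse(chat_completion_response=...) as in the snippet quoted earlier.

```python
# Sketch only: builds an OpenAI-compatible ChatCompletion from raw generation output.
import time
import uuid

import openai.types.chat.chat_completion as chat_completion
import openai.types.chat.chat_completion_message as chat_completion_message
from openai.types.completion_usage import CompletionUsage


def build_openai_compatible_llm_response(
    output_text: str,
    model_name: str,
    num_input_tokens: int,
    num_output_tokens: int,
) -> chat_completion.ChatCompletion:
    """Wraps decoded generation output in an OpenAI-style ChatCompletion."""
    return chat_completion.ChatCompletion(
        id=str(uuid.uuid4()),
        created=int(time.time()),
        model=model_name,
        object="chat.completion",
        choices=[
            chat_completion.Choice(
                index=0,
                finish_reason="stop",  # assumption: report a normal stop
                logprobs=None,
                message=chat_completion_message.ChatCompletionMessage(
                    role="assistant",
                    content=output_text,
                ),
            )
        ],
        usage=CompletionUsage(
            prompt_tokens=num_input_tokens,
            completion_tokens=num_output_tokens,
            total_tokens=num_input_tokens + num_output_tokens,
        ),
    )
```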

Contributor:
Got it—that’s a helpful pattern; I’ll remember to suggest a shared ARES helper for constructing LLM responses in similar situations after this PR is merged.

Comment on lines +413 to +425:

    # Get generation parameters from requests (use first request's params)
    temperature = batch[0].request.temperature or 1.0
    do_sample = True if batch[0].request.temperature else False

    # Generate completions for the entire batch
    with torch.no_grad():
        outputs = self.model.generate(
            input_ids=batched_input_ids,
            attention_mask=batched_attention_mask,
            max_new_tokens=2048,
            temperature=temperature,
            do_sample=do_sample,
            pad_token_id=self.tokenizer.pad_token_id,
Contributor:

The batched inference path always pulls temperature/do_sample from batch[0].request, so every subsequent request in the same batch silently inherits the first actor's generation settings even though LLMRequest exposes per-request temperatures. This breaks the public contract that callers can specify their own sampling parameters: requests with different temperatures should either be split into separate batches or handled per element before responding.


Finding type: Breaking Changes
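
One hedged sketch of how the batched path could honor per-request parameters, in line with the fix the follow-up commit below describes (group by generation parameters, process each group separately, preserve original order). PendingRequest, generate_batch, and the greedy-when-unset convention are illustrative stand-ins, not the PR's actual code.

```python
# Sketch: bucket queued requests by (temperature, do_sample), generate per group,
# and return completions in the original request order.
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingRequest:
    """Illustrative stand-in for a queued inference request."""
    prompt: str
    temperature: Optional[float]  # None means "not specified" in this sketch


def generate_batch(prompts: list[str], *, temperature: float, do_sample: bool) -> list[str]:
    # Stand-in for a real batched model.generate(...) call.
    mode = "sampled" if do_sample else "greedy"
    return [f"[{mode}, T={temperature}] {prompt}" for prompt in prompts]


def run_grouped_inference(batch: list[PendingRequest]) -> list[str]:
    """Generate per parameter group so each request keeps its own settings."""
    # Bucket request positions by their effective generation parameters.
    groups: dict[tuple[float, bool], list[int]] = defaultdict(list)
    for position, req in enumerate(batch):
        # Treat an unset or zero temperature as greedy decoding in this sketch.
        do_sample = req.temperature is not None and req.temperature > 0.0
        temperature = req.temperature if do_sample else 1.0
        groups[(temperature, do_sample)].append(position)

    # Run one generation call per group, then write results back in the
    # original request order so callers see responses in the order they asked.
    outputs = [""] * len(batch)
    for (temperature, do_sample), positions in groups.items():
        prompts = [batch[pos].prompt for pos in positions]
        completions = generate_batch(prompts, temperature=temperature, do_sample=do_sample)
        for pos, completion in zip(positions, completions):
            outputs[pos] = completion
    return outputs


if __name__ == "__main__":
    requests = [
        PendingRequest("fix the failing test", 0.7),
        PendingRequest("summarize the diff", None),
        PendingRequest("explore alternatives", 0.7),
    ]
    print(run_grouped_inference(requests))
```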

Rowan and others added 2 commits January 13, 2026 17:28
Addresses PR #10 review feedback:

- Add build_openai_compatible_llm_response() helper in llm_clients.py
  with full type hints and docstring for constructing LLMResponse from
  raw generation outputs
- Refactor examples/02_local_llm.py to use the new helper, removing
  duplicate OpenAI object construction boilerplate
- Refactor examples/03_seedrl_swebench.py to use the new helper in both
  single and batched inference paths
- Fix batched inference contract: group requests by generation parameters
  (temperature, do_sample) and process each group separately to ensure
  parameters are respected per request while preserving original order
- Pass compileall and ruff linter checks

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Comment on lines +286 to +287:

    temperature=req.request.temperature or 1.0,
    do_sample=True if req.request.temperature else False,
Contributor:

Because the temperature is derived as req.request.temperature or 1.0, any request that explicitly asks for temperature=0.0 (e.g., a deterministic greedy policy) is silently converted to 1.0 even though do_sample remains False, so the caller never gets the hyperparameter they requested. The same or 1.0 pattern is reused when batching/grouping, so every 0.0-temperature request will run at 1.0 instead of the behavior the caller asked for.

Suggested change:

    - temperature=req.request.temperature or 1.0,
    - do_sample=True if req.request.temperature else False,
    + temperature=1.0 if req.request.temperature is None else req.request.temperature,
    + do_sample=False if req.request.temperature == 0.0 else True,

Finding type: Logical Bugs
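
For context, the root cause is Python truthiness: 0.0 is falsy, so the or fallback cannot tell "explicitly zero" apart from "unset", which is exactly what the suggested change addresses with an explicit None check.

```python
temperature = 0.0  # caller explicitly asks for greedy decoding

print(temperature or 1.0)                           # 1.0 -> the explicit 0.0 is silently replaced
print(1.0 if temperature is None else temperature)  # 0.0 -> preserved by checking for None instead
```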

