feat: Add minimal SeedRL-style example for SWE-bench #10

joshgreaves wants to merge 4 commits into main from
Conversation
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
examples/03_seedrl_swebench.py
Outdated
```python
response = llm_clients.LLMResponse(
    chat_completion_response=openai.types.chat.chat_completion.ChatCompletion(
        id=str(uuid.uuid4()),
        choices=[
            openai.types.chat.chat_completion.Choice(
                message=openai.types.chat.chat_completion_message.ChatCompletionMessage(
                    content=output_text,
```
The LLMResponse construction (lines 324-347) is repeated later (465-488). Could we pull that into a helper to keep the inference logic DRY?
Prompt for AI Agents:
In `examples/03_seedrl_swebench.py` around lines 324-347 and 465-488, the `LLMResponse` construction is duplicated inside the `_run_inference` and `_run_batched_inference` logic. Refactor by adding a new private helper method on `Learner` (e.g., `def _build_llm_response(self, output_text: str, num_input_tokens: int, num_output_tokens: int) -> llm_clients.LLMResponse`) that builds and returns the `llm_clients.LLMResponse` using `self.model_name` and uuid/time as currently done. Replace the inline response construction at lines 324-347 and 465-488 with calls to this helper, passing the calculated `output_text` and token counts, and update any variable names accordingly to remove duplication.
Finding type: Conciseness
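For illustration, here is a rough sketch of the helper this finding suggests, filled out from the fields visible in the snippet above (uuid id, decoded output text) plus the token counts the prompt mentions. The `from ares import llm_clients` import path, the `Learner` attributes, and the extra OpenAI fields (`object`, `created`, `finish_reason`, `usage`) are assumptions, not the code that was ultimately merged:

```python
import time
import uuid

from openai.types import completion_usage
from openai.types.chat import chat_completion, chat_completion_message

from ares import llm_clients  # assumed import path for the ARES LLM client types


class Learner:
    """Sketch of the relevant slice of the example's Learner (attributes assumed)."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def _build_llm_response(
        self,
        output_text: str,
        num_input_tokens: int,
        num_output_tokens: int,
    ) -> llm_clients.LLMResponse:
        """Wraps a decoded generation in an OpenAI-compatible LLMResponse."""
        return llm_clients.LLMResponse(
            chat_completion_response=chat_completion.ChatCompletion(
                id=str(uuid.uuid4()),
                object="chat.completion",
                created=int(time.time()),
                model=self.model_name,
                choices=[
                    chat_completion.Choice(
                        index=0,
                        finish_reason="stop",
                        message=chat_completion_message.ChatCompletionMessage(
                            role="assistant",
                            content=output_text,
                        ),
                    )
                ],
                usage=completion_usage.CompletionUsage(
                    prompt_tokens=num_input_tokens,
                    completion_tokens=num_output_tokens,
                    total_tokens=num_input_tokens + num_output_tokens,
                ),
            )
        )
```

Per the follow-up commit further down, the merged change instead adds a module-level `build_openai_compatible_llm_response()` helper to `llm_clients.py` so other examples can share it.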
Let’s make a helper function inside ARES to construct LLM requests easily, and use it here and in example 2
Got it—that’s a helpful pattern; I’ll remember to suggest a shared ARES helper for constructing LLM requests in similar situations after this PR is merged.
```python
# Get generation parameters from requests (use first request's params)
temperature = batch[0].request.temperature or 1.0
do_sample = True if batch[0].request.temperature else False

# Generate completions for the entire batch
with torch.no_grad():
    outputs = self.model.generate(
        input_ids=batched_input_ids,
        attention_mask=batched_attention_mask,
        max_new_tokens=2048,
        temperature=temperature,
        do_sample=do_sample,
        pad_token_id=self.tokenizer.pad_token_id,
```
The batched inference path always pulls temperature/do_sample from `batch[0].request`, so every subsequent request in the same batch silently inherits the first actor's generation settings even though LLMRequest exposes per-request temperatures. This breaks the public contract that callers can specify their own sampling parameters; requests with different temperatures should either be split into separate batches or handled per element before responding.
Finding type: Breaking Changes
Addresses PR #10 review feedback:

- Add build_openai_compatible_llm_response() helper in llm_clients.py with full type hints and docstring for constructing LLMResponse from raw generation outputs
- Refactor examples/02_local_llm.py to use the new helper, removing duplicate OpenAI object construction boilerplate
- Refactor examples/03_seedrl_swebench.py to use the new helper in both single and batched inference paths
- Fix batched inference contract: group requests by generation parameters (temperature, do_sample) and process each group separately to ensure parameters are respected per request while preserving original order
- Pass compileall and ruff linter checks

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
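As a sketch of the grouping approach the commit describes, one way to respect per-request sampling parameters while preserving the original response order looks like the following. The request structure (`req.request.temperature`), the helper names, and the `generate_fn` callable are assumptions for illustration; this is not the actual diff:

```python
from collections import defaultdict


def group_by_generation_params(batch):
    """Groups queued requests by (temperature, do_sample), remembering each
    request's original position so results can be reordered afterwards."""
    groups = defaultdict(list)
    for position, req in enumerate(batch):
        temperature = 1.0 if req.request.temperature is None else req.request.temperature
        do_sample = temperature != 0.0  # greedy decoding when temperature is exactly 0.0
        groups[(temperature, do_sample)].append((position, req))
    return groups


def run_batched_inference(batch, generate_fn):
    """Generates each parameter group with its own settings, then restores order."""
    responses = [None] * len(batch)
    for (temperature, do_sample), members in group_by_generation_params(batch).items():
        positions = [pos for pos, _ in members]
        outputs = generate_fn(
            [req for _, req in members],
            temperature=temperature,
            do_sample=do_sample,
        )
        for pos, output in zip(positions, outputs):
            responses[pos] = output  # place each result back at its original index
    return responses
```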
```python
temperature=req.request.temperature or 1.0,
do_sample=True if req.request.temperature else False,
```
Because the temperature is derived as req.request.temperature or 1.0, any request that explicitly requests temperature=0.0 (e.g., a deterministic greedy policy) is silently converted to 1.0 even though do_sample remains False, so the caller never receives the requested hyperparameter. The same or 1.0 pattern is reused when batching/grouping, so every 0.0-temperature request will run at 1.0 instead of the behavior the caller asked for.
Suggested change:

```diff
-temperature=req.request.temperature or 1.0,
-do_sample=True if req.request.temperature else False,
+temperature=1.0 if req.request.temperature is None else req.request.temperature,
+do_sample=False if req.request.temperature == 0.0 else True,
```
Finding type: Logical Bugs
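The root cause is Python's truthiness rules: `0.0` is falsy, so `x or default` cannot distinguish "explicitly zero" from "not provided". A quick check illustrates the difference between the old and suggested expressions:

```python
temperature = 0.0

# Old pattern: 0.0 is falsy, so the fallback silently wins.
temperature or 1.0                           # -> 1.0

# Suggested pattern: only a missing value (None) triggers the default.
1.0 if temperature is None else temperature  # -> 0.0
```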
Generated description
graph LR main_("main"):::added Learner_start_("Learner.start"):::added Actor_run_episode_("Actor.run_episode"):::added Learner_request_inference_("Learner.request_inference"):::added Learner_run_batched_inference_("Learner._run_batched_inference"):::added Learner_run_inference_("Learner._run_inference"):::added build_openai_compatible_llm_response_("build_openai_compatible_llm_response"):::added C51ValueHead_forward_("C51ValueHead.forward"):::added main_ -- "Now starts Learner inference loop before spawning actors." --> Learner_start_ main_ -- "Main now gathers Actor.run_episode tasks for parallel episodes." --> Actor_run_episode_ Actor_run_episode_ -- "Actor requests LLMResponse each step via Learner.request_inference." --> Learner_request_inference_ Learner_run_batched_inference_ -- "Falls back to single-request path when batch size equals one." --> Learner_run_inference_ Learner_run_batched_inference_ -- "Uses helper to construct OpenAI-compatible responses per batch element." --> build_openai_compatible_llm_response_ Learner_run_batched_inference_ -- "Invokes value head on batched hidden states for diagnostics." --> C51ValueHead_forward_ Learner_run_inference_ -- "Wraps decoded generation into OpenAI-style LLMResponse via helper." --> build_openai_compatible_llm_response_ Learner_run_inference_ -- "Runs value head on last-token hidden states to estimate value." --> C51ValueHead_forward_ classDef added stroke:#15AA7A classDef removed stroke:#CD5270 classDef modified stroke:#EDAC4C linkStyle default stroke:#CBD5E1,font-size:13pxAdds a minimal SeedRL-style architecture example for the
aresmodule's LLM integration, demonstrating a centralizedLearnerfor batched inference and distributedActors for running SWE-bench environment episodes. Introduces a new helper function,build_openai_compatible_llm_response, to standardize LLM response creation across examples and components.Learnerfor batched LLM inference and distributedActors to run environment episodes. This includes a new helper functionbuild_openai_compatible_llm_responseto standardize LLM responses, which is also retrofitted into an existing example.Modified files (3)
`Claude.md`: a document outlining coding guidelines for the ARES codebase, focusing on commenting philosophy, general coding practices, and project-specific patterns.

Modified files (1)
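For readers who have not seen the SeedRL-style split before, the generated description above boils down to one pattern: actors await a shared inference queue and a single learner loop answers them in batches. The sketch below is a toy, self-contained version of that control flow; an echo string stands in for `model.generate`, and the names mirror the description but are not guaranteed to match the example's actual code:

```python
import asyncio


class Learner:
    """Central inference server: actors enqueue prompts, one loop answers them in batches."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def request_inference(self, prompt: str) -> str:
        # Called by actors; the future resolves once the inference loop serves the batch.
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, future))
        return await future

    async def start(self) -> None:
        # Inference loop: wait for one request, then drain whatever else is queued
        # and answer the whole batch at once.
        while True:
            batch = [await self._queue.get()]
            while not self._queue.empty():
                batch.append(self._queue.get_nowait())
            for prompt, future in batch:
                # Stand-in for batched generation on the model.
                future.set_result(f"echo({len(batch)} queued): {prompt}")


class Actor:
    """Runs an episode, requesting an LLM response from the learner at every step."""

    def __init__(self, learner: Learner, actor_id: int) -> None:
        self.learner = learner
        self.actor_id = actor_id

    async def run_episode(self) -> None:
        for step in range(3):
            reply = await self.learner.request_inference(f"actor={self.actor_id} step={step}")
            print(reply)


async def main() -> None:
    learner = Learner()
    inference_task = asyncio.create_task(learner.start())
    await asyncio.gather(*(Actor(learner, i).run_episode() for i in range(4)))
    inference_task.cancel()


if __name__ == "__main__":
    asyncio.run(main())
```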