add unit test for text loss in claude code #376
base: main
Conversation
Pull request overview
This PR adds a unit test to validate text preservation across multiple LLM interaction turns in the Claude Code agent. The test ensures that response text from previous turns is correctly included in subsequent prompts, which is critical for maintaining conversation context.
Key Changes:
- Adds a test_claude_code function that validates text preservation across conversation turns
- Integrates with the ClaudeCodeAgent and verifies span data captured during rollouts
- Tests both span generation and text content propagation through multiple turns (a sketch of the core check follows this list)
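To make the overview concrete, here is a minimal sketch of the span check, assuming `spans` is the list of span objects the store returns after a rollout and that LLM request spans carry `gen_ai.*` attributes; the helper `filter_llm_spans` is illustrative, not part of the PR.

    # Sketch only: filter rollout spans down to the LLM request spans.
    # Assumes each span exposes a dict-like `attributes` mapping.
    def filter_llm_spans(spans):
        return [s for s in spans if any(k.startswith("gen_ai.") for k in s.attributes)]

    valid_spans = filter_llm_spans(spans)
    assert len(valid_spans) > 1  # a multi-turn rollout should issue several LLM requests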
    assert len(valid_spans) > 1
    print(f"Generated {len(spans)} spans with {len(valid_spans)} LLM requests.")

    # Test case 2:
Copilot AI · Dec 6, 2025
The comment "# Test case 2:" is incomplete. It should describe what Test case 2 is verifying. Consider adding a descriptive comment like "# Test case 2: Verify that previous response text appears in the next prompt".
Suggested change:
    - # Test case 2:
    + # Test case 2: Verify that previous response text appears in the next prompt
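For context, the check behind that comment could look like the sketch below. The `gen_ai.*` attribute keys follow OpenTelemetry's generative-AI conventions; the exact keys the test uses are an assumption here, not taken from the diff.

    # Sketch only: walk consecutive LLM spans and confirm each response's text
    # is carried into the next prompt (attribute keys are assumed).
    for prev, nxt in zip(valid_spans, valid_spans[1:]):
        prev_text = prev.attributes.get("gen_ai.completion.0.content", "")
        # Search the stringified attributes of the next span for simplicity.
        next_prompt = str(nxt.attributes)
        if prev_text:
            assert prev_text in next_prompt, "response text was lost between turns"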
        port=pick_unused_port(),
        store=store,
    )
    proxy.server_launcher._access_host = "localhost"
Copilot AI · Dec 6, 2025
Accessing the private attribute _access_host of server_launcher is not recommended as it couples the test to internal implementation details. If this property needs to be overridden for testing, consider adding a public API or test hook in the LLMProxy class.
Suggested change:
    - proxy.server_launcher._access_host = "localhost"
    + # Avoid direct access to private attribute _access_host.
    + # If LLMProxy or server_launcher exposes a public setter, use it here.
    + # For example: proxy.server_launcher.set_access_host("localhost")
    + # If not, consider adding a public API to LLMProxy/server_launcher for testing purposes.
    + # (Direct access to _access_host is discouraged and flagged by CodeQL.)
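One way to carry out that suggestion is a public property on the launcher, sketched below; this API does not exist in agentlightning today and the class body is hypothetical.

    # Hypothetical test hook: a public property wrapping the private field,
    # so the test can write `proxy.server_launcher.access_host = "localhost"`.
    class ServerLauncher:
        def __init__(self, access_host: str = "0.0.0.0") -> None:
            self._access_host = access_host

        @property
        def access_host(self) -> str:
            return self._access_host

        @access_host.setter
        def access_host(self, host: str) -> None:
            self._access_host = host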
    resource = proxy.as_resource(rollout.rollout_id, rollout.attempt.attempt_id, model="local")
Copilot AI · Dec 6, 2025
The debug print statements should be removed or replaced with proper logging. These statements can clutter test output and are typically used during development but should not remain in production test code.
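A sketch of that fix using the standard-library logger; the logger name and level are conventional choices, not mandated by the project.

    import logging

    logger = logging.getLogger(__name__)

    # Debug-level output stays silent under normal pytest runs but can be
    # enabled with `pytest --log-cli-level=DEBUG` when diagnosing failures.
    logger.debug("Generated %d spans with %d LLM requests.", len(spans), len(valid_spans))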
        await store.start()
    else:
        store = LightningStoreThreaded(inmemory_store)
Copilot AI · Dec 6, 2025
The model version "claude-sonnet-4-5-20250929" has a date of 2025-09-29, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.
Suggested change:
    + # NOTE: The model names below ("claude-sonnet-4-5-20250929", "claude-haiku-4-5-20251001") are placeholders for testing purposes.
    + # They do not refer to real, documented model versions.
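If the names are kept, one lightweight way to implement that note is to hoist them into clearly labeled constants; the constant names below are illustrative.

    # Placeholder model names for this test only; they are not asserted to be
    # real, documented model versions.
    PLACEHOLDER_SONNET = "claude-sonnet-4-5-20250929"
    PLACEHOLDER_HAIKU = "claude-haiku-4-5-20251001"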
| "model_name": "claude-haiku-4-5-20251001", | ||
| "litellm_params": { | ||
| "model": "hosted_vllm/" + model_name, | ||
| "api_base": endpoint, | ||
| }, | ||
| }, |
Copilot AI · Dec 6, 2025
The model version "claude-haiku-4-5-20251001" has a date of 2025-10-01, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.
| "model_name": "claude-haiku-4-5-20251001", | |
| "litellm_params": { | |
| "model": "hosted_vllm/" + model_name, | |
| "api_base": endpoint, | |
| }, | |
| }, | |
| # NOTE: The following model name is intentionally fictional and used as a test placeholder. | |
| "model_name": "claude-haiku-4-5-20251001", | |
| "litellm_params": { | |
| "model": "hosted_vllm/" + model_name, | |
| "api_base": endpoint, | |
| }, |
    import anthropic
    import openai
    import pytest
    from litellm.integrations.custom_logger import CustomLogger
Copilot AI · Dec 6, 2025
Import of 'CustomLogger' is not used.
Suggested change:
    - from litellm.integrations.custom_logger import CustomLogger
    import pytest
    from litellm.integrations.custom_logger import CustomLogger
    from portpicker import pick_unused_port
    from swebench.harness.constants import SWEbenchInstance
Copilot AI · Dec 6, 2025
Import of 'SWEbenchInstance' is not used.
Suggested change:
    - from swebench.harness.constants import SWEbenchInstance
    from swebench.harness.utils import load_swebench_dataset  # pyright: ignore[reportUnknownVariableType]
    from transformers import AutoTokenizer

    from agentlightning import LitAgentRunner, OtelTracer
Copilot AI · Dec 6, 2025
Import of 'LitAgentRunner' is not used.
Import of 'OtelTracer' is not used.
Suggested change:
    - from agentlightning import LitAgentRunner, OtelTracer
    + # from agentlightning import LitAgentRunner, OtelTracer
    from agentlightning import LitAgentRunner, OtelTracer
    from agentlightning.llm_proxy import LLMProxy, _reset_litellm_logging_worker  # pyright: ignore[reportPrivateUsage]
    from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
Copilot AI · Dec 6, 2025
Import of 'LightningStore' is not used.
Suggested change:
    - from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
    + from agentlightning.store import LightningStoreServer, LightningStoreThreaded
    from agentlightning.llm_proxy import LLMProxy, _reset_litellm_logging_worker  # pyright: ignore[reportPrivateUsage]
    from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
    from agentlightning.store.memory import InMemoryLightningStore
    from agentlightning.types import LLM, Span
Copilot AI · Dec 6, 2025
Import of 'LLM' is not used.
Import of 'Span' is not used.
Suggested change:
    - from agentlightning.types import LLM, Span
Observed behavior (log excerpts elided):
Raw gen_ai response from backend:
Proxy final response (text content is missing):