
Conversation

@zxgx (Contributor) commented Dec 6, 2025

Raw gen_ai response from backend:

[screenshot]

Proxy final response (text content is missing):

[screenshot]

Copilot AI review requested due to automatic review settings December 6, 2025 09:42

Copilot AI left a comment


Pull request overview

This PR adds a unit test to validate text preservation across multiple LLM interaction turns in the Claude Code agent. The test ensures that response text from previous turns is correctly included in subsequent prompts, which is critical for maintaining conversation context.

Key Changes:

  • Adds a test_claude_code function that validates text preservation across conversation turns
  • Integrates with the ClaudeCodeAgent and verifies span data captured during rollouts
  • Tests both span generation and text-content propagation through multiple turns
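The property the test checks can be sketched in isolation. This is a simplified stand-in, not the actual ClaudeCodeAgent / LLMProxy API: the prompt builder and history shape below are assumptions for illustration.

```python
# Minimal sketch of the text-preservation property across turns.
# The history format and prompt builder are hypothetical stand-ins.

def build_next_prompt(history: list[dict]) -> str:
    """Concatenate prior turns into the next prompt, as a proxy might."""
    return "\n".join(f"{m['role']}: {m['text']}" for m in history)

def check_text_preserved(turns: list[str]) -> bool:
    """Every earlier response must appear verbatim in the next prompt."""
    history: list[dict] = []
    for i, response in enumerate(turns):
        prompt = build_next_prompt(history)
        # Previous response text must survive into this turn's prompt.
        for earlier in turns[:i]:
            if earlier not in prompt:
                return False
        history.append({"role": "user", "text": f"turn {i}"})
        history.append({"role": "assistant", "text": response})
    return True
```

If the proxy dropped text content (as in the bug this PR guards against), the inner membership check would fail on the first turn whose response went missing.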


assert len(valid_spans) > 1
print(f"Generated {len(spans)} spans with {len(valid_spans)} LLM requests.")

# Test case 2:

Copilot AI Dec 6, 2025


The comment "# Test case 2:" is incomplete. It should describe what Test case 2 is verifying. Consider adding a descriptive comment like "# Test case 2: Verify that previous response text appears in the next prompt".

Suggested change
# Test case 2:
# Test case 2: Verify that previous response text appears in the next prompt

port=pick_unused_port(),
store=store,
)
proxy.server_launcher._access_host = "localhost"

Copilot AI Dec 6, 2025


Accessing the private attribute _access_host of server_launcher is not recommended as it couples the test to internal implementation details. If this property needs to be overridden for testing, consider adding a public API or test hook in the LLMProxy class.

Suggested change
proxy.server_launcher._access_host = "localhost"
# Avoid direct access to private attribute _access_host.
# If LLMProxy or server_launcher exposes a public setter, use it here.
# For example: proxy.server_launcher.set_access_host("localhost")
# If not, consider adding a public API to LLMProxy/server_launcher for testing purposes.
# (Direct access to _access_host is discouraged and flagged by CodeQL.)
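The public test hook the review asks for could look like the following. The ServerLauncher class here is a hypothetical simplification; the real LLMProxy/server_launcher API may differ.

```python
# Sketch of exposing access_host through a property instead of
# having tests poke the private _access_host attribute directly.
# ServerLauncher is a stand-in class, not the real implementation.

class ServerLauncher:
    def __init__(self, access_host: str = "0.0.0.0") -> None:
        self._access_host = access_host

    @property
    def access_host(self) -> str:
        return self._access_host

    @access_host.setter
    def access_host(self, value: str) -> None:
        self._access_host = value
```

With such a hook, the test line becomes `proxy.server_launcher.access_host = "localhost"`, which no longer couples the test to a private name.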

Comment on lines +93 to +94

resource = proxy.as_resource(rollout.rollout_id, rollout.attempt.attempt_id, model="local")

Copilot AI Dec 6, 2025


The debug print statements should be removed or replaced with proper logging. These statements can clutter test output and are typically used during development but should not remain in production test code.
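Swapping the prints for the standard logging module keeps test output quiet by default while still being available at higher verbosity. A minimal sketch, with the message text taken from the diff's print statement:

```python
# Replace debug prints with logging so test output stays clean
# unless the log level is raised (e.g. pytest --log-cli-level=DEBUG).
import logging

logger = logging.getLogger("test_claude_code")

def report_spans(n_spans: int, n_valid: int) -> None:
    # Instead of: print(f"Generated {n_spans} spans with {n_valid} LLM requests.")
    logger.debug("Generated %d spans with %d LLM requests.", n_spans, n_valid)
```

Lazy %-formatting also avoids building the string when debug logging is disabled.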

await store.start()
else:
store = LightningStoreThreaded(inmemory_store)


Copilot AI Dec 6, 2025


The model version "claude-sonnet-4-5-20250929" has a date of 2025-09-29, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.

Suggested change
# NOTE: The model names below ("claude-sonnet-4-5-20250929", "claude-haiku-4-5-20251001") are placeholders for testing purposes.
# They do not refer to real, documented model versions.

Comment on lines +78 to +83
"model_name": "claude-haiku-4-5-20251001",
"litellm_params": {
"model": "hosted_vllm/" + model_name,
"api_base": endpoint,
},
},

Copilot AI Dec 6, 2025


The model version "claude-haiku-4-5-20251001" has a date of 2025-10-01, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.

Suggested change
"model_name": "claude-haiku-4-5-20251001",
"litellm_params": {
"model": "hosted_vllm/" + model_name,
"api_base": endpoint,
},
},
# NOTE: The following model name is intentionally fictional and used as a test placeholder.
"model_name": "claude-haiku-4-5-20251001",
"litellm_params": {
"model": "hosted_vllm/" + model_name,
"api_base": endpoint,
},
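One way to address both model-name comments at once is to centralize the placeholders, as sketched below. The two model-name strings come from the diff; the `model_list` helper is hypothetical, shaped after the litellm-style config shown in the code context.

```python
# Sketch: label the fictional model names as test placeholders in one
# place, and build the routing config from them. The helper function
# is an assumption; only the config shape mirrors the diff.

# Placeholder model names used only to exercise the proxy's routing;
# requests are rewritten to the locally hosted vLLM model below.
PLACEHOLDER_MODELS = (
    "claude-sonnet-4-5-20250929",
    "claude-haiku-4-5-20251001",
)

def model_list(model_name: str, endpoint: str) -> list[dict]:
    return [
        {
            "model_name": placeholder,
            "litellm_params": {
                "model": "hosted_vllm/" + model_name,
                "api_base": endpoint,
            },
        }
        for placeholder in PLACEHOLDER_MODELS
    ]
```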

import anthropic
import openai
import pytest
from litellm.integrations.custom_logger import CustomLogger

Copilot AI Dec 6, 2025


Import of 'CustomLogger' is not used.

Suggested change
from litellm.integrations.custom_logger import CustomLogger

import pytest
from litellm.integrations.custom_logger import CustomLogger
from portpicker import pick_unused_port
from swebench.harness.constants import SWEbenchInstance

Copilot AI Dec 6, 2025


Import of 'SWEbenchInstance' is not used.

Suggested change
from swebench.harness.constants import SWEbenchInstance

from swebench.harness.utils import load_swebench_dataset # pyright: ignore[reportUnknownVariableType]
from transformers import AutoTokenizer

from agentlightning import LitAgentRunner, OtelTracer

Copilot AI Dec 6, 2025


Import of 'LitAgentRunner' is not used.
Import of 'OtelTracer' is not used.

Suggested change
from agentlightning import LitAgentRunner, OtelTracer
# from agentlightning import LitAgentRunner, OtelTracer


from agentlightning import LitAgentRunner, OtelTracer
from agentlightning.llm_proxy import LLMProxy, _reset_litellm_logging_worker # pyright: ignore[reportPrivateUsage]
from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded

Copilot AI Dec 6, 2025


Import of 'LightningStore' is not used.

Suggested change
from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
from agentlightning.store import LightningStoreServer, LightningStoreThreaded

from agentlightning.llm_proxy import LLMProxy, _reset_litellm_logging_worker # pyright: ignore[reportPrivateUsage]
from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
from agentlightning.store.memory import InMemoryLightningStore
from agentlightning.types import LLM, Span

Copilot AI Dec 6, 2025


Import of 'LLM' is not used.
Import of 'Span' is not used.

Suggested change
from agentlightning.types import LLM, Span
