add unit test for text loss in claude code #376
base: main
Conversation
Pull request overview
This PR adds a unit test to validate text preservation across multiple LLM interaction turns in the Claude Code agent. The test ensures that response text from previous turns is correctly included in subsequent prompts, which is critical for maintaining conversation context.
Key Changes:
- Adds a test_claude_code function that validates text preservation across conversation turns
- Integrates with the ClaudeCodeAgent and verifies span data captured during rollouts
- Tests both span generation and text content propagation through multiple turns (a sketch of the core check follows this list)
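To make the overview concrete, here is a minimal sketch of the span check, assuming `spans` is the list of span objects the store returns after a rollout and that LLM request spans carry `gen_ai.*` attributes; the helper `filter_llm_spans` is illustrative, not part of the PR.

    # Sketch only: filter rollout spans down to the LLM request spans.
    # Assumes each span exposes a dict-like `attributes` mapping.
    def filter_llm_spans(spans):
        return [s for s in spans if any(k.startswith("gen_ai.") for k in s.attributes)]

    valid_spans = filter_llm_spans(spans)
    assert len(valid_spans) > 1  # a multi-turn rollout should issue several LLM requests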
    assert len(valid_spans) > 1
    print(f"Generated {len(spans)} spans with {len(valid_spans)} LLM requests.")

    # Test case 2:
Copilot AI · Dec 6, 2025
The comment "# Test case 2:" is incomplete. It should describe what Test case 2 is verifying. Consider adding a descriptive comment like "# Test case 2: Verify that previous response text appears in the next prompt".
Suggested change:
    - # Test case 2:
    + # Test case 2: Verify that previous response text appears in the next prompt
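For context, the check behind that comment could look like the sketch below. The `gen_ai.*` attribute keys follow OpenTelemetry's generative-AI conventions; the exact keys the test uses are an assumption here, not taken from the diff.

    # Sketch only: walk consecutive LLM spans and confirm each response's text
    # is carried into the next prompt (attribute keys are assumed).
    for prev, nxt in zip(valid_spans, valid_spans[1:]):
        prev_text = prev.attributes.get("gen_ai.completion.0.content", "")
        # Search the stringified attributes of the next span for simplicity.
        next_prompt = str(nxt.attributes)
        if prev_text:
            assert prev_text in next_prompt, "response text was lost between turns"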
        port=pick_unused_port(),
        store=store,
    )
    proxy.server_launcher._access_host = "localhost"
Copilot AI · Dec 6, 2025
Accessing the private attribute _access_host of server_launcher is not recommended as it couples the test to internal implementation details. If this property needs to be overridden for testing, consider adding a public API or test hook in the LLMProxy class.
Suggested change:
    - proxy.server_launcher._access_host = "localhost"
    + # Avoid direct access to private attribute _access_host.
    + # If LLMProxy or server_launcher exposes a public setter, use it here.
    + # For example: proxy.server_launcher.set_access_host("localhost")
    + # If not, consider adding a public API to LLMProxy/server_launcher for testing purposes.
    + # (Direct access to _access_host is discouraged and flagged by CodeQL.)
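One way to carry out that suggestion is a public property on the launcher, sketched below; this API does not exist in agentlightning today and the class body is hypothetical.

    # Hypothetical test hook: a public property wrapping the private field,
    # so the test can write `proxy.server_launcher.access_host = "localhost"`.
    class ServerLauncher:
        def __init__(self, access_host: str = "0.0.0.0") -> None:
            self._access_host = access_host

        @property
        def access_host(self) -> str:
            return self._access_host

        @access_host.setter
        def access_host(self, host: str) -> None:
            self._access_host = host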
    resource = proxy.as_resource(rollout.rollout_id, rollout.attempt.attempt_id, model="local")
Copilot AI · Dec 6, 2025
The debug print statements should be removed or replaced with proper logging. These statements can clutter test output and are typically used during development but should not remain in production test code.
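A sketch of that fix using the standard-library logger; the logger name and level are conventional choices, not mandated by the project.

    import logging

    logger = logging.getLogger(__name__)

    # Debug-level output stays silent under normal pytest runs but can be
    # enabled with `pytest --log-cli-level=DEBUG` when diagnosing failures.
    logger.debug("Generated %d spans with %d LLM requests.", len(spans), len(valid_spans))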
        await store.start()
    else:
        store = LightningStoreThreaded(inmemory_store)
Copilot AI · Dec 6, 2025
The model version "claude-sonnet-4-5-20250929" has a date of 2025-09-29, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.
Suggested change:
    + # NOTE: The model names below ("claude-sonnet-4-5-20250929", "claude-haiku-4-5-20251001") are placeholders for testing purposes.
    + # They do not refer to real, documented model versions.
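If the names are kept, one lightweight way to implement that note is to hoist them into clearly labeled constants; the constant names below are illustrative.

    # Placeholder model names for this test only; they are not asserted to be
    # real, documented model versions.
    PLACEHOLDER_SONNET = "claude-sonnet-4-5-20250929"
    PLACEHOLDER_HAIKU = "claude-haiku-4-5-20251001"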
| "model_name": "claude-haiku-4-5-20251001", | ||
| "litellm_params": { | ||
| "model": "hosted_vllm/" + model_name, | ||
| "api_base": endpoint, | ||
| }, | ||
| }, |
Copilot AI · Dec 6, 2025
The model version "claude-haiku-4-5-20251001" has a date of 2025-10-01, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.
| "model_name": "claude-haiku-4-5-20251001", | |
| "litellm_params": { | |
| "model": "hosted_vllm/" + model_name, | |
| "api_base": endpoint, | |
| }, | |
| }, | |
| # NOTE: The following model name is intentionally fictional and used as a test placeholder. | |
| "model_name": "claude-haiku-4-5-20251001", | |
| "litellm_params": { | |
| "model": "hosted_vllm/" + model_name, | |
| "api_base": endpoint, | |
| }, |
    import anthropic
    import openai
    import pytest
    from litellm.integrations.custom_logger import CustomLogger
Copilot AI · Dec 6, 2025
Import of 'CustomLogger' is not used.
Suggested change:
    - from litellm.integrations.custom_logger import CustomLogger
    import pytest
    from litellm.integrations.custom_logger import CustomLogger
    from portpicker import pick_unused_port
    from swebench.harness.constants import SWEbenchInstance
Copilot AI · Dec 6, 2025
Import of 'SWEbenchInstance' is not used.
Suggested change:
    - from swebench.harness.constants import SWEbenchInstance
    from swebench.harness.utils import load_swebench_dataset  # pyright: ignore[reportUnknownVariableType]
    from transformers import AutoTokenizer

    from agentlightning import LitAgentRunner, OtelTracer
Copilot AI · Dec 6, 2025
Import of 'LitAgentRunner' is not used.
Import of 'OtelTracer' is not used.
Suggested change:
    - from agentlightning import LitAgentRunner, OtelTracer
    + # from agentlightning import LitAgentRunner, OtelTracer
    from agentlightning import LitAgentRunner, OtelTracer
    from agentlightning.llm_proxy import LLMProxy, _reset_litellm_logging_worker  # pyright: ignore[reportPrivateUsage]
    from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
Copilot AI · Dec 6, 2025
Import of 'LightningStore' is not used.
Suggested change:
    - from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
    + from agentlightning.store import LightningStoreServer, LightningStoreThreaded
    from agentlightning.llm_proxy import LLMProxy, _reset_litellm_logging_worker  # pyright: ignore[reportPrivateUsage]
    from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
    from agentlightning.store.memory import InMemoryLightningStore
    from agentlightning.types import LLM, Span
Copilot AI · Dec 6, 2025
Import of 'LLM' is not used.
Import of 'Span' is not used.
Suggested change:
    - from agentlightning.types import LLM, Span
Observed behavior (log excerpts elided):
Raw gen_ai response from backend:
Proxy final response (text content is missing):