
@hassiebp hassiebp commented Jan 12, 2026

Disclaimer: Experimental PR review

Greptile Overview

Greptile Summary

This PR refactors the experiment evaluation workflow to ensure that evaluators run within the root experiment span context, enabling proper tracing and attribution of evaluation activities.

Key Changes:

  1. Evaluator Context Placement: Moved the evaluator execution code (lines 2894-2988) from outside the span context into the with self.start_as_current_span(name=span_name) as span: block. This ensures evaluators are associated with the experiment span.

  2. Propagated Attributes Refactoring: Extracted PropagatedExperimentAttributes into a variable (propagated_experiment_attributes) that can be reused across multiple context managers, including during task execution and evaluator runs.

  3. Wrapped Evaluator Calls: Added _propagate_attributes context managers around both regular evaluator calls (lines 2906-2916) and composite evaluator calls (lines 2943-2977), ensuring experiment attributes are properly propagated through the evaluation chain.

Behavior Preservation:

  • Exception handling remains unchanged: if the task fails, the exception is caught, the span is updated with error status, and the exception is re-raised, preventing evaluators from running
  • The return statement now executes within the span context instead of after it closes
  • All variable scopes remain valid as evaluators only run after successful task completion
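
The control flow described in these bullets might look like the sketch below. It assumes a hypothetical `FakeSpan` class with an `update` method and invented field names (`level`, `status_message`); it is only meant to show why a failed task still prevents evaluators from running once they live inside the span block.

```python
from contextlib import contextmanager


class FakeSpan:
    """Hypothetical span that simply stores whatever it is updated with."""

    def __init__(self, name):
        self.name = name
        self.level = None
        self.status_message = None

    def update(self, **fields):
        for key, value in fields.items():
            setattr(self, key, value)


spans = []            # every span created, kept for inspection
evaluator_calls = []  # outputs the evaluator was asked to score


@contextmanager
def start_as_current_span(name):
    span = FakeSpan(name)
    spans.append(span)
    yield span


def failing_task(item):
    raise ValueError("task failed")


def evaluator(output):
    evaluator_calls.append(output)
    return 1.0


def process_item(item, task, evaluators):
    with start_as_current_span("experiment-item-run") as span:
        try:
            output = task(item)
            span.update(input=item, output=output)
        except Exception as exc:
            # Record the failure on the span, then re-raise.
            span.update(level="ERROR", status_message=str(exc))
            raise
        # Reached only after the task succeeded, so `output` is always bound.
        scores = [run_evaluator(output) for run_evaluator in evaluators]
        return output, scores


try:
    process_item("x", failing_task, [evaluator])
    raised = False
except ValueError:
    raised = True

ok_output, ok_scores = process_item("x", lambda s: s * 2, [evaluator])
```

Because the `raise` leaves the `with` block before the evaluator loop is reached, the failing run records the error on its span and produces no evaluator calls, while the successful run scores its output as before.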

Impact:

This change improves observability by ensuring evaluation traces are properly nested under the experiment span, making it easier to analyze experiment runs in the Langfuse UI.

Confidence Score: 5/5

  • Safe to merge - clean refactoring with no breaking changes or bugs
  • The changes are a straightforward refactoring that moves evaluator execution inside the span context. All variable scopes are correct, exception handling is preserved, and the behavior remains identical except for the intended change (evaluators now run within span context). The test file change is just whitespace cleanup.
  • No files require special attention

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| langfuse/_client/client.py | 5/5 | Refactored experiment evaluators to run within the root experiment span context by moving evaluation code inside the span's with-block and adding _propagate_attributes context managers |
| tests/test_prompt.py | 5/5 | Removed trailing whitespace from import statement (formatting fix) |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant _process_experiment_item
    participant Span as Span Context
    participant Task as _run_task
    participant Evaluator as _run_evaluator
    participant API as create_score

    Caller->>_process_experiment_item: Process experiment item
    _process_experiment_item->>Span: Enter span context
    activate Span
    
    Note over _process_experiment_item: Try block starts
    _process_experiment_item->>_process_experiment_item: Extract input_data, expected_output
    _process_experiment_item->>_process_experiment_item: Create propagated_experiment_attributes
    
    _process_experiment_item->>Task: Run task with _propagate_attributes
    activate Task
    Task-->>_process_experiment_item: Return output
    deactivate Task
    
    _process_experiment_item->>Span: Update span with input/output
    Note over _process_experiment_item: Try block succeeds
    
    Note over _process_experiment_item,Evaluator: Evaluators run INSIDE span context (NEW)
    loop For each evaluator
        _process_experiment_item->>Evaluator: Run evaluator with _propagate_attributes
        activate Evaluator
        Evaluator-->>_process_experiment_item: Return eval_results
        deactivate Evaluator
        
        loop For each evaluation
            _process_experiment_item->>API: create_score(trace_id, observation_id, ...)
        end
    end
    
    alt If composite_evaluator exists
        _process_experiment_item->>Evaluator: Run composite evaluator with _propagate_attributes
        activate Evaluator
        Evaluator-->>_process_experiment_item: Return composite results
        deactivate Evaluator
        
        loop For each composite evaluation
            _process_experiment_item->>API: create_score(trace_id, observation_id, ...)
        end
    end
    
    _process_experiment_item->>Span: Exit span context
    deactivate Span
    _process_experiment_item-->>Caller: Return ExperimentItemResult


@greptile-apps greptile-apps bot left a comment


No files reviewed, no comments


@hassiebp hassiebp merged commit 898a10c into main Jan 12, 2026
12 checks passed
@hassiebp hassiebp deleted the hassieb/lfe-8152-evaluator-traces-not-linked-to-dataset-item-runs-when-using branch January 12, 2026 10:53
