fix(experiments): move evaluations to root experiment span #1497
Disclaimer: Experimental PR review
Greptile Overview
Greptile Summary
This PR refactors the experiment evaluation workflow to ensure that evaluators run within the root experiment span context, enabling proper tracing and attribution of evaluation activities.
Key Changes:
- Evaluator Context Placement: Moved the evaluator execution code (lines 2894-2988) from outside the span context into the `with self.start_as_current_span(name=span_name) as span:` block. This ensures evaluators are associated with the experiment span.
- Propagated Attributes Refactoring: Extracted `PropagatedExperimentAttributes` into a variable (`propagated_experiment_attributes`) that can be reused across multiple context managers, including during task execution and evaluator runs.
- Wrapped Evaluator Calls: Added `_propagate_attributes` context managers around both regular evaluator calls (lines 2906-2916) and composite evaluator calls (lines 2943-2977), ensuring experiment attributes are properly propagated through the evaluation chain. A sketch of the resulting control flow follows the Impact section below.
- Behavior Preservation:
Impact:
This change improves observability by ensuring evaluation traces are properly nested under the experiment span, making it easier to analyze experiment runs in the Langfuse UI.
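The overall shape of the change can be illustrated with a minimal, self-contained sketch. The names (`start_as_current_span`, `propagate_attributes`, `propagated_experiment_attributes`) mirror those in the PR, but the context managers here are toy stand-ins, not the Langfuse SDK's actual implementations or signatures:

```python
from contextlib import contextmanager

# Toy stand-ins for the SDK's span and attribute-propagation context
# managers -- names mirror the PR, behavior is illustrative only.
@contextmanager
def start_as_current_span(name):
    print(f"enter span: {name}")
    try:
        yield {"name": name}
    finally:
        print(f"exit span: {name}")

@contextmanager
def propagate_attributes(attrs):
    print(f"propagating attributes: {attrs}")
    yield

def process_experiment_item(item, task, evaluators):
    with start_as_current_span(name="experiment-item-run") as span:
        # Built once and reused for the task run and every evaluator run
        # (mirrors propagated_experiment_attributes in the diff).
        propagated_experiment_attributes = {"experiment_id": item["experiment_id"]}

        with propagate_attributes(propagated_experiment_attributes):
            output = task(item["input"])

        # The change this PR makes: evaluators run INSIDE the span context,
        # so their traces nest under the root experiment span.
        for evaluator in evaluators:
            with propagate_attributes(propagated_experiment_attributes):
                score = evaluator(output, item["expected_output"])
            print(f"score recorded under span '{span['name']}': {score}")

process_experiment_item(
    {"experiment_id": "exp-1", "input": "2+2", "expected_output": "4"},
    task=lambda question: "4",
    evaluators=[lambda out, expected: float(out == expected)],
)
```

Running the sketch shows the "exit span" line printing only after all scores are recorded, which is exactly the nesting property the refactor establishes.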
Confidence Score: 5/5
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller
    participant _process_experiment_item
    participant Span as Span Context
    participant Task as _run_task
    participant Evaluator as _run_evaluator
    participant API as create_score

    Caller->>_process_experiment_item: Process experiment item
    _process_experiment_item->>Span: Enter span context
    activate Span
    Note over _process_experiment_item: Try block starts
    _process_experiment_item->>_process_experiment_item: Extract input_data, expected_output
    _process_experiment_item->>_process_experiment_item: Create propagated_experiment_attributes
    _process_experiment_item->>Task: Run task with _propagate_attributes
    activate Task
    Task-->>_process_experiment_item: Return output
    deactivate Task
    _process_experiment_item->>Span: Update span with input/output
    Note over _process_experiment_item: Try block succeeds
    Note over _process_experiment_item,Evaluator: Evaluators run INSIDE span context (NEW)
    loop For each evaluator
        _process_experiment_item->>Evaluator: Run evaluator with _propagate_attributes
        activate Evaluator
        Evaluator-->>_process_experiment_item: Return eval_results
        deactivate Evaluator
        loop For each evaluation
            _process_experiment_item->>API: create_score(trace_id, observation_id, ...)
        end
    end
    alt If composite_evaluator exists
        _process_experiment_item->>Evaluator: Run composite evaluator with _propagate_attributes
        activate Evaluator
        Evaluator-->>_process_experiment_item: Return composite results
        deactivate Evaluator
        loop For each composite evaluation
            _process_experiment_item->>API: create_score(trace_id, observation_id, ...)
        end
    end
    _process_experiment_item->>Span: Exit span context
    deactivate Span
    _process_experiment_item-->>Caller: Return ExperimentItemResult
```
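The `create_score(trace_id, observation_id, ...)` step in the diagram is what ties each evaluation back to the experiment span. A toy sketch of that attribution, with a stand-in `create_score` (the real API takes more parameters than shown):

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    name: str
    value: float

# Stand-in for the create_score call in the diagram; only the ids
# relevant to attribution are modeled here.
def create_score(*, trace_id, observation_id, name, value):
    print(f"score {name}={value} -> trace={trace_id}, observation={observation_id}")

def record_evaluations(span, eval_results):
    # Because evaluators ran inside the experiment span, each score can be
    # attributed to that span's trace and observation ids.
    for evaluation in eval_results:
        create_score(
            trace_id=span["trace_id"],
            observation_id=span["id"],
            name=evaluation.name,
            value=evaluation.value,
        )

record_evaluations(
    {"trace_id": "trace-123", "id": "obs-456"},
    [Evaluation("exact_match", 1.0), Evaluation("levenshtein", 0.92)],
)
```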