@Chibionos
Contributor

Summary

  • Extracts evaluation reporting logic into dedicated modules for better code organization, reusability, and separation of concerns
  • Adds EvalTracingManager class that encapsulates OpenTelemetry tracing logic for evaluation runs
  • Adds _payload_builders package with abstract base class and concrete implementations for coded and legacy evaluations

Why This Refactoring Was Needed

The StudioWebProgressReporter class had grown to over 1200 lines with significant code duplication between coded and legacy evaluation handling. This refactoring:

  1. Improves maintainability: Separates concerns into focused modules, with tracing logic in _eval_tracing.py and payload building in _payload_builders/

  2. Enables reusability: The new abstractions can be used independently for:

    • Building evaluation payloads for different backends
    • Managing OpenTelemetry tracing for evaluation runs
    • Extracting usage metrics from spans

  3. Reduces duplication: Shared utilities like GUID conversion, usage extraction, and completion metrics building are now in a single base class

  4. Facilitates testing: Smaller, focused classes are easier to unit test in isolation

New Modules

_eval_tracing.py

  • EvalTracingManager: Manages OpenTelemetry tracing for evaluation runs including parent trace creation, eval run traces, and evaluator span management
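To make the span hierarchy concrete, here is a minimal sketch of how a manager like this might nest a parent trace, an eval run trace, and an evaluator span. It is not the code from this PR: the class name EvalTracingManagerSketch, its method names, the span names, and the attribute keys are all assumptions. The tracer is injected and only needs a start_as_current_span(name) context manager, which is the entry point OpenTelemetry tracers expose, so a small in-memory stand-in is used here to keep the example runnable without extra dependencies.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


class RecordingSpan:
    """In-memory stand-in for a span; stores attributes for inspection."""

    def __init__(self, name: str, depth: int) -> None:
        self.name = name
        self.depth = depth  # nesting level at the time the span started
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


class RecordingTracer:
    """Stand-in tracer with the same start_as_current_span entry point."""

    def __init__(self) -> None:
        self.spans: List[RecordingSpan] = []
        self._depth = 0

    @contextmanager
    def start_as_current_span(self, name: str) -> Iterator[RecordingSpan]:
        span = RecordingSpan(name, self._depth)
        self.spans.append(span)
        self._depth += 1
        try:
            yield span
        finally:
            self._depth -= 1


class EvalTracingManagerSketch:
    """Hypothetical shape only; the real EvalTracingManager may differ."""

    def __init__(self, tracer: Any) -> None:
        self._tracer = tracer

    @contextmanager
    def _span(self, name: str, **attributes: Any) -> Iterator[Any]:
        # Open a span and tag it with the identifiers passed by the caller.
        with self._tracer.start_as_current_span(name) as span:
            for key, value in attributes.items():
                span.set_attribute(key, value)
            yield span

    def parent_trace(self, eval_set_id: str):
        return self._span("eval_set", eval_set_id=eval_set_id)

    def eval_run_trace(self, eval_id: str):
        return self._span("eval_run", eval_id=eval_id)

    def evaluator_span(self, evaluator_id: str):
        return self._span("evaluator", evaluator_id=evaluator_id)


tracer = RecordingTracer()
manager = EvalTracingManagerSketch(tracer)
with manager.parent_trace("set-1"):
    with manager.eval_run_trace("eval-1"):
        with manager.evaluator_span("exact-match"):
            pass  # evaluator work would run here

hierarchy = [(span.name, span.depth) for span in tracer.spans]
```

Injecting the tracer is one way to get the testability benefit described above: a unit test can pass a recording tracer like this one, while production code would pass a real OpenTelemetry tracer.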

_payload_builders/

  • BasePayloadBuilder: Abstract base class with shared utilities for GUID conversion, usage extraction from spans, completion metrics, and request spec building
  • CodedPayloadBuilder: Handles coded agent evaluation payloads with string IDs and /coded/ endpoint suffix
  • LegacyPayloadBuilder: Handles legacy (low-code) agent payloads with GUID conversion and assertionRuns format
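The split between shared utilities and per-backend formatting could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: the class names carry a Sketch suffix, and the method names, payload keys other than assertionRuns, the span attribute names, the exact suffix string, and the uuid5-based conversion strategy are all hypothetical.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Mapping


class BasePayloadBuilderSketch(ABC):
    """Shared helpers live here; subclasses decide IDs and result layout."""

    endpoint_suffix = ""

    @abstractmethod
    def format_id(self, raw_id: str) -> str:
        """Return the ID in the form the target backend expects."""

    @abstractmethod
    def build_results(self, scores: Mapping[str, float]) -> Dict[str, Any]:
        """Return the backend-specific results section of the payload."""

    @staticmethod
    def to_guid(value: str) -> str:
        # Keep valid GUIDs as they are; otherwise derive a deterministic one.
        # (uuid5 is an assumed strategy, chosen here for repeatability.)
        try:
            return str(uuid.UUID(value))
        except ValueError:
            return str(uuid.uuid5(uuid.NAMESPACE_DNS, value))

    @staticmethod
    def extract_usage(spans: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
        # Sum token counts over span attribute dicts (attribute names assumed).
        totals = {"prompt_tokens": 0, "completion_tokens": 0}
        for attributes in spans:
            for key in totals:
                totals[key] += int(attributes.get(key, 0))
        return totals

    def build_payload(
        self,
        eval_id: str,
        scores: Mapping[str, float],
        spans: Iterable[Mapping[str, Any]],
    ) -> Dict[str, Any]:
        payload = {"evalId": self.format_id(eval_id), "usage": self.extract_usage(spans)}
        payload.update(self.build_results(scores))
        return payload


class CodedPayloadBuilderSketch(BasePayloadBuilderSketch):
    endpoint_suffix = "/coded/"  # assumed exact string

    def format_id(self, raw_id: str) -> str:
        return raw_id  # coded evaluations keep string IDs

    def build_results(self, scores: Mapping[str, float]) -> Dict[str, Any]:
        # "evaluatorRuns" is a placeholder key, not confirmed by the PR text.
        return {"evaluatorRuns": [{"evaluatorId": k, "score": v} for k, v in scores.items()]}


class LegacyPayloadBuilderSketch(BasePayloadBuilderSketch):
    def format_id(self, raw_id: str) -> str:
        return self.to_guid(raw_id)  # legacy payloads need GUIDs

    def build_results(self, scores: Mapping[str, float]) -> Dict[str, Any]:
        return {"assertionRuns": [{"evaluatorId": self.to_guid(k), "score": v} for k, v in scores.items()]}


spans = [{"prompt_tokens": 10, "completion_tokens": 5}, {"prompt_tokens": 3}]
coded_payload = CodedPayloadBuilderSketch().build_payload("eval-1", {"exact": 1.0}, spans)
legacy_payload = LegacyPayloadBuilderSketch().build_payload("eval-1", {"exact": 1.0}, spans)
```

With this layout a reporter only needs to pick the builder for the evaluation type; ID handling and the results format no longer require branching at each call site.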

Test plan

  • Verify lint checks pass (uv run just lint)
  • Verify type checks pass (uv run mypy src/uipath/_cli/_evals/)
  • Verify build succeeds (uv run just build)
  • Run evaluation tests to ensure no regression

🤖 Generated with Claude Code

Extract evaluation reporting logic into dedicated modules for better
code organization, reusability, and separation of concerns:

- Add _eval_tracing.py: EvalTracingManager class that encapsulates all
  OpenTelemetry tracing logic for evaluation runs including parent trace
  creation, eval run traces, and evaluator span management

- Add _payload_builders package:
  - BasePayloadBuilder: Abstract base class with shared utilities for
    GUID conversion, usage extraction from spans, completion metrics,
    and request spec building
  - CodedPayloadBuilder: Handles coded agent evaluation payloads with
    string IDs and /coded/ endpoint suffix
  - LegacyPayloadBuilder: Handles legacy (low-code) agent payloads with
    GUID conversion and assertionRuns format

These modules provide reusable abstractions that can be used to simplify
the StudioWebProgressReporter and enable easier testing and maintenance
of the evaluation reporting functionality.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions bot added labels on Dec 18, 2025:

  • test:uipath-langchain: Triggers tests in the uipath-langchain-python repository
  • test:uipath-llamaindex: Triggers tests in the uipath-llamaindex-python repository