fix: legacy evaluation reporting to backend with Strategy Pattern #1040

Chibionos · 2025-12-19T04:30:04Z

Summary

Fix legacy evaluation reporting (HTTP 400 errors)
Implement Strategy Pattern for legacy vs coded eval flows
Refactor into modular _reporting/ package
Add logging for eval set run schema reporting
Bump version to 2.2.37

Changes

_reporting/_strategies.py - Protocol + strategy implementations
_reporting/_reporter.py - Main StudioWebProgressReporter class with logging
_reporting/_utils.py - Error handling decorator
Backward compatibility maintained via re-exports
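
For readers unfamiliar with the approach, here is a minimal sketch of a Protocol-based strategy split. The class names follow the commit message in this PR. The method names (`endpoint_prefix`, `results_key`), the `evalRun` path segment, and the helpers `select_strategy` and `build_request` are hypothetical and are not taken from the actual `_reporting/` code.

```python
from typing import Any, Protocol


class EvalReportingStrategy(Protocol):
    """Interface that both flows satisfy structurally (method names are assumed)."""

    def endpoint_prefix(self) -> str: ...

    def results_key(self) -> str: ...


class LegacyEvalReportingStrategy:
    """Legacy flow: no /coded/ prefix, results reported as assertionRuns."""

    def endpoint_prefix(self) -> str:
        return ""

    def results_key(self) -> str:
        return "assertionRuns"


class CodedEvalReportingStrategy:
    """Coded flow: /coded/ prefix, results reported as evaluatorRuns."""

    def endpoint_prefix(self) -> str:
        return "coded/"

    def results_key(self) -> str:
        return "evaluatorRuns"


def select_strategy(is_coded: bool) -> EvalReportingStrategy:
    # Hypothetical helper: choose the flow once, then use it everywhere.
    return CodedEvalReportingStrategy() if is_coded else LegacyEvalReportingStrategy()


def build_request(
    strategy: EvalReportingStrategy, runs: list[dict[str, Any]]
) -> tuple[str, dict[str, Any]]:
    # "evalRun" is a placeholder path segment, not the real backend route.
    path = f"{strategy.endpoint_prefix()}evalRun"
    return path, {strategy.results_key(): runs}
```

Because `Protocol` relies on structural typing, the two strategy classes do not need to inherit from a common base, which is consistent with the removal of the `ABC` and `abstractmethod` imports mentioned in the commit message.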

Logging Added

INFO-level logging when creating eval set runs showing inputSchema and outputSchema
DEBUG-level logging for full payloads on all eval reporting operations
WARNING when entrypoint is not provided, falling back to empty schemas
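
The three logging levels described above could look roughly like the sketch below. It is only an illustration: the logger name, the function names, and the shape of the `entrypoint` dictionary (with `input` and `output` keys) are assumptions, not the reporter's actual code.

```python
import json
import logging
from typing import Any, Optional

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("eval_reporting_sketch")


def extract_schemas(
    entrypoint: Optional[dict[str, Any]],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (inputSchema, outputSchema), falling back to empty schemas."""
    if not entrypoint:
        # WARNING: no entrypoint, so both schemas are reported as empty.
        logger.warning("Entrypoint not provided; falling back to empty schemas")
        return {}, {}
    # Assumed layout: schemas live under "input" and "output".
    return entrypoint.get("input", {}), entrypoint.get("output", {})


def log_eval_set_run(
    entrypoint: Optional[dict[str, Any]], payload: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Log the schemas at INFO and the full payload at DEBUG."""
    input_schema, output_schema = extract_schemas(entrypoint)
    logger.info(
        "Creating eval set run: inputSchema=%s outputSchema=%s",
        json.dumps(input_schema),
        json.dumps(output_schema),
    )
    logger.debug("Full payload: %s", json.dumps(payload))
    return input_schema, output_schema
```

Keeping the full payload at DEBUG means normal runs show only the compact schema summary, while a debugging session can opt in to the verbose output.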

Tests

33 tests for reporter (including new agent snapshot extraction tests)
All lint and format checks passing

🤖 Generated with Claude Code

This PR fixes legacy evaluation reporting to the backend that was returning HTTP 400 errors and implements the Strategy Pattern for cleaner code separation.

## Changes

### Strategy Pattern Implementation

- Created `EvalReportingStrategy` Protocol defining the interface for evaluation reporting strategies
- Implemented `LegacyEvalReportingStrategy` for legacy evaluations:
  - Converts string IDs to deterministic GUIDs using uuid5
  - Uses endpoints without /coded/ prefix
  - Uses assertionRuns format with assertionSnapshot
- Implemented `CodedEvalReportingStrategy` for coded evaluations:
  - Keeps IDs as strings
  - Uses /coded/ endpoint prefix
  - Uses evaluatorRuns format with evaluationCriterias

### Bug Fixes

- Fixed legacy eval API payload structure for backend compatibility
- Added type assertion for project_id to fix mypy errors
- Removed unused ABC, abstractmethod imports after Protocol migration

### Test Results

- All 27 unit tests passing
- All linting checks (ruff, mypy) passing
- Integration testing with calculator sample: all API calls returning HTTP 200 OK

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
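
As an illustration of the "deterministic GUIDs using uuid5" step in the legacy strategy, the sketch below shows one way such a conversion can work. The function name `to_deterministic_guid`, the choice of `uuid.NAMESPACE_DNS`, and the pass-through of values that are already valid GUIDs are all assumptions; the namespace and behavior used in the PR are not shown here.

```python
import uuid


def to_deterministic_guid(
    value: str, namespace: uuid.UUID = uuid.NAMESPACE_DNS
) -> str:
    """Map an arbitrary string ID to a stable GUID string.

    Assumption: IDs that already parse as GUIDs are passed through
    (normalized); everything else is hashed with uuid5.
    """
    try:
        return str(uuid.UUID(value))
    except ValueError:
        # uuid5 is a name-based UUID: the same namespace and name
        # always produce the same result.
        return str(uuid.uuid5(namespace, value))
```

Since uuid5 is name-based, repeated reports for the same string ID resolve to the same GUID across runs and processes, which matters when a backend expects GUID-typed identifiers that remain stable between calls.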

- Create _reporting/ package with focused modules
- Split strategies, utils, and reporter into separate files
- Maintain backward compatibility via re-exports
- Split tests to match new structure (48 tests, up from 27)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add INFO-level logging to show inputSchema and outputSchema when creating eval set runs for better debugging
- Add DEBUG-level logging for full payloads on all eval reporting operations
- Add warning when entrypoint is not provided, falling back to empty schemas
- Add tests for agent snapshot extraction behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mjnovice · 2025-12-20T03:06:24Z

src/uipath/_cli/_evals/_reporting/_strategies.py

@@ -0,0 +1,418 @@
+"""Evaluation reporting strategies for legacy and coded evaluations.


Can we split the strategies into separate files?

Split the monolithic _strategies.py into separate files for better code organization:

- _strategy_protocol.py: Protocol definition
- _legacy_strategy.py: Legacy evaluation reporting strategy
- _coded_strategy.py: Coded evaluation reporting strategy
- _strategies.py: Re-exports for backward compatibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions bot added the labels test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) on Dec 19, 2025

Chibionos force-pushed the fix/legacy-eval-request-wrapper branch from ab72b22 to c6cd5c3 Compare December 19, 2025 04:40

Chibionos requested a review from mjnovice December 19, 2025 07:18

Chibi Vikram and others added 2 commits December 18, 2025 23:55

chore: bump version to 2.2.37

cd8d342

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mjnovice reviewed Dec 20, 2025

mjnovice approved these changes Dec 20, 2025
