@Chibionos Chibionos commented Dec 19, 2025

Summary

  • Fix legacy evaluation reporting (HTTP 400 errors)
  • Implement Strategy Pattern for legacy vs coded eval flows
  • Refactor into modular _reporting/ package
  • Add logging for eval set run schema reporting
  • Bump version to 2.2.37

Changes

  • _reporting/_strategies.py - Protocol + strategy implementations
  • _reporting/_reporter.py - Main StudioWebProgressReporter class with logging
  • _reporting/_utils.py - Error handling decorator
  • Backward compatibility maintained via re-exports
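
The decorator in `_reporting/_utils.py` is not shown in this conversation. As a rough sketch only, an error-handling decorator for reporting calls could look like the following, assuming the goal is to log a failed report instead of letting it abort the evaluation run. The names `log_and_swallow_errors` and `report_run` are hypothetical and the example is synchronous for brevity.

```python
import functools
import logging
from typing import Any, Callable, Optional, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


def log_and_swallow_errors(func: Callable[..., T]) -> Callable[..., Optional[T]]:
    """Hypothetical decorator: log any exception from a reporting call, return None."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Optional[T]:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Assumption: a reporting failure should not break the eval run itself.
            logger.warning("Reporting call %s failed: %s", func.__name__, exc)
            return None

    return wrapper


@log_and_swallow_errors
def report_run(status_code: int) -> str:
    # Stand-in for an HTTP call; raises on error statuses such as 400.
    if status_code >= 400:
        raise RuntimeError(f"HTTP {status_code}")
    return "ok"
```

With this sketch, `report_run(200)` returns `"ok"`, while `report_run(400)` logs a warning and returns `None`.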

Logging Added

  • INFO-level logging when creating eval set runs showing inputSchema and outputSchema
  • DEBUG-level logging for full payloads on all eval reporting operations
  • WARNING when entrypoint is not provided, falling back to empty schemas
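
To illustrate the three log levels listed above, here is a minimal sketch using the standard `logging` module. The function name `build_eval_set_run_payload`, the logger name, and the payload shape beyond the `inputSchema` and `outputSchema` keys are assumptions, not code from this PR.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("eval_reporting_sketch")  # hypothetical logger name


def build_eval_set_run_payload(
    entrypoint: Optional[str],
    input_schema: Optional[Dict[str, Any]] = None,
    output_schema: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the schema part of an eval set run payload."""
    if not entrypoint:
        # WARNING: no entrypoint, so fall back to empty schemas.
        logger.warning("Entrypoint not provided; falling back to empty schemas")
        input_schema, output_schema = {}, {}

    payload = {
        "inputSchema": input_schema or {},
        "outputSchema": output_schema or {},
    }
    # INFO: show the schemas that will be sent.
    logger.info(
        "Creating eval set run: inputSchema=%s outputSchema=%s",
        payload["inputSchema"],
        payload["outputSchema"],
    )
    # DEBUG: full payload for detailed troubleshooting.
    logger.debug("Full payload: %s", json.dumps(payload))
    return payload
```

Calling `logging.basicConfig(level=logging.DEBUG)` before the helper would surface all three messages; at the default level only the warning appears.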

Tests

  • 33 tests for reporter (including new agent snapshot extraction tests)
  • All lint and format checks passing

🤖 Generated with Claude Code

github-actions bot added the test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) labels on Dec 19, 2025
This PR fixes legacy evaluation reporting to the backend that was returning
HTTP 400 errors and implements the Strategy Pattern for cleaner code separation.

## Changes

### Strategy Pattern Implementation
- Created `EvalReportingStrategy` Protocol defining the interface for evaluation
  reporting strategies
- Implemented `LegacyEvalReportingStrategy` for legacy evaluations:
  - Converts string IDs to deterministic GUIDs using uuid5
  - Uses endpoints without /coded/ prefix
  - Uses assertionRuns format with assertionSnapshot
- Implemented `CodedEvalReportingStrategy` for coded evaluations:
  - Keeps IDs as strings
  - Uses /coded/ endpoint prefix
  - Uses evaluatorRuns format with evaluationCriterias

### Bug Fixes
- Fixed legacy eval API payload structure for backend compatibility
- Added type assertion for project_id to fix mypy errors
- Removed unused ABC, abstractmethod imports after Protocol migration
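
For context on the `project_id` item, a type assertion of this kind usually narrows an `Optional[str]` to `str` so that mypy accepts it where a plain string is required. A minimal sketch, assuming the value comes from an environment-style mapping; the `PROJECT_ID` key, the header name, and both function names are hypothetical:

```python
from typing import Dict, Mapping, Optional


def read_project_id(env: Mapping[str, str]) -> Optional[str]:
    # Hypothetical lookup; the key name is an assumption.
    return env.get("PROJECT_ID")


def build_headers(env: Mapping[str, str]) -> Dict[str, str]:
    project_id = read_project_id(env)
    # Narrow Optional[str] to str; without this, mypy rejects the dict value.
    assert project_id is not None, "project_id is required for eval reporting"
    return {"x-project-id": project_id}  # hypothetical header name
```

Note that `assert` statements are skipped when Python runs with `-O`, so an explicit `if project_id is None: raise ...` is the safer choice where the check must also hold at runtime.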

### Test Results
- All 27 unit tests passing
- All linting checks (ruff, mypy) passing
- Integration testing with calculator sample: all API calls returning HTTP 200 OK

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Chibionos Chibionos force-pushed the fix/legacy-eval-request-wrapper branch from ab72b22 to c6cd5c3 Compare December 19, 2025 04:40
- Create _reporting/ package with focused modules
- Split strategies, utils, and reporter into separate files
- Maintain backward compatibility via re-exports
- Split tests to match new structure (48 tests, up from 27)
@Chibionos Chibionos requested a review from mjnovice December 19, 2025 07:18
Chibi Vikram and others added 2 commits December 18, 2025 23:55
- Add INFO-level logging to show inputSchema and outputSchema when
  creating eval set runs for better debugging
- Add DEBUG-level logging for full payloads on all eval reporting operations
- Add warning when entrypoint is not provided, falling back to empty schemas
- Add tests for agent snapshot extraction behavior
@@ -0,0 +1,418 @@
"""Evaluation reporting strategies for legacy and coded evaluations.

Can we split the strategies into separate files?

Split the monolithic _strategies.py into separate files for better
code organization:
- _strategy_protocol.py: Protocol definition
- _legacy_strategy.py: Legacy evaluation reporting strategy
- _coded_strategy.py: Coded evaluation reporting strategy
- _strategies.py: Re-exports for backward compatibility