
rmitsch (Collaborator) commented Dec 26, 2025

Description

Adds evaluation capabilities to predictive tasks.

Related Issues

None.

Changes Made

  • Adds evaluation capabilities to all predictive tasks, either precise or based on LLM judges.
  • Standardizes on "score" instead of "confidence" in terminology and prompts.
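The PR text does not show the new API, so what follows is only a hypothetical sketch of the two evaluation modes it names, not sieves' implementation. Every name here (`evaluate_precise`, `evaluate_with_judge`, `EvaluationReport`, the `Judge` signature, `overlap_judge`) is an illustrative assumption. "Precise" evaluation is assumed to mean a deterministic comparison against gold labels (here: exact match). The LLM-judge mode is modeled as a pluggable callable that returns a score, with a token-overlap function standing in for an actual model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical judge signature (not sieves' API): maps (prediction, reference)
# to a score. In practice this would wrap a call to an LLM.
Judge = Callable[[str, str], float]


@dataclass
class EvaluationReport:
    score: float              # mean of the per-example scores, in [0, 1]
    per_example: List[float]  # one score per (prediction, reference) pair


def _aggregate(scores: List[float]) -> EvaluationReport:
    # An empty evaluation set yields 0.0 instead of dividing by zero.
    mean = sum(scores) / len(scores) if scores else 0.0
    return EvaluationReport(score=mean, per_example=scores)


def _check_lengths(predictions: Sequence[str], references: Sequence[str]) -> None:
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")


def evaluate_precise(predictions: Sequence[str], references: Sequence[str]) -> EvaluationReport:
    """Deterministic scoring: exact match after stripping whitespace and lowercasing."""
    _check_lengths(predictions, references)
    scores = [
        1.0 if p.strip().lower() == r.strip().lower() else 0.0
        for p, r in zip(predictions, references)
    ]
    return _aggregate(scores)


def evaluate_with_judge(
    predictions: Sequence[str], references: Sequence[str], judge: Judge
) -> EvaluationReport:
    """Judge-based scoring: each raw judge score is clamped into [0, 1]."""
    _check_lengths(predictions, references)
    scores = [min(1.0, max(0.0, judge(p, r))) for p, r in zip(predictions, references)]
    return _aggregate(scores)


def overlap_judge(prediction: str, reference: str) -> float:
    """Stand-in for an LLM judge: Jaccard overlap of lowercase whitespace tokens."""
    pred, ref = set(prediction.lower().split()), set(reference.lower().split())
    union = pred | ref
    return len(pred & ref) / len(union) if union else 1.0


if __name__ == "__main__":
    preds = ["positive", "Negative ", "neutral"]
    golds = ["positive", "negative", "positive"]
    print(evaluate_precise(preds, golds).score)
    print(evaluate_with_judge(["the cat sat"], ["a cat sat"], overlap_judge).score)
```

Keeping the judge a plain callable means a real LLM call can be swapped in without touching the aggregation, and both modes report a single `score`, in line with the terminology change above.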

Checklist

  • Tests have been extended to cover changes in functionality
  • Existing and new tests succeed
  • Documentation updated (if applicable)
  • Related issues linked

Screenshots/Examples (if applicable)

rmitsch self-assigned this Dec 26, 2025
codecov bot commented Dec 26, 2025

Codecov Report

❌ Patch coverage is 87.79221% with 47 lines in your changes missing coverage. Please review.

File (with missing lines)                               Patch %  Lines
sieves/tasks/predictive/classification/core.py          83.95%   13 Missing ⚠️
sieves/tasks/predictive/core.py                         84.00%   8 Missing ⚠️
sieves/tasks/predictive/ner/core.py                     83.33%   5 Missing ⚠️
sieves/tasks/predictive/evaluation.py                   80.00%   4 Missing ⚠️
...es/tasks/predictive/information_extraction/core.py   91.83%   4 Missing ⚠️
sieves/tasks/predictive/schemas/classification.py       70.00%   3 Missing ⚠️
sieves/tasks/predictive/sentiment_analysis/core.py      90.00%   3 Missing ⚠️
sieves/pipeline/core.py                                 83.33%   2 Missing ⚠️
sieves/tasks/predictive/pii_masking/core.py             92.59%   2 Missing ⚠️
...ieves/tasks/predictive/relation_extraction/core.py   92.59%   2 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #247      +/-   ##
==========================================
- Coverage   92.80%   92.70%   -0.11%     
==========================================
  Files          78       79       +1     
  Lines        4087     4413     +326     
==========================================
+ Hits         3793     4091     +298     
- Misses        294      322      +28     
File (with missing lines)                               Coverage Δ
sieves/data/doc.py                                      96.22% <100.00%> (+0.07%) ⬆️
sieves/tasks/predictive/classification/bridges.py       94.40% <100.00%> (ø)
sieves/tasks/predictive/question_answering/core.py      94.36% <100.00%> (+0.92%) ⬆️
sieves/tasks/predictive/schemas/summarization.py        100.00% <100.00%> (ø)
...ves/tasks/predictive/sentiment_analysis/bridges.py   98.37% <100.00%> (ø)
sieves/tasks/predictive/summarization/core.py           96.61% <100.00%> (+0.69%) ⬆️
sieves/tasks/predictive/translation/core.py             96.61% <100.00%> (+0.69%) ⬆️
sieves/tasks/core.py                                    96.00% <75.00%> (-1.83%) ⬇️
sieves/pipeline/core.py                                 92.64% <83.33%> (-0.91%) ⬇️
sieves/tasks/predictive/pii_masking/core.py             95.69% <92.59%> (-1.36%) ⬇️
... and 8 more
Δ = absolute <relative> (impact), ø = not affected
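The project totals above are consistent with Codecov's coverage ratio, hits / (hits + misses + partials), with no partials reported here. A small sketch to sanity-check the numbers; the helper names `coverage_pct` and `implied_patch_lines` are ours and not part of Codecov:

```python
def coverage_pct(hits: int, misses: int, partials: int = 0) -> float:
    """Coverage in percent: hits / (hits + misses + partials)."""
    tracked = hits + misses + partials
    return 100.0 * hits / tracked if tracked else 0.0


def implied_patch_lines(missing: int, patch_pct: float) -> int:
    """Tracked lines in the patch implied by its miss count and coverage percentage."""
    return round(missing / (1.0 - patch_pct / 100.0))


if __name__ == "__main__":
    # Project totals from the coverage diff (main vs. #247).
    print(f"main: {coverage_pct(3793, 294):.3f}%")
    print(f"#247: {coverage_pct(4091, 322):.3f}%")
    # 47 missing lines at 87.79221% patch coverage.
    n = implied_patch_lines(47, 87.79221)
    print(n, n - 47, f"{coverage_pct(n - 47, 47):.5f}%")
```

This reproduces the reported 92.80% and 92.70% to within 0.01 percentage points and implies the patch touched 385 tracked lines, 338 of them covered.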

rmitsch marked this pull request as ready for review December 26, 2025 12:55
rmitsch merged commit b17c503 into main Dec 26, 2025
3 checks passed
rmitsch deleted the feature/evaluate branch December 26, 2025 12:55

Labels

None yet

Projects

None yet


2 participants