Conversation

@anticorrelator (Contributor) commented Jan 8, 2026

resolves #10456

  • uses display_name for annotations
  • removes the unique constraint on the Evaluator table

Note

Aligns evaluator naming across schema, client, and server, using displayName for emitted annotations and errors, and relaxes DB uniqueness on evaluator names.

  • Schema: add displayName to PlaygroundEvaluatorInput; update generated Relay types for mutations/subscriptions
  • Frontend: propagate displayName with evaluator mappings via new EvaluatorMappingEntry; update getChatCompletionOverDatasetInput and table/section components to send and display it
  • Server: use display_name for annotation name and error chunks in chat mutations/subscriptions; pass through to evaluation union responses
  • DB: remove unique constraint from evaluators.name (migration and ORM model); adjust duplicate error messages to dataset-level displayName uniqueness for dataset evaluators
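
The DB bullet above could correspond to a migration along these lines (a sketch only: the constraint name `uq_evaluators_name` and the use of batch mode are assumptions, not Phoenix's actual migration):

```python
# Sketch: relax uniqueness on evaluators.name. The constraint name
# "uq_evaluators_name" is assumed; the real migration may differ.
# batch_alter_table keeps the operation portable to SQLite.
from alembic import op


def upgrade() -> None:
    with op.batch_alter_table("evaluators") as batch_op:
        batch_op.drop_constraint("uq_evaluators_name", type_="unique")


def downgrade() -> None:
    with op.batch_alter_table("evaluators") as batch_op:
        batch_op.create_unique_constraint("uq_evaluators_name", ["name"])
```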

Written by Cursor Bugbot for commit 97c47db.

@anticorrelator anticorrelator requested review from a team as code owners January 8, 2026 23:28
@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix Jan 8, 2026
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jan 8, 2026
@anticorrelator anticorrelator changed the base branch from main to version-13 January 8, 2026 23:28
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Jan 8, 2026
```diff
 if "foreign" in str(e).lower():
     raise BadRequest(f"Dataset with id {dataset_id} not found")
-raise BadRequest(f"Evaluator with name {input.name} already exists")
+raise BadRequest(
```
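
The duplicate-name handling above can be sketched end-to-end as follows (`classify_integrity_error` is a hypothetical standalone helper, not the PR's actual code, and the exact message texts are illustrative):

```python
def classify_integrity_error(e: Exception, dataset_id: int, display_name: str) -> str:
    """Map a DB integrity error to a user-facing error message.

    Hypothetical helper mirroring the snippet above: a foreign-key
    failure means the dataset does not exist; anything else is treated
    as a duplicate evaluator display name within the dataset.
    """
    if "foreign" in str(e).lower():
        return f"Dataset with id {dataset_id} not found"
    return (
        f"Evaluator with display name {display_name!r} "
        f"already exists for dataset {dataset_id}"
    )
```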
Contributor:

Might be worth renaming this resolver to create_dataset_code_evaluator while we're here.


```diff
-input_mappings_by_evaluator_node_id = {
-    evaluator.id: evaluator.input_mapping for evaluator in input.evaluators
+evaluator_info_by_node_id = {
```
Contributor:

Suggested change:

```diff
-evaluator_info_by_node_id = {
+evaluator_input_by_node_id = {
```


```diff
 "evaluators",
 sa.Column("id", _Integer, primary_key=True),
-sa.Column("name", sa.String, nullable=False, unique=True),
+sa.Column("name", sa.String, nullable=False),
```
Contributor:

We may eventually want to display global evaluators in an evaluators hub and have them be addressable by name. Rather than relaxing the unique constraint, I would vote to either remove the column entirely for now or generate unique names like we do for the prompt underlying LLM evaluators, e.g., with a hashed suffix.
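
The hashed-suffix idea could look something like this (a sketch; `unique_evaluator_name` is a hypothetical helper, and deriving the suffix from the evaluator's node id is an assumption):

```python
import hashlib


def unique_evaluator_name(display_name: str, node_id: str) -> str:
    """Append a short, stable hash so internal names stay globally unique.

    Hypothetical: hashes the evaluator's node id so that two "contains"
    evaluators attached to different datasets get distinct internal names
    while keeping the human-readable display name as the prefix.
    """
    suffix = hashlib.sha256(node_id.encode()).hexdigest()[:8]
    return f"{display_name}-{suffix}"
```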

Contributor:

I don't think addressing evaluators globally by name should carry the expectation that the name is unique. You may see multiple "contains" evaluators and then need to disambiguate them by attached dataset, description, etc.

Contributor (Author):

I agree; even if we show global evaluators, autogenerated names would be even less informative, in my opinion.

```python
@strawberry.input
class PlaygroundEvaluatorInput:
    id: GlobalID
    display_name: str
```
Contributor:

Is the id above the DatasetEvaluatorId? If so, don't we already have the display name saved?

Contributor (Author):

It's not; it's an EvaluatorID, which unifies the preview payloads as well.


Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

Status: 📘 Todo

Development

Successfully merging this pull request may close these issues.

[evaluators] custom evaluator names / duplicate evaluators per dataset

4 participants