Evaluation to Use Config Management #477
base: main
Conversation
Walkthrough
Refactors evaluation runs to reference persisted configs.

Changes
Sequence Diagram
sequenceDiagram
participant Client
participant API as API Route
participant ConfigCrud as ConfigVersionCrud
participant DB as Database
participant EvalCrud as Eval CRUD
Client->>API: POST /evaluate (config_id, config_version, ...)
API->>ConfigCrud: resolve(config_id, config_version)
ConfigCrud->>DB: SELECT config_version WHERE id = config_id AND version = config_version
DB-->>ConfigCrud: config record / not found
alt config not found
ConfigCrud-->>API: raise not-found/error
API-->>Client: HTTP 400 (config resolution error)
else config resolved
ConfigCrud-->>API: return config object
API->>API: extract model from config.completion.params
API->>API: validate provider == OPENAI
API->>EvalCrud: create_evaluation_run(config_id, config_version, model, ...)
EvalCrud->>DB: INSERT evaluation_run (config_id, config_version, model, ...)
DB-->>EvalCrud: created record
EvalCrud-->>API: evaluation_run
API-->>Client: HTTP 200 (evaluation created)
end
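To make the diagram concrete, here is a minimal, self-contained sketch of the validation and model-extraction step it describes. The class names and helper below are hypothetical stand-ins for the project's actual models, not code from this PR.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


# Hypothetical stand-ins for the persisted config shape referenced in the
# walkthrough; the real SQLModel entities live in backend/app/models/.
@dataclass
class Completion:
    provider: str
    params: dict = field(default_factory=dict)


@dataclass
class ResolvedConfig:
    id: UUID
    version: int
    completion: Completion


class ConfigResolutionError(Exception):
    """Raised when the resolved config cannot be used for evaluation."""


def extract_evaluation_model(config: ResolvedConfig) -> str:
    """Mirror the diagram: require the OpenAI provider, then snapshot the model."""
    if config.completion.provider.lower() != "openai":
        raise ConfigResolutionError("Evaluation requires an OpenAI-based config")
    model = config.completion.params.get("model")
    if not model:
        raise ConfigResolutionError("Config has no 'model' in completion params")
    return model


# Example: a resolved config yields the model that gets stored on the run
cfg = ResolvedConfig(uuid4(), 1, Completion("openai", {"model": "gpt-4o"}))
print(extract_evaluation_model(cfg))  # -> gpt-4o
```

The HTTP 400 branch in the diagram corresponds to the exception path in this sketch.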
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (4)
backend/app/crud/evaluations/embeddings.py (1)
366-367: Misleading comment - update to reflect actual behavior. The comment says "Get embedding model from config" but the code hardcodes the value. Update the comment to accurately describe the implementation.
- # Get embedding model from config (default: text-embedding-3-large)
- embedding_model = "text-embedding-3-large"
+ # Use fixed embedding model (text-embedding-3-large)
+ embedding_model = "text-embedding-3-large"

backend/app/tests/api/routes/test_evaluation.py (1)
524-545: Consider renaming function to match its new purpose. The function `test_start_batch_evaluation_missing_model` was repurposed to test invalid `config_id` scenarios. The docstring was updated but the function name still references "missing_model". Consider renaming for clarity.

- def test_start_batch_evaluation_missing_model(self, client, user_api_key_header):
+ def test_start_batch_evaluation_invalid_config_id(self, client, user_api_key_header):
      """Test batch evaluation fails with invalid config_id."""

backend/app/api/routes/evaluation.py (1)
499-510: Consider validating that `model` is present in config params. The model is extracted with `.get("model")`, which returns `None` if not present. Since `model` is critical for cost tracking (used in `create_langfuse_dataset_run`), consider validating its presence and returning an error if missing.

  # Extract model from config for storage
  model = config.completion.params.get("model")
+ if not model:
+     raise HTTPException(
+         status_code=400,
+         detail="Config must specify a 'model' in completion params for evaluation",
+     )

  # Create EvaluationRun record with config references

backend/app/crud/evaluations/core.py (1)
15-69: Config-based `create_evaluation_run` refactor is correctly implemented; consider logging `model` for improved traceability. The refactor from inline config dict to `config_id: UUID` and `config_version: int` is properly implemented throughout:

- The sole call site in `backend/app/api/routes/evaluation.py:503` correctly passes all new parameters with the right types (`config_id` as UUID, `config_version` as int, `model` extracted from config).
- The `EvaluationRun` model in `backend/app/models/evaluation.py` correctly defines all three fields with appropriate types and descriptions.
- All type hints align with Python 3.11+ guidelines.

One suggested improvement for debugging: include `model` in the creation log for better traceability when correlating evaluation runs with model versions:

  logger.info(
      f"Created EvaluationRun record: id={eval_run.id}, run_name={run_name}, "
-     f"config_id={config_id}, config_version={config_version}"
+     f"config_id={config_id}, config_version={config_version}, model={model}"
  )

Since the model is already extracted at the call site and passed to the function, including it in the log will provide fuller context for operational debugging without any additional cost.
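Putting the pieces from this comment together, the refactored helper plausibly looks something like the sketch below. Only the `config_id`/`config_version`/`model` parameters and the log line come from the review itself; the session handling and remaining columns are assumptions.

```python
import logging
from uuid import UUID

logger = logging.getLogger(__name__)


def create_evaluation_run_sketch(
    session,
    run_name: str,
    config_id: UUID,
    config_version: int,
    model: str | None,
):
    """Hypothetical shape of the refactored CRUD helper (not the repo's code)."""
    # EvaluationRun is the SQLModel entity from backend/app/models/evaluation.py;
    # it is referenced here without repeating its full definition.
    eval_run = EvaluationRun(  # noqa: F821 - defined in the application models
        run_name=run_name,
        config_id=config_id,
        config_version=config_version,
        model=model,
    )
    session.add(eval_run)
    session.commit()
    session.refresh(eval_run)
    # Logging the model alongside the config reference gives the fuller trace
    # suggested in the review above.
    logger.info(
        f"Created EvaluationRun record: id={eval_run.id}, run_name={run_name}, "
        f"config_id={config_id}, config_version={config_version}, model={model}"
    )
    return eval_run
```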
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py (1 hunks)
- backend/app/api/routes/evaluation.py (5 hunks)
- backend/app/crud/evaluations/core.py (5 hunks)
- backend/app/crud/evaluations/embeddings.py (1 hunks)
- backend/app/crud/evaluations/processing.py (1 hunks)
- backend/app/models/evaluation.py (3 hunks)
- backend/app/tests/api/routes/test_evaluation.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
- backend/app/api/routes/evaluation.py
- backend/app/models/evaluation.py
- backend/app/crud/evaluations/embeddings.py
- backend/app/tests/api/routes/test_evaluation.py
- backend/app/crud/evaluations/processing.py
- backend/app/crud/evaluations/core.py
- backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/evaluation.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/evaluation.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
- backend/app/crud/evaluations/embeddings.py
- backend/app/crud/evaluations/processing.py
- backend/app/crud/evaluations/core.py
🧬 Code graph analysis (2)
backend/app/tests/api/routes/test_evaluation.py (2)
backend/app/crud/evaluations/batch.py (1)
- build_evaluation_jsonl (62-115)

backend/app/models/evaluation.py (2)
- EvaluationDataset (74-130)
- EvaluationRun (133-248)
backend/app/crud/evaluations/processing.py (1)
backend/app/crud/evaluations/langfuse.py (1)
- create_langfuse_dataset_run (20-163)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/crud/evaluations/processing.py (1)
257-264: LGTM! Clean refactor to use stored model field. The change correctly retrieves the model from `eval_run.model` instead of extracting it from config. This aligns with the new data model where the model is snapshotted at evaluation creation time.
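As a rough illustration of this behavioural change (names other than `eval_run.model` are assumptions), the processing side can now read the snapshotted value directly:

```python
def model_for_run(eval_run) -> str:
    """Hypothetical helper: prefer the model snapshotted on the run record."""
    model = eval_run.model
    if not model:
        # Rows created before this refactor may predate the model column
        raise ValueError(f"EvaluationRun {eval_run.id} has no stored model")
    return model
```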
backend/app/models/evaluation.py (1)

148-157: LGTM! Well-structured config reference fields. The new `config_id` and `config_version` fields properly establish the relationship to stored configs with appropriate constraints (`ge=1` for version). The nullable design allows backward compatibility with existing data.
backend/app/api/routes/evaluation.py (1)

478-495: LGTM! Robust config resolution with provider validation. The config resolution flow properly validates that the stored config exists and uses the OPENAI provider. Error handling returns appropriate HTTP 400 responses with descriptive messages.
backend/app/alembic/versions/040_add_config_in_evals_run_table.py: 2 outdated review comments (resolved)
Codecov Report
❌ Patch coverage is …
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/models/evaluation.py (1)
148-158: Align `EvaluationRun` type hints with nullable DB columns for config fields. `config_id` and `config_version` are nullable in the schema but annotated as non-optional types. This can mislead callers and type checkers into assuming they're always present, even for legacy runs or transitional data.

Consider updating the annotations to reflect nullability:

- config_id: UUID = SQLField(
+ config_id: UUID | None = SQLField(
      foreign_key="config.id",
      nullable=True,
      description="Reference to the stored config used for this evaluation",
  )
- config_version: int = SQLField(
+ config_version: int | None = SQLField(
      nullable=True,
      ge=1,
      description="Version of the config used for this evaluation",
  )

This keeps the schema the same while making runtime and type expectations clearer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- backend/app/models/evaluation.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/models/evaluation.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/evaluation.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/evaluation.py (1)
271-273: Public model nullability now matches the schema. Making `config_id`, `config_version`, and `model` nullable in `EvaluationRunPublic` correctly reflects the DB fields and avoids validation issues for existing rows. This resolves the earlier mismatch between the table and the public model.
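For illustration only, a public schema with this nullability could be sketched as below; this is a hypothetical Pydantic model, not the project's actual `EvaluationRunPublic`.

```python
from uuid import UUID

from pydantic import BaseModel


class EvaluationRunPublicSketch(BaseModel):
    """Hypothetical sketch mirroring the nullability discussed above."""

    id: UUID
    run_name: str
    config_id: UUID | None = None
    config_version: int | None = None
    model: str | None = None


# A legacy row without config data still validates cleanly:
legacy = EvaluationRunPublicSketch(
    id="00000000-0000-0000-0000-000000000000", run_name="legacy-run"
)
print(legacy.config_id, legacy.model)  # -> None None
```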
backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py: outdated review comment (resolved)
Force-pushed f5b94b0 to cdb0b2e
…ssistant_id handling
…nstead of config dict
…ig_version fields
…ve_model_from_config function, and update processing logic to use config references
Force-pushed c9cc51a to a2c8a95
Prajna1999 left a comment
lgtm
Summary
This change refactors the evaluation run process to utilize a stored configuration instead of a configuration dictionary. It introduces fields for `config_id`, `config_version`, and `model` in the evaluation run table, streamlining the evaluation process and improving data integrity.
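For context, adding nullable config reference columns of this kind is typically applied with an Alembic migration along these lines. The revision identifiers and constraint name below are placeholders, and the `evaluation_run` table name is an assumption; only the `config` table and the new column names come from the review above.

```python
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects import postgresql

# Placeholder revision identifiers (the PR's migration uses its own).
revision = "add_config_refs_sketch"
down_revision = None


def upgrade() -> None:
    # Nullable columns keep existing evaluation runs valid.
    op.add_column(
        "evaluation_run",
        sa.Column("config_id", postgresql.UUID(as_uuid=True), nullable=True),
    )
    op.add_column(
        "evaluation_run", sa.Column("config_version", sa.Integer(), nullable=True)
    )
    op.create_foreign_key(
        "fk_evaluation_run_config_id",  # assumed constraint name
        "evaluation_run",
        "config",
        ["config_id"],
        ["id"],
    )


def downgrade() -> None:
    op.drop_constraint(
        "fk_evaluation_run_config_id", "evaluation_run", type_="foreignkey"
    )
    op.drop_column("evaluation_run", "config_version")
    op.drop_column("evaluation_run", "config_id")
```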
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
`fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Summary by CodeRabbit
New Features
Behavior Change
Tests