Fix ScenarioRun Duration Tracking in SwarmAdapter #63
Conversation
Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
Walkthrough: The changes implement duration tracking in the SwarmAdapter by computing `duration_ms` from the session's `duration_seconds` metric instead of hardcoding it to `0.0`.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ❌ Failed checks (3 warnings) | ✅ Passed checks (2)
📜 Recent review details: Configuration used: Path: .coderabbit.yaml | Review profile: CHILL | Plan: Pro
📒 Files selected for processing (1)
💤 Files with no reviewable changes (1)
Codecov Report: ❌ Patch coverage is
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)

pyproject.toml (1)

23-37: Confirm numpy is still available or remove direct imports. Direct `numpy` imports were found in the codebase: `tests/test_comparison.py:5` and `src/agentunit/production/__init___old.py:13` (appears to be a deprecated backup file). While `scipy` will bring in `numpy` as a transitive dependency, the test file has a direct import. Either add `numpy` back as an explicit dependency or remove the direct `numpy` imports and rely solely on scipy's numpy.

src/agentunit/adapters/swarm_adapter.py (1)

1-1: Pipeline failure: Ruff format check failed. The CI pipeline indicates that this file needs reformatting. Please run `poetry run ruff format src/agentunit/adapters/swarm_adapter.py` to fix the formatting issues before merging.
🤖 Fix all issues with AI agents
In @src/agentunit/reporting/results.py:
- Around lines 203-204: when `rows` is empty, the function returns `target` without creating the file. Change the early-return branch so that it creates the file (either an empty file or a headers-only file) before returning: for example, invoke `target.write_text("", encoding="utf-8")` (or write the header text) in the branch that currently checks `if not rows:` before the `return target`. Locate this in the results-handling code that uses the `rows` variable and `target`, so the file is always created.
🧹 Nitpick comments (3)
src/agentunit/reporting/results.py (2)

82-92: `_flatten_metrics` only handles single-level nesting. The helper only flattens one level of nested dictionaries. If `value` is a dict containing another dict, the inner dict will be serialized as a string in the CSV (via `DictWriter`'s default behavior), which may not be the intended outcome. Consider using recursion if deeper nesting is expected, or document this limitation.

♻️ Optional: Recursive flattening

```diff
 def _flatten_metrics(metrics: dict[str, Any], prefix: str = "metric") -> dict[str, Any]:
+    """Flatten nested metric dictionaries recursively."""
     flat: dict[str, Any] = {}
     for key, value in metrics.items():
+        full_key = f"{prefix}_{key}"
         if isinstance(value, dict):
-            for inner_key, inner_value in value.items():
-                flat[f"{prefix}_{key}_{inner_key}"] = inner_value
+            flat.update(_flatten_metrics(value, full_key))
         else:
-            flat[f"{prefix}_{key}"] = value
+            flat[full_key] = value
     return flat
```
206-211: Consider a more intuitive column ordering. Alphabetical sorting places columns like `case_id` before `scenario_name`. For better readability, consider placing key identifying columns first.

♻️ Optional: Fixed column order with dynamic metrics

```diff
-    fieldnames = sorted({key for row in rows for key in row})
+    # Fixed columns first, then sorted metric columns
+    fixed_cols = ["scenario_name", "case_id", "success", "duration_ms", "error"]
+    metric_cols = sorted({k for row in rows for k in row if k not in fixed_cols})
+    fieldnames = fixed_cols + metric_cols
```

src/agentunit/adapters/swarm_adapter.py (1)
353-369: Consider consolidating local imports. `ScenarioRun` and `ScenarioResult` are imported separately at lines 353 and 369. These could be combined into a single import statement for cleaner code.

♻️ Consolidate imports

```diff
     # Create trace log
     from agentunit.core.trace import TraceLog
+    from agentunit.reporting.results import ScenarioResult, ScenarioRun

     trace = TraceLog()
     trace.record(
         "session_complete",
         session_id=session_id,
         metrics=metrics,
     )

     # Create scenario run
-    from agentunit.reporting.results import ScenarioRun
-
     scenario_run = ScenarioRun(
         ...
     )

     # Create result
-    from agentunit.reporting.results import ScenarioResult
-
     result = ScenarioResult(
         ...
     )
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (4)
- pyproject.toml
- src/agentunit/adapters/swarm_adapter.py
- src/agentunit/reporting/results.py
- tests/test_reporting.py
💤 Files with no reviewable changes (1)
- tests/test_reporting.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/adapters/swarm_adapter.py (2)
- src/agentunit/core/trace.py (2)
  - TraceLog (26-78)
  - record (33-34)
- src/agentunit/reporting/results.py (1)
  - ScenarioRun (29-36)
🪛 GitHub Actions: CI
src/agentunit/adapters/swarm_adapter.py
[error] 1-1: Ruff format check failed. 1 file would be reformatted: src/agentunit/adapters/swarm_adapter.py. Command: poetry run ruff format --check src/ tests/.
🔇 Additional comments (7)
pyproject.toml (3)
27-27: LGTM on langchain constraint. The version constraint `>=0.0.353,<0.4.0` is reasonable and unchanged from before.
78-82: LGTM on pytest configuration. The `-ra` flag configuration is appropriate for showing test summary information. This appears to be a formatting-only change with no functional impact.
35-41: Extras configuration and dependency versions are correct. The `integration-tests` extra correctly references the optional `langgraph` dependency, and jsonschema version `4.25.1` is available on PyPI. The configuration aligns with the PR objectives.

src/agentunit/reporting/results.py (1)
100-120: LGTM! The addition of `started_at` and `finished_at` fields to `SuiteResult` with ISO formatting in `to_dict()` and `to_json()` is well-implemented. These timing metadata fields complement the duration tracking changes in the adapter.

src/agentunit/adapters/swarm_adapter.py (3)
17-25: LGTM! Good pattern for handling optional dependencies. The import guard with the `HAS_SWARM` flag combined with the check in `__init__` (lines 51-56) provides clear error messaging when the dependency is missing.
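The guard pattern this comment describes looks roughly like the following (the module name `swarm` and the error message wording are assumptions; only the shape of the pattern is taken from the review):

```python
# Optional-dependency guard: attempt the import at module load,
# fail with a clear message only when the adapter is actually used.
try:
    import swarm  # assumed optional third-party package

    HAS_SWARM = True
except ImportError:
    swarm = None
    HAS_SWARM = False


class SwarmAdapter:
    def __init__(self) -> None:
        if not HAS_SWARM:
            # Clear error message instead of a confusing NameError later.
            raise ImportError(
                "SwarmAdapter requires the optional 'swarm' package. "
                "Install it to use this adapter."
            )
```

This keeps the package importable even when the optional dependency is absent, while surfacing a useful error at construction time.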
331-337: LGTM - Correctly implements duration tracking. This change properly addresses the PR objective by calculating `duration_ms` from session timestamps. The conversion from seconds to milliseconds (`duration_seconds * 1000`) is correct, and the fallback to `0.0` provides defensive handling for edge cases.
355-366: LGTM! The `ScenarioRun` creation correctly uses the calculated `duration_ms` value (line 364), which is the core fix for this PR. The conditional logic for `scenario_name` is appropriately defensive.
aviralgarg05
left a comment
Fix the issue
Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
I have solved the issue.
aviralgarg05
left a comment
LGTM!

Currently, `ScenarioRun` duration is hardcoded to `0.0` in `SwarmAdapter`, which does not reflect the actual runtime of the scenario.
This PR updates `SwarmAdapter.end_session` to:
- Use the already-calculated `duration_seconds` metric.
- Convert it to milliseconds.
- Populate the `duration_ms` field in `ScenarioRun`.
This ensures that scenario durations are accurately recorded, improving metrics tracking and reporting consistency.
Closes #54
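The core of the change can be sketched as a small helper (the function name is illustrative; in the PR the logic lives inline in `SwarmAdapter.end_session`):

```python
from typing import Any


def compute_duration_ms(metrics: dict[str, Any]) -> float:
    """Derive duration_ms from the already-computed duration_seconds metric."""
    duration_seconds = metrics.get("duration_seconds")
    if duration_seconds is None:
        # Defensive fallback for sessions without timing data.
        return 0.0
    return float(duration_seconds) * 1000
```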