
Conversation

@Jagriti-student (Contributor) commented Jan 12, 2026

Summary

This PR adds a to_csv() method to SuiteResult in src/agentunit/reporting/results.py.
It allows exporting all test results to a CSV file, flattening nested metrics into separate columns
(e.g., metric_ExactMatch, metric_Latency) for easy analysis in spreadsheets.
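
A minimal sketch of that flattening, for illustration only (the real helper lives in src/agentunit/reporting/results.py; the naming scheme for nested keys is an assumption):

def _flatten_metrics(metrics: dict | None) -> dict:
    """Flatten one level of nested metrics into metric_* columns."""
    flat = {}
    for key, value in (metrics or {}).items():
        if isinstance(value, dict):  # one level of nesting is expanded
            for inner, inner_value in value.items():
                flat[f"metric_{key}_{inner}"] = inner_value  # assumed naming
        else:
            flat[f"metric_{key}"] = value
    return flat

For example, {"ExactMatch": 1.0, "Latency": 120.5} becomes {"metric_ExactMatch": 1.0, "metric_Latency": 120.5}.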

Changes

  • Improved the existing _flatten_metrics() helper as needed for CSV export
  • Added SuiteResult.to_csv(path: str | Path) method
  • CSV writes one row per scenario run

Testing

Verified manually using:

from datetime import datetime
from agentunit.reporting.results import SuiteResult, ScenarioResult, ScenarioRun

# One passing run with two example metrics.
run = ScenarioRun(
    scenario_name="DemoScenario",
    case_id="case_1",
    success=True,
    metrics={"ExactMatch": 1.0, "Latency": 120.5},
    duration_ms=150.0,
    trace=None,
)

scenario = ScenarioResult(name="DemoScenario")
scenario.add_run(run)

suite = SuiteResult(
    scenarios=[scenario],
    started_at=datetime.now(),
    finished_at=datetime.now(),
)

# Writes results/test.csv with one row per run (parent directories are created).
suite.to_csv("results/test.csv")
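
For reference, the export should produce a CSV roughly like this (header order assumed alphabetical, matching the sorted-fieldnames behavior noted in the review below):

case_id,duration_ms,error,metric_ExactMatch,metric_Latency,scenario_name,success
case_1,150.0,,1.0,120.5,DemoScenario,True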

Closes #58  

Summary by CodeRabbit (auto-generated release notes)

New Features
  • CSV export for test results, including scenario, case ID, success, duration, error, and flattened metrics.

Improvements
  • Reorganized JUnit output and improved time reporting accuracy.
  • Tighter Markdown and JSON formatting for exported reports.

Maintenance
  • Removed numpy dependency and added an "integration-tests" extras entry; minor dependency reordering and formatting cleanup.



@coderabbitai bot commented Jan 12, 2026

Warning

Rate limit exceeded

@Jagriti-student has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 16 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between fc43108 and 637bb2d.

📒 Files selected for processing (1)
  • src/agentunit/reporting/results.py

Walkthrough

Adds CSV export to SuiteResult with metric flattening, refactors JUnit output to a single testsuite and adjusts time reporting, enhances Markdown rendering helpers, and updates pyproject.toml (removes numpy, reorders dependencies, adds integration-tests extra).

Changes

Configuration: pyproject.toml
  Removed numpy dependency, removed the inline comment from the langchain line, reordered jsonschema within dependencies, and added integration-tests = ["langgraph"] under [tool.poetry.extras].

Reporting / Exports: src/agentunit/reporting/results.py
  Added `SuiteResult.to_csv(path: str | Path)` with metric flattening; refactored JUnit output and enhanced Markdown rendering helpers.

Tests / Minor edits: tests/test_reporting.py
  Minor import/comment and whitespace cleanup in the test file.

Sequence Diagram(s)

(omitted — changes are internal export additions and formatting; no multi-component sequential flow requiring visualization)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • aviralgarg05

🚥 Pre-merge checks: 2 passed, 3 failed

❌ Failed checks (2 warnings, 1 inconclusive)
  • Out of Scope Changes check (⚠️ Warning): Changes to pyproject.toml (numpy removal, jsonschema reorder, langchain comment removal, integration-tests extras) and test file formatting are out of scope relative to issue #58's CSV export objective. Resolution: remove the pyproject.toml changes (unrelated dependency management) and the test file formatting changes to keep the PR focused on CSV export functionality per issue #58.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 22.22%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The PR description includes a summary, changes, and testing verification but lacks the detailed template structure with Type of Change, Testing checklist, Code Quality checks, and other required sections. Resolution: provide a more complete description using the repository template.

✅ Passed checks (2 passed)
  • Title check: The title clearly addresses the main change (CSV export) but contains awkward duplicated phrasing ("Add CSV export support for SuiteResultAdd csv reports").
  • Linked Issues check: The PR implements all requirements from issue #58: it adds the to_csv() method, flattens nested metrics, and writes one row per run with proper CSV structure.


@codecov-commenter commented Jan 12, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 11.76471% with 30 lines in your changes missing coverage. Please review.

Files with missing lines:
  • src/agentunit/reporting/results.py: patch coverage 11.76%, 30 lines missing ⚠️


@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/agentunit/reporting/results.py (2)

83-95: One-level flattening is intentional but should be documented.

The helper only flattens one level of nesting. If metrics contains deeper structures like {"outer": {"inner": {"deep": 1}}}, the inner dict becomes a cell value (e.g., "{'deep': 1}"), which may not be spreadsheet-friendly.

If deeper nesting is a realistic scenario, consider making this recursive or adding a docstring clarifying the depth limit.
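
For illustration, a recursive variant along those lines could look like this (a sketch, not the PR's implementation; the prefix scheme is an assumption):

def _flatten_metrics(metrics: dict | None, prefix: str = "metric") -> dict:
    # Recursively expands arbitrarily deep nesting into prefixed columns.
    flat = {}
    for key, value in (metrics or {}).items():
        name = f"{prefix}_{key}"
        if isinstance(value, dict):
            flat.update(_flatten_metrics(value, prefix=name))
        else:
            flat[name] = value
    return flat

With this variant, {"outer": {"inner": {"deep": 1}}} would yield a single metric_outer_inner_deep column instead of a stringified cell.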


217-219: Column order is alphabetical — consider prioritizing core columns.

Sorting fieldnames alphabetically places case_id before scenario_name and interleaves metric_* columns unpredictably. For spreadsheet usability, leading with fixed columns (scenario_name, case_id, success, duration_ms, error) followed by sorted metric columns is more intuitive.

Proposed fix
-        fieldnames = sorted(
-            {key for row in rows for key in row.keys()}
-        )
+        base_fields = ["scenario_name", "case_id", "success", "duration_ms", "error"]
+        metric_fields = sorted(
+            {key for row in rows for key in row.keys()} - set(base_fields)
+        )
+        fieldnames = base_fields + metric_fields
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40bc42d and 580323d.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • pyproject.toml
  • src/agentunit/reporting/results.py
  • tests/test_reporting.py
💤 Files with no reviewable changes (1)
  • tests/test_reporting.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/reporting/results.py (3)
src/agentunit/reporting/html.py (1)
  • render_html_report (11-102)
src/agentunit/core/runner.py (1)
  • run (45-54)
src/agentunit/datasets/base.py (1)
  • name (38-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test (Python 3.12)
  • GitHub Check: Test (Python 3.10)
🔇 Additional comments (4)
pyproject.toml (1)

39-41: LGTM — extras group aligns with test markers.

The new integration-tests extras entry correctly groups langgraph for optional testing scenarios, matching the pytest marker defined at line 70.
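
That is, the added extras entry (reconstructed from the walkthrough's description of the pyproject.toml change):

[tool.poetry.extras]
integration-tests = ["langgraph"]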

src/agentunit/reporting/results.py (3)

133-177: LGTM — JUnit structure is now a single testsuite element.

The refactored XML uses a single <testsuite> with nested <testcase> elements, which is a more standard JUnit format. Time reporting and failure message handling are implemented correctly.
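
For context, the single-testsuite shape being described looks roughly like this (attribute values illustrative, not taken from the PR):

<testsuite name="agentunit" tests="1" failures="0" time="0.150">
  <testcase name="DemoScenario.case_1" time="0.150" />
</testsuite>

A failing case would carry a nested <failure> element with its message.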


189-226: CSV export implementation looks solid overall.

The method correctly:

  • Creates parent directories
  • Flattens metrics for spreadsheet compatibility
  • Uses DictWriter with proper encoding and newline handling

Minor improvements noted in separate comments regarding column ordering and empty-file behavior.
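
Putting those pieces together, the exporter's shape is roughly the following (a sketch reconstructed from this review, not the PR's exact code; names such as scenario.runs and run.error are assumptions):

import csv
from pathlib import Path

def to_csv(self, path: str | Path) -> Path:
    """Sketch: write one CSV row per scenario run."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create parent dirs
    rows = []
    for scenario in self.scenarios:
        for run in scenario.runs:  # attribute name assumed
            row = {
                "scenario_name": run.scenario_name,
                "case_id": run.case_id,
                "success": run.success,
                "duration_ms": run.duration_ms,
                "error": getattr(run, "error", None),
            }
            row.update(_flatten_metrics(run.metrics))  # metric_* columns
            rows.append(row)
    if not rows:
        return target  # empty-suite behavior discussed below
    fieldnames = sorted({key for r in rows for key in r.keys()})
    with target.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return target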


268-272: No changes needed—the type annotation is correct.

ScenarioRun.metrics is properly typed as dict[str, float | None], which is consistent with the .2f format used in the markdown rendering. The _flatten_metrics function handles nested dicts, but it's used only in CSV export (line 210), not in the markdown rendering at lines 268-272. The markdown code correctly assumes float or None values.

Likely an incorrect or invalid review comment.

Comment on lines +214 to +215
        if not rows:
            return target


⚠️ Potential issue | 🟡 Minor

Inconsistent behavior: empty suite doesn't create file.

When there are no runs, this returns early without writing any file. Other export methods (to_json, to_markdown, to_junit) always create the output file even when empty. This inconsistency could surprise callers who expect the file to exist after a successful call.

Proposed fix: create empty CSV with headers or at minimum an empty file
         if not rows:
+            target.touch()
             return target

Or, to maintain header consistency for empty results, define a fixed set of base fieldnames.
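
A sketch of that second option, reusing the base columns proposed in the column-ordering nitpick above (illustrative only):

        if not rows:
            base_fields = ["scenario_name", "case_id", "success", "duration_ms", "error"]
            with target.open("w", encoding="utf-8", newline="") as handle:
                csv.DictWriter(handle, fieldnames=base_fields).writeheader()
            return target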


@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @src/agentunit/reporting/results.py:
- Line 206: The comprehension building fieldnames uses row.keys (a method
object) instead of calling it; update the expression to call row.keys() so it
reads: fieldnames = sorted({key for row in rows for key in row.keys()}) to avoid
TypeError and correctly collect keys from each row in rows.
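
Concretely, the one-call fix:

# Buggy: row.keys is the bound method itself; iterating it raises TypeError.
fieldnames = sorted({key for row in rows for key in row.keys})

# Fixed: call the method so each row contributes its keys.
fieldnames = sorted({key for row in rows for key in row.keys()})
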
🧹 Nitpick comments (1)
src/agentunit/reporting/results.py (1)

203-204: Consider documenting the empty-suite behavior.

When there are no runs, the method returns without creating a file. This is reasonable since fieldnames are derived from row data, but callers might expect a file to exist. Consider documenting this in the docstring.

📝 Suggested docstring update
     def to_csv(self, path: str | Path) -> Path:
         """
         Export suite results to CSV.
         One row per scenario run.
+
+        Note: If there are no scenario runs, no file is created.
         """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 580323d and fc43108.

📒 Files selected for processing (1)
  • src/agentunit/reporting/results.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/agentunit/reporting/results.py (1)
src/agentunit/core/runner.py (1)
  • run (45-54)
🔇 Additional comments (5)
src/agentunit/reporting/results.py (5)

5-5: LGTM!

The new imports (csv module and Any type) are appropriate for the CSV export functionality being added.

Also applies to: 10-10


82-92: Implementation handles single-level nesting only.

The helper correctly flattens one level of nested dictionaries. If metrics contain deeper nesting (e.g., {"a": {"b": {"c": 1}}}), inner dicts would be serialized as strings rather than flattened. This is likely sufficient for current use cases where ScenarioRun.metrics is typed as dict[str, float | None].


116-119: LGTM!

Explicit encoding="utf-8" is good practice for cross-platform consistency.


131-166: LGTM!

The simplified JUnit output with a single testsuite root element is a valid format widely supported by CI tools.


241-261: LGTM!

The explicit type hints and formatting improvements enhance readability without changing behavior.

@aviralgarg05 (Owner) left a comment

LGTM!

@aviralgarg05 merged commit 8f9b43d into aviralgarg05:main Jan 12, 2026
12 checks passed