
Conversation

Contributor

@Jagriti-student Jagriti-student commented Jan 14, 2026

Summary

Added interactive Plotly visualizations to the Reports section.

Changes

  • Added bar chart for metric breakdown
  • Added pie chart for metric contribution
  • Replaced plain text metrics with visual reports

Closes #57

Summary by CodeRabbit

  • New Features

    • Summary Report dashboard now shows interactive bar and pie charts plus a metric table for Success Rate, Avg Latency, and Total Cost.
    • CSV loader accepts customizable delimiters for list fields to better parse tools and context.
  • Bug Fixes

    • Session metrics now compute real durations and per-message response times from timestamps.
    • Improved error handling with explicit context for malformed CSV rows.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
@continue

continue bot commented Jan 14, 2026

All Green - Keep your PRs mergeable


All Green is an AI agent that automatically:

✅ Addresses code review comments

✅ Fixes failing CI checks

✅ Resolves merge conflicts




coderabbitai bot commented Jan 14, 2026

Walkthrough

Session metrics now compute real durations and average response times from timestamps; the dashboard adds Plotly-powered charts and a summary table for reports; CSV loader gains configurable delimiters for tools/context, a parsing helper, and stronger row-level error reporting.

Changes

  • Session Metrics Calculation (src/agentunit/adapters/autogen_ag2.py): end_session sets duration_ms from the timestamp difference; _calculate_session_metrics computes response_times and average_response_time by iterating session interactions (replacing placeholders).
  • Dashboard Summary Report Visualizations (src/agentunit/dashboard/app.py): adds pandas and plotly.express imports; prepares a metrics DataFrame and renders a bar chart, pie chart, and table for the Summary Report (replaces the TODO placeholder).
  • CSV Dataset Loading (src/agentunit/datasets/base.py): introduces _parse_list_field; the load_local_csv signature adds tools_delimiter and context_delimiter (usage sketch below); uses the helper to parse list fields, wraps row parsing in try/except to raise AgentUnitError on malformed rows, and improves missing-file error handling.
  • Dependencies (pyproject.toml): adds plotly, pandas, and streamlit to the project dependencies.
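
A quick usage sketch for the new CSV delimiter parameters (hedged: the parameter names come from the summary above; the exact call shape, whether load_local_csv is a module-level function or a dataset method, and its defaults are assumptions):

# Hypothetical call: split "tools" cells on ";" and "context" cells on "|"
dataset = load_local_csv(
    "cases.csv",
    tools_delimiter=";",
    context_delimiter="|",
)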

Sequence Diagram(s)

(omitted — changes do not introduce a multi-component sequential flow requiring visualization)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • aviralgarg05
🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
  • Out of Scope Changes check (⚠️ Warning): While most changes align with issue #57, the modifications to the session metrics calculation in autogen_ag2.py and the load_local_csv signature changes appear to be out of scope for the dashboard visualization issue. Resolution: clarify whether the changes to autogen_ag2.py and base.py are necessary for the dashboard feature; if not, consider removing them or opening separate issues for those improvements.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The description is minimal but covers the key changes; it does not address several template sections (Type of Change, detailed Testing, Code Quality checks, Dependencies justification). Resolution: fill in the PR template more completely, especially Type of Change, the Testing section with results, Code Quality checks, and an explicit justification for the Plotly/Pandas/Streamlit dependencies.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title accurately describes the main change, adding Plotly charts to the Reports dashboard. It is concise, clear, and directly reflects the primary objective of the PR.
  • Linked Issues check (✅ Passed): The PR addresses issue #57 by integrating Plotly and creating visual bar and pie charts for the Reports dashboard, replacing the text-only metrics display with interactive visualizations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


🧹 Recent nitpick comments
pyproject.toml (1)

38-40: Consider making dashboard dependencies optional.

The plotly, pandas, and streamlit packages are used exclusively for the dashboard feature (isolated to src/agentunit/dashboard/) and significantly increase install footprint for users who don't need this functionality. Following the existing pattern for ragas and langgraph, make these optional extras.

Suggested refactor to use optional extras
-plotly = "^6.5.1"
-pandas = "^2.3.3"
-streamlit = "^1.52.2"
+plotly = { version = "^6.5.1", optional = true }
+pandas = { version = "^2.3.3", optional = true }
+streamlit = { version = "^1.52.2", optional = true }

Then add a new extras group:

 [tool.poetry.extras]
 ragas = ["ragas"]
 integration-tests = ["langgraph"]
+dashboard = ["plotly", "pandas", "streamlit"]

Users can then install with pip install agentunit[dashboard] when needed.


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b889c7 and 6754c24.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test (Python 3.12)
  • GitHub Check: Test (Python 3.10)
  • GitHub Check: Test (Python 3.10)

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/agentunit/adapters/autogen_ag2.py (2)

391-404: Same attribute mismatch in _calculate_participation_balance.

Line 397 uses interaction.sender_id, which should be interaction.from_agent to match the AgentInteraction class definition.

🐛 Proposed fix
     def _calculate_participation_balance(
         self, interactions: list[AgentInteraction]
     ) -> dict[str, float]:
         """Calculate how balanced participation is across agents."""
         speaker_counts = {}
         for interaction in interactions:
-            speaker_id = interaction.sender_id
+            speaker_id = interaction.from_agent
             speaker_counts[speaker_id] = speaker_counts.get(speaker_id, 0) + 1

406-420: Same attribute mismatch in _analyze_interaction_patterns.

Line 412 uses i.sender_id, which should be i.from_agent.

🐛 Proposed fix
         # Simple pattern analysis
         if len(interactions) > 1:
-            speakers = [i.sender_id for i in interactions]
+            speakers = [i.from_agent for i in interactions]
🤖 Fix all issues with AI agents
In `@src/agentunit/adapters/autogen_ag2.py`:
- Around line 356-365: The unique_speakers calculation currently uses the
nonexistent attribute sender_id on items in session_interactions; update the set
comprehension to use the AgentInteraction attribute from_agent instead (i.e.,
compute len({i.from_agent for i in session_interactions})). Ensure any
references to sender_id in the metrics block (e.g., the unique_speakers key) are
replaced with from_agent so it matches the AgentInteraction definition.
🧹 Nitpick comments (3)
src/agentunit/dashboard/app.py (2)

8-9: Consider guarding pandas and plotly imports like streamlit.

The streamlit import is guarded with a try/except (lines 12-18) to provide a helpful error message if missing. However, pandas and plotly.express are imported unconditionally and will raise ImportError without context if not installed.

♻️ Suggested approach
-import pandas as pd
-import plotly.express as px
+try:
+    import pandas as pd
+    import plotly.express as px
+    HAS_PLOTLY = True
+except ImportError:
+    HAS_PLOTLY = False
+    pd = None
+    px = None

Then check HAS_PLOTLY before rendering charts in _render_reports, or include these in the existing streamlit check.
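
A minimal sketch of that guard inside _render_reports (the function name appears in this review; its signature, the HAS_PLOTLY flag from the snippet above, and the warning text are assumptions):

def _render_reports() -> None:
    """Render the Summary Report tab, degrading gracefully without the chart libraries."""
    if not HAS_PLOTLY:
        # Point users at the optional extra instead of failing with a bare ImportError
        st.warning("Charts require the dashboard extras: pip install 'agentunit[dashboard]'")
        return
    # ... existing DataFrame construction and px.bar / px.pie rendering ...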


396-402: Pie chart may not be appropriate for these metrics.

The pie chart shows "Metric Contribution" for Success Rate (%), Avg Latency (s), and Total Cost ($). These metrics have different units and scales with no meaningful relationship as proportions of a whole. A pie chart implies parts summing to 100%, but 95% + 0.5s + $0.01 has no semantic meaning.

Consider either:

  1. Removing the pie chart, or
  2. Using it for a different dataset (e.g., pass/fail distribution, cost breakdown by category); see the pass/fail sketch below
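
If the second option is taken, a hedged sketch of a pass/fail pie chart (assumes run_data exposes the passed and total counts used earlier in _render_reports; variable names are illustrative, and the module-level pd / px / st imports from app.py are reused):

passed = run_data.get("passed", 0)
failed = max(run_data.get("total", 0) - passed, 0)

# Parts of a whole: every case is either passed or failed
outcome_df = pd.DataFrame({"Outcome": ["Passed", "Failed"], "Count": [passed, failed]})
fig_outcomes = px.pie(outcome_df, names="Outcome", values="Count", title="Pass/Fail Distribution")
st.plotly_chart(fig_outcomes, use_container_width=True)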
src/agentunit/adapters/autogen_ag2.py (1)

343-354: Consider filtering response times to actual replies.

The current logic calculates time differences between all consecutive messages, regardless of speaker. In multi-agent scenarios, this may include delays between messages from the same agent (follow-ups) rather than actual response times.

For more accurate metrics, consider only measuring time between messages where from_agent differs:

♻️ Suggested refinement
         response_times = []

         # Go through messages one by one
         for i in range(1, len(session_interactions)):
+            # Only measure response time when speaker changes
+            if session_interactions[i].from_agent == session_interactions[i - 1].from_agent:
+                continue
             previous_message_time = session_interactions[i - 1].timestamp
             current_message_time = session_interactions[i].timestamp

             time_difference = (current_message_time - previous_message_time).total_seconds()

             response_times.append(time_difference)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48cfb0d and 8b889c7.

📒 Files selected for processing (3)
  • src/agentunit/adapters/autogen_ag2.py
  • src/agentunit/dashboard/app.py
  • src/agentunit/datasets/base.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/agentunit/datasets/base.py (1)
src/agentunit/core/exceptions.py (1)
  • AgentUnitError (8-11)
src/agentunit/dashboard/app.py (1)
src/agentunit/comparison/statistics.py (1)
  • max (54-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test (Python 3.10)
  • GitHub Check: Test (Python 3.12)
🔇 Additional comments (3)
src/agentunit/datasets/base.py (2)

82-93: LGTM! Clean helper function for parsing delimited fields.

The function handles edge cases well: empty/None values, empty delimiters, and whitespace trimming. The fallback behavior when delimiter is empty (returning the trimmed value as a single-item list) is a sensible defensive choice.
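
For reference, a hedged sketch of what such a helper typically looks like (this only mirrors the behavior described above; the actual implementation in base.py may differ):

def _parse_list_field(value: str | None, delimiter: str) -> list[str]:
    """Split a delimited CSV cell into trimmed, non-empty items."""
    if not value or not value.strip():
        return []
    if not delimiter:
        # Defensive fallback: treat the whole cell as a single item
        return [value.strip()]
    return [item.strip() for item in value.split(delimiter) if item.strip()]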


96-133: Good improvement to CSV parsing robustness.

The configurable delimiters add flexibility, and the explicit metadata key exclusion is cleaner than the previous approach. The error handling with row context will make debugging CSV issues much easier.

One minor observation: the broad except Exception at line 131 will catch both malformed data and missing required fields (like query). This is acceptable since the error message includes the row content, but consider whether a more specific message for KeyError (missing required column) would be helpful.
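
A hedged sketch of that refinement (the surrounding row-parsing code is paraphrased; only AgentUnitError, _parse_list_field, tools_delimiter, and context_delimiter are names taken from this PR):

# Inside the per-row loop of load_local_csv
try:
    query = row["query"]  # required column
    tools = _parse_list_field(row.get("tools", ""), tools_delimiter)
    context = _parse_list_field(row.get("context", ""), context_delimiter)
except KeyError as exc:
    # Name the missing required column explicitly
    raise AgentUnitError(f"CSV row missing required column {exc}: {row}") from exc
except Exception as exc:
    # Anything else: malformed values, bad types, etc.
    raise AgentUnitError(f"Failed to parse CSV row {row}: {exc}") from exc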

src/agentunit/adapters/autogen_ag2.py (1)

321-321: Good fix: Duration is now calculated from actual timestamps.

The change from a hardcoded 0.0 to computing the actual duration using (end_time - start_time).total_seconds() * 1000 provides meaningful metrics for session analysis.
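
In other words, something along these lines (a sketch; the start_time/end_time session keys match the metrics code quoted below, while where duration_ms is stored is an assumption):

from datetime import datetime

end_time = datetime.now()
session["end_time"] = end_time
session["duration_ms"] = (end_time - session["start_time"]).total_seconds() * 1000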

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 356 to 365
         # Basic metrics
         metrics = {
             "total_messages": len(session_interactions),
             "unique_speakers": len({i.sender_id for i in session_interactions}),
             "duration_seconds": (
                 session.get("end_time", datetime.now()) - session["start_time"]
             ).total_seconds(),
-            "average_response_time": 0.0,  # Would need timing data
+            "average_response_time": average_response_time,
             "conversation_turns": session.get("message_count", 0),
         }

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd src/agentunit/adapters && wc -l autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 84


🏁 Script executed:

# First, let's find the AgentInteraction class definition
grep -n "class AgentInteraction" src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 48


🏁 Script executed:

# Read the relevant sections: AgentInteraction class definition and the usage
sed -n '1,100p' src/agentunit/adapters/autogen_ag2.py | head -80

Repository: aviralgarg05/agentunit

Length of output: 2349


🏁 Script executed:

# Check the AgentInteraction class definition more broadly
rg -n "class AgentInteraction" -A 20 src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 48


🏁 Script executed:

find . -name "multiagent.py" -o -name "multiagent" -type d

Repository: aviralgarg05/agentunit

Length of output: 92


🏁 Script executed:

# Look for files in the multiagent directory
fd -type f ".*\.py" | grep -i multiagent | head -20

Repository: aviralgarg05/agentunit

Length of output: 236


🏁 Script executed:

# Check the structure
git ls-files | grep -i multiagent

Repository: aviralgarg05/agentunit

Length of output: 180


🏁 Script executed:

# Read the exact lines from the file to see the actual code
sed -n '356,365p' src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 529


🏁 Script executed:

# Also check lines 265-274 for AgentInteraction creation
sed -n '265,274p' src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 475


🏁 Script executed:

# Check lines 343-354 for response_times calculation mentioned in scratchpad
sed -n '343,354p' src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 580


🏁 Script executed:

# Check the AgentInteraction class in the multiagent module
cat -n src/agentunit/multiagent/__init__.py | head -100

Repository: aviralgarg05/agentunit

Length of output: 4400


🏁 Script executed:

# Search for AgentInteraction class definition
rg -n "class AgentInteraction" -A 30 src/agentunit/multiagent/

Repository: aviralgarg05/agentunit

Length of output: 2434


Use correct attribute name: from_agent instead of sender_id.

Line 359 accesses i.sender_id, but AgentInteraction objects have a from_agent attribute (defined in src/agentunit/multiagent/__init__.py:179). This will cause an AttributeError at runtime when calculating unique_speakers.

🐛 Proposed fix
         metrics = {
             "total_messages": len(session_interactions),
-            "unique_speakers": len({i.sender_id for i in session_interactions}),
+            "unique_speakers": len({i.from_agent for i in session_interactions}),
             "duration_seconds": (
                 session.get("end_time", datetime.now()) - session["start_time"]
             ).total_seconds(),
             "average_response_time": average_response_time,
             "conversation_turns": session.get("message_count", 0),
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         # Basic metrics
         metrics = {
             "total_messages": len(session_interactions),
-            "unique_speakers": len({i.sender_id for i in session_interactions}),
+            "unique_speakers": len({i.from_agent for i in session_interactions}),
             "duration_seconds": (
                 session.get("end_time", datetime.now()) - session["start_time"]
             ).total_seconds(),
-            "average_response_time": 0.0,  # Would need timing data
             "average_response_time": average_response_time,
             "conversation_turns": session.get("message_count", 0),
         }
🤖 Prompt for AI Agents
In `@src/agentunit/adapters/autogen_ag2.py` around lines 356 - 365, The
unique_speakers calculation currently uses the nonexistent attribute sender_id
on items in session_interactions; update the set comprehension to use the
AgentInteraction attribute from_agent instead (i.e., compute len({i.from_agent
for i in session_interactions})). Ensure any references to sender_id in the
metrics block (e.g., the unique_speakers key) are replaced with from_agent so it
matches the AgentInteraction definition.

Comment on lines +379 to +404
            df = pd.DataFrame(
                {
                    "Metric": list(report_metrics.keys()),
                    "Value": list(report_metrics.values()),
                }
            )

            # Bar chart
            fig_bar = px.bar(
                df,
                x="Metric",
                y="Value",
                title="Metric Breakdown",
                text="Value",
            )
            st.plotly_chart(fig_bar, use_container_width=True)
            # Pie chart
            fig_pie = px.pie(
                df,
                names="Metric",
                values="Value",
                title="Metric Contribution",
            )
            st.plotly_chart(fig_pie, use_container_width=True)
            # Table view
            st.dataframe(df)

⚠️ Potential issue | 🔴 Critical

Critical: Indentation error causes code to run outside the if selected_run: block.

The DataFrame creation and chart rendering (lines 379-404) are dedented and will execute even when selected_run is falsy. This will cause a NameError because report_metrics is only defined inside the if selected_run: block (lines 371-377).

🐛 Proposed fix - indent the visualization code
                 report_metrics = {
                     "Success Rate (%)": (
                         run_data.get("passed", 0) / max(run_data.get("total", 1), 1) * 100
                     ),
                     "Avg Latency (s)": run_data.get("avg_latency", 0),
                     "Total Cost ($)": run_data.get("cost", 0),
                 }

-            df = pd.DataFrame(
-                {
-                    "Metric": list(report_metrics.keys()),
-                    "Value": list(report_metrics.values()),
-                }
-            )
-
-            # Bar chart
-            fig_bar = px.bar(
-                df,
-                x="Metric",
-                y="Value",
-                title="Metric Breakdown",
-                text="Value",
-            )
-            st.plotly_chart(fig_bar, use_container_width=True)
-            # Pie chart
-            fig_pie = px.pie(
-                df,
-                names="Metric",
-                values="Value",
-                title="Metric Contribution",
-            )
-            st.plotly_chart(fig_pie, use_container_width=True)
-            # Table view
-            st.dataframe(df)
+                df = pd.DataFrame(
+                    {
+                        "Metric": list(report_metrics.keys()),
+                        "Value": list(report_metrics.values()),
+                    }
+                )
+
+                # Bar chart
+                fig_bar = px.bar(
+                    df,
+                    x="Metric",
+                    y="Value",
+                    title="Metric Breakdown",
+                    text="Value",
+                )
+                st.plotly_chart(fig_bar, use_container_width=True)
+                # Pie chart
+                fig_pie = px.pie(
+                    df,
+                    names="Metric",
+                    values="Value",
+                    title="Metric Contribution",
+                )
+                st.plotly_chart(fig_pie, use_container_width=True)
+                # Table view
+                st.dataframe(df)

@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 6.89655% with 27 lines in your changes missing coverage. Please review.

Files with missing lines:
  • src/agentunit/datasets/base.py: 0.00% patch coverage, 13 lines missing ⚠️
  • src/agentunit/adapters/autogen_ag2.py: 0.00% patch coverage, 7 lines missing ⚠️
  • src/agentunit/dashboard/app.py: 22.22% patch coverage, 7 lines missing ⚠️

📢 Thoughts on this report? Let us know!

Owner

@aviralgarg05 aviralgarg05 left a comment


LGTM!

@aviralgarg05 aviralgarg05 merged commit 8f9a7df into aviralgarg05:main Jan 14, 2026
12 checks passed


Development

Successfully merging this pull request may close these issues.

Add Visualizations to Dashboard Reports

3 participants