
Conversation

Contributor

@Jagriti-student Jagriti-student commented Jan 14, 2026

Summary

Added interactive Plotly visualizations to the Reports section.

Changes

  • Added bar chart for metric breakdown
  • Added pie chart for metric contribution
  • Replaced plain text metrics with visual reports

Closes #57

Summary by CodeRabbit

  • New Features

    • Summary Report dashboard now shows interactive bar and pie charts plus a metric table for Success Rate, Avg Latency, and Total Cost.
    • CSV loader accepts customizable delimiters for list fields to better parse tools and context.
  • Bug Fixes

    • Session metrics now compute real durations and per-message response times from timestamps.
    • Improved error handling with explicit context for malformed CSV rows.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
@continue

continue bot commented Jan 14, 2026

All Green - Keep your PRs mergeable


All Green is an AI agent that automatically:

✅ Addresses code review comments

✅ Fixes failing CI checks

✅ Resolves merge conflicts




coderabbitai bot commented Jan 14, 2026

Walkthrough

Session metrics now compute real durations and average response times from timestamps; the dashboard adds Plotly-powered charts and a summary table for reports; CSV loader gains configurable delimiters for tools/context, a parsing helper, and stronger row-level error reporting.

Changes

  • Session Metrics Calculation (src/agentunit/adapters/autogen_ag2.py): end_session sets duration_ms from the timestamp difference; _calculate_session_metrics computes response_times and average_response_time by iterating session interactions (replacing placeholders).
  • Dashboard Summary Report Visualizations (src/agentunit/dashboard/app.py): adds pandas and plotly.express imports; prepares a metrics DataFrame and renders a bar chart, pie chart, and table for the Summary Report (replaces the TODO placeholder).
  • CSV Dataset Loading (src/agentunit/datasets/base.py): introduces _parse_list_field; the load_local_csv signature adds tools_delimiter and context_delimiter (usage sketch below); uses the helper to parse list fields, wraps row parsing in try/except to raise AgentUnitError on malformed rows, and improves missing-file error handling.
  • Dependencies (pyproject.toml): adds plotly, pandas, and streamlit to the project dependencies.
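
A quick usage sketch for the new CSV delimiter parameters (hedged: the parameter names come from the summary above; the exact call shape, whether load_local_csv is a module-level function or a dataset method, and its defaults are assumptions):

# Hypothetical call: split "tools" cells on ";" and "context" cells on "|"
dataset = load_local_csv(
    "cases.csv",
    tools_delimiter=";",
    context_delimiter="|",
)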

Sequence Diagram(s)

(omitted — changes do not introduce a multi-component sequential flow requiring visualization)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • aviralgarg05
🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
  • Out of Scope Changes check (⚠️ Warning): While most changes align with issue #57, the modifications to the session metrics calculation in autogen_ag2.py and the load_local_csv signature changes appear to be out of scope for the dashboard visualization issue. Resolution: clarify whether the changes to autogen_ag2.py and base.py are necessary for the dashboard feature; if not, consider removing them or opening separate issues for those improvements.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): The description is minimal but covers the key changes; it does not address several template sections (Type of Change, detailed Testing, Code Quality checks, Dependencies justification). Resolution: fill in the PR template more completely, especially Type of Change, the Testing section with results, Code Quality checks, and an explicit justification for the Plotly/Pandas/Streamlit dependencies.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title accurately describes the main change, adding Plotly charts to the Reports dashboard. It is concise, clear, and directly reflects the primary objective of the PR.
  • Linked Issues check (✅ Passed): The PR addresses issue #57 by integrating Plotly and creating visual bar and pie charts for the Reports dashboard, replacing the text-only metrics display with interactive visualizations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


🧹 Recent nitpick comments
pyproject.toml (1)

38-40: Consider making dashboard dependencies optional.

The plotly, pandas, and streamlit packages are used exclusively for the dashboard feature (isolated to src/agentunit/dashboard/) and significantly increase install footprint for users who don't need this functionality. Following the existing pattern for ragas and langgraph, make these optional extras.

Suggested refactor to use optional extras
-plotly = "^6.5.1"
-pandas = "^2.3.3"
-streamlit = "^1.52.2"
+plotly = { version = "^6.5.1", optional = true }
+pandas = { version = "^2.3.3", optional = true }
+streamlit = { version = "^1.52.2", optional = true }

Then add a new extras group:

 [tool.poetry.extras]
 ragas = ["ragas"]
 integration-tests = ["langgraph"]
+dashboard = ["plotly", "pandas", "streamlit"]

Users can then install with pip install agentunit[dashboard] when needed.


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b889c7 and 6754c24.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test (Python 3.12)
  • GitHub Check: Test (Python 3.10)
  • GitHub Check: Test (Python 3.10)

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/agentunit/adapters/autogen_ag2.py (2)

391-404: Same attribute mismatch in _calculate_participation_balance.

Line 397 uses interaction.sender_id, which should be interaction.from_agent to match the AgentInteraction class definition.

🐛 Proposed fix
     def _calculate_participation_balance(
         self, interactions: list[AgentInteraction]
     ) -> dict[str, float]:
         """Calculate how balanced participation is across agents."""
         speaker_counts = {}
         for interaction in interactions:
-            speaker_id = interaction.sender_id
+            speaker_id = interaction.from_agent
             speaker_counts[speaker_id] = speaker_counts.get(speaker_id, 0) + 1

406-420: Same attribute mismatch in _analyze_interaction_patterns.

Line 412 uses i.sender_id, which should be i.from_agent.

🐛 Proposed fix
         # Simple pattern analysis
         if len(interactions) > 1:
-            speakers = [i.sender_id for i in interactions]
+            speakers = [i.from_agent for i in interactions]
🤖 Fix all issues with AI agents
In `@src/agentunit/adapters/autogen_ag2.py`:
- Around line 356-365: The unique_speakers calculation currently uses the
nonexistent attribute sender_id on items in session_interactions; update the set
comprehension to use the AgentInteraction attribute from_agent instead (i.e.,
compute len({i.from_agent for i in session_interactions})). Ensure any
references to sender_id in the metrics block (e.g., the unique_speakers key) are
replaced with from_agent so it matches the AgentInteraction definition.
🧹 Nitpick comments (3)
src/agentunit/dashboard/app.py (2)

8-9: Consider guarding pandas and plotly imports like streamlit.

The streamlit import is guarded with a try/except (lines 12-18) to provide a helpful error message if missing. However, pandas and plotly.express are imported unconditionally and will raise ImportError without context if not installed.

♻️ Suggested approach
-import pandas as pd
-import plotly.express as px
+try:
+    import pandas as pd
+    import plotly.express as px
+    HAS_PLOTLY = True
+except ImportError:
+    HAS_PLOTLY = False
+    pd = None
+    px = None

Then check HAS_PLOTLY before rendering charts in _render_reports, or include these in the existing streamlit check.
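
A minimal sketch of that guard inside _render_reports (the function name appears in this review; its signature, the HAS_PLOTLY flag from the snippet above, and the warning text are assumptions):

def _render_reports() -> None:
    """Render the Summary Report tab, degrading gracefully without the chart libraries."""
    if not HAS_PLOTLY:
        # Point users at the optional extra instead of failing with a bare ImportError
        st.warning("Charts require the dashboard extras: pip install 'agentunit[dashboard]'")
        return
    # ... existing DataFrame construction and px.bar / px.pie rendering ...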


396-402: Pie chart may not be appropriate for these metrics.

The pie chart shows "Metric Contribution" for Success Rate (%), Avg Latency (s), and Total Cost ($). These metrics have different units and scales with no meaningful relationship as proportions of a whole. A pie chart implies parts summing to 100%, but 95% + 0.5s + $0.01 has no semantic meaning.

Consider either:

  1. Removing the pie chart, or
  2. Using it for a different dataset (e.g., pass/fail distribution, cost breakdown by category); see the pass/fail sketch below
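
If the second option is taken, a hedged sketch of a pass/fail pie chart (assumes run_data exposes the passed and total counts used earlier in _render_reports; variable names are illustrative, and the module-level pd / px / st imports from app.py are reused):

passed = run_data.get("passed", 0)
failed = max(run_data.get("total", 0) - passed, 0)

# Parts of a whole: every case is either passed or failed
outcome_df = pd.DataFrame({"Outcome": ["Passed", "Failed"], "Count": [passed, failed]})
fig_outcomes = px.pie(outcome_df, names="Outcome", values="Count", title="Pass/Fail Distribution")
st.plotly_chart(fig_outcomes, use_container_width=True)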
src/agentunit/adapters/autogen_ag2.py (1)

343-354: Consider filtering response times to actual replies.

The current logic calculates time differences between all consecutive messages, regardless of speaker. In multi-agent scenarios, this may include delays between messages from the same agent (follow-ups) rather than actual response times.

For more accurate metrics, consider only measuring time between messages where from_agent differs:

♻️ Suggested refinement
         response_times = []

         # Go through messages one by one
         for i in range(1, len(session_interactions)):
+            # Only measure response time when speaker changes
+            if session_interactions[i].from_agent == session_interactions[i - 1].from_agent:
+                continue
             previous_message_time = session_interactions[i - 1].timestamp
             current_message_time = session_interactions[i].timestamp

             time_difference = (current_message_time - previous_message_time).total_seconds()

             response_times.append(time_difference)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48cfb0d and 8b889c7.

📒 Files selected for processing (3)
  • src/agentunit/adapters/autogen_ag2.py
  • src/agentunit/dashboard/app.py
  • src/agentunit/datasets/base.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/agentunit/datasets/base.py (1)
src/agentunit/core/exceptions.py (1)
  • AgentUnitError (8-11)
src/agentunit/dashboard/app.py (1)
src/agentunit/comparison/statistics.py (1)
  • max (54-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test (Python 3.10)
  • GitHub Check: Test (Python 3.12)
🔇 Additional comments (3)
src/agentunit/datasets/base.py (2)

82-93: LGTM! Clean helper function for parsing delimited fields.

The function handles edge cases well: empty/None values, empty delimiters, and whitespace trimming. The fallback behavior when delimiter is empty (returning the trimmed value as a single-item list) is a sensible defensive choice.
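
For reference, a hedged sketch of what such a helper typically looks like (this only mirrors the behavior described above; the actual implementation in base.py may differ):

def _parse_list_field(value: str | None, delimiter: str) -> list[str]:
    """Split a delimited CSV cell into trimmed, non-empty items."""
    if not value or not value.strip():
        return []
    if not delimiter:
        # Defensive fallback: treat the whole cell as a single item
        return [value.strip()]
    return [item.strip() for item in value.split(delimiter) if item.strip()]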


96-133: Good improvement to CSV parsing robustness.

The configurable delimiters add flexibility, and the explicit metadata key exclusion is cleaner than the previous approach. The error handling with row context will make debugging CSV issues much easier.

One minor observation: the broad except Exception at line 131 will catch both malformed data and missing required fields (like query). This is acceptable since the error message includes the row content, but consider whether a more specific message for KeyError (missing required column) would be helpful.
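
A hedged sketch of that refinement (the surrounding row-parsing code is paraphrased; only AgentUnitError, _parse_list_field, tools_delimiter, and context_delimiter are names taken from this PR):

# Inside the per-row loop of load_local_csv
try:
    query = row["query"]  # required column
    tools = _parse_list_field(row.get("tools", ""), tools_delimiter)
    context = _parse_list_field(row.get("context", ""), context_delimiter)
except KeyError as exc:
    # Name the missing required column explicitly
    raise AgentUnitError(f"CSV row missing required column {exc}: {row}") from exc
except Exception as exc:
    # Anything else: malformed values, bad types, etc.
    raise AgentUnitError(f"Failed to parse CSV row {row}: {exc}") from exc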

src/agentunit/adapters/autogen_ag2.py (1)

321-321: Good fix: Duration is now calculated from actual timestamps.

The change from a hardcoded 0.0 to computing the actual duration using (end_time - start_time).total_seconds() * 1000 provides meaningful metrics for session analysis.
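
In other words, something along these lines (a sketch; the start_time/end_time session keys match the metrics code quoted below, while where duration_ms is stored is an assumption):

from datetime import datetime

end_time = datetime.now()
session["end_time"] = end_time
session["duration_ms"] = (end_time - session["start_time"]).total_seconds() * 1000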

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 356 to 365
         # Basic metrics
         metrics = {
             "total_messages": len(session_interactions),
             "unique_speakers": len({i.sender_id for i in session_interactions}),
             "duration_seconds": (
                 session.get("end_time", datetime.now()) - session["start_time"]
             ).total_seconds(),
-            "average_response_time": 0.0,  # Would need timing data
+            "average_response_time": average_response_time,
             "conversation_turns": session.get("message_count", 0),
         }

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd src/agentunit/adapters && wc -l autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 84


🏁 Script executed:

# First, let's find the AgentInteraction class definition
grep -n "class AgentInteraction" src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 48


🏁 Script executed:

# Read the relevant sections: AgentInteraction class definition and the usage
sed -n '1,100p' src/agentunit/adapters/autogen_ag2.py | head -80

Repository: aviralgarg05/agentunit

Length of output: 2349


🏁 Script executed:

# Check the AgentInteraction class definition more broadly
rg -n "class AgentInteraction" -A 20 src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 48


🏁 Script executed:

find . -name "multiagent.py" -o -name "multiagent" -type d

Repository: aviralgarg05/agentunit

Length of output: 92


🏁 Script executed:

# Look for files in the multiagent directory
fd -type f ".*\.py" | grep -i multiagent | head -20

Repository: aviralgarg05/agentunit

Length of output: 236


🏁 Script executed:

# Check the structure
git ls-files | grep -i multiagent

Repository: aviralgarg05/agentunit

Length of output: 180


🏁 Script executed:

# Read the exact lines from the file to see the actual code
sed -n '356,365p' src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 529


🏁 Script executed:

# Also check lines 265-274 for AgentInteraction creation
sed -n '265,274p' src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 475


🏁 Script executed:

# Check lines 343-354 for response_times calculation mentioned in scratchpad
sed -n '343,354p' src/agentunit/adapters/autogen_ag2.py

Repository: aviralgarg05/agentunit

Length of output: 580


🏁 Script executed:

# Check the AgentInteraction class in the multiagent module
cat -n src/agentunit/multiagent/__init__.py | head -100

Repository: aviralgarg05/agentunit

Length of output: 4400


🏁 Script executed:

# Search for AgentInteraction class definition
rg -n "class AgentInteraction" -A 30 src/agentunit/multiagent/

Repository: aviralgarg05/agentunit

Length of output: 2434


Use correct attribute name: from_agent instead of sender_id.

Line 359 accesses i.sender_id, but AgentInteraction objects have a from_agent attribute (defined in src/agentunit/multiagent/__init__.py:179). This will cause an AttributeError at runtime when calculating unique_speakers.

🐛 Proposed fix
         metrics = {
             "total_messages": len(session_interactions),
-            "unique_speakers": len({i.sender_id for i in session_interactions}),
+            "unique_speakers": len({i.from_agent for i in session_interactions}),
             "duration_seconds": (
                 session.get("end_time", datetime.now()) - session["start_time"]
             ).total_seconds(),
             "average_response_time": average_response_time,
             "conversation_turns": session.get("message_count", 0),
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         # Basic metrics
         metrics = {
             "total_messages": len(session_interactions),
-            "unique_speakers": len({i.sender_id for i in session_interactions}),
+            "unique_speakers": len({i.from_agent for i in session_interactions}),
             "duration_seconds": (
                 session.get("end_time", datetime.now()) - session["start_time"]
             ).total_seconds(),
-            "average_response_time": 0.0,  # Would need timing data
             "average_response_time": average_response_time,
             "conversation_turns": session.get("message_count", 0),
         }
🤖 Prompt for AI Agents
In `@src/agentunit/adapters/autogen_ag2.py` around lines 356 - 365, The
unique_speakers calculation currently uses the nonexistent attribute sender_id
on items in session_interactions; update the set comprehension to use the
AgentInteraction attribute from_agent instead (i.e., compute len({i.from_agent
for i in session_interactions})). Ensure any references to sender_id in the
metrics block (e.g., the unique_speakers key) are replaced with from_agent so it
matches the AgentInteraction definition.

Comment on lines +379 to +404
            df = pd.DataFrame(
                {
                    "Metric": list(report_metrics.keys()),
                    "Value": list(report_metrics.values()),
                }
            )

            # Bar chart
            fig_bar = px.bar(
                df,
                x="Metric",
                y="Value",
                title="Metric Breakdown",
                text="Value",
            )
            st.plotly_chart(fig_bar, use_container_width=True)
            # Pie chart
            fig_pie = px.pie(
                df,
                names="Metric",
                values="Value",
                title="Metric Contribution",
            )
            st.plotly_chart(fig_pie, use_container_width=True)
            # Table view
            st.dataframe(df)

⚠️ Potential issue | 🔴 Critical

Critical: Indentation error causes code to run outside the if selected_run: block.

The DataFrame creation and chart rendering (lines 379-404) are dedented and will execute even when selected_run is falsy. This will cause a NameError because report_metrics is only defined inside the if selected_run: block (lines 371-377).

🐛 Proposed fix - indent the visualization code
                 report_metrics = {
                     "Success Rate (%)": (
                         run_data.get("passed", 0) / max(run_data.get("total", 1), 1) * 100
                     ),
                     "Avg Latency (s)": run_data.get("avg_latency", 0),
                     "Total Cost ($)": run_data.get("cost", 0),
                 }

-            df = pd.DataFrame(
-                {
-                    "Metric": list(report_metrics.keys()),
-                    "Value": list(report_metrics.values()),
-                }
-            )
-
-            # Bar chart
-            fig_bar = px.bar(
-                df,
-                x="Metric",
-                y="Value",
-                title="Metric Breakdown",
-                text="Value",
-            )
-            st.plotly_chart(fig_bar, use_container_width=True)
-            # Pie chart
-            fig_pie = px.pie(
-                df,
-                names="Metric",
-                values="Value",
-                title="Metric Contribution",
-            )
-            st.plotly_chart(fig_pie, use_container_width=True)
-            # Table view
-            st.dataframe(df)
+                df = pd.DataFrame(
+                    {
+                        "Metric": list(report_metrics.keys()),
+                        "Value": list(report_metrics.values()),
+                    }
+                )
+
+                # Bar chart
+                fig_bar = px.bar(
+                    df,
+                    x="Metric",
+                    y="Value",
+                    title="Metric Breakdown",
+                    text="Value",
+                )
+                st.plotly_chart(fig_bar, use_container_width=True)
+                # Pie chart
+                fig_pie = px.pie(
+                    df,
+                    names="Metric",
+                    values="Value",
+                    title="Metric Contribution",
+                )
+                st.plotly_chart(fig_pie, use_container_width=True)
+                # Table view
+                st.dataframe(df)

@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 6.89655% with 27 lines in your changes missing coverage. Please review.

Files with missing lines:
  • src/agentunit/datasets/base.py: 0.00% patch coverage, 13 lines missing ⚠️
  • src/agentunit/adapters/autogen_ag2.py: 0.00% patch coverage, 7 lines missing ⚠️
  • src/agentunit/dashboard/app.py: 22.22% patch coverage, 7 lines missing ⚠️

📢 Thoughts on this report? Let us know!

Owner

@aviralgarg05 aviralgarg05 left a comment


LGTM!

@aviralgarg05 aviralgarg05 merged commit 8f9a7df into aviralgarg05:main Jan 14, 2026
12 checks passed


Development

Successfully merging this pull request may close these issues.

Add Visualizations to Dashboard Reports

3 participants