Add Plotly charts to Reports dashboard #66
Conversation
Signed-off-by: Jagriti-student <jagriti7989@gmail.com>
Walkthrough
Session metrics now compute real durations and average response times from timestamps; the dashboard adds Plotly-powered charts and a summary table for reports; CSV loader gains configurable delimiters for tools/context, a parsing helper, and stronger row-level error reporting.
Changes
Sequence Diagram(s)
(omitted — changes do not introduce a multi-component sequential flow requiring visualization)
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
🧹 Recent nitpick comments
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
✏️ Tip: You can disable this entire section by setting
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/agentunit/adapters/autogen_ag2.py (2)
391-404: Same attribute mismatch in `_calculate_participation_balance`.
Line 397 uses `interaction.sender_id`, which should be `interaction.from_agent` to match the `AgentInteraction` class definition.
🐛 Proposed fix
 def _calculate_participation_balance(
     self, interactions: list[AgentInteraction]
 ) -> dict[str, float]:
     """Calculate how balanced participation is across agents."""
     speaker_counts = {}
     for interaction in interactions:
-        speaker_id = interaction.sender_id
+        speaker_id = interaction.from_agent
         speaker_counts[speaker_id] = speaker_counts.get(speaker_id, 0) + 1
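For reference, a self-contained sketch of the corrected counting logic. The function name and the final share calculation here are illustrative assumptions; only the counting loop appears in the diff above.

```python
from collections import Counter

def participation_shares(interactions: list) -> dict[str, float]:
    """Sketch: fraction of messages each agent sent, keyed by from_agent."""
    counts = Counter(interaction.from_agent for interaction in interactions)
    total = sum(counts.values())
    # Empty input yields an empty mapping rather than dividing by zero.
    return {agent: count / total for agent, count in counts.items()} if total else {}
```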
406-420: Same attribute mismatch in `_analyze_interaction_patterns`.
Line 412 uses `i.sender_id`, which should be `i.from_agent`.
 # Simple pattern analysis
 if len(interactions) > 1:
-    speakers = [i.sender_id for i in interactions]
+    speakers = [i.from_agent for i in interactions]
🤖 Fix all issues with AI agents
In `@src/agentunit/adapters/autogen_ag2.py`:
- Around line 356-365: The unique_speakers calculation currently uses the
nonexistent attribute sender_id on items in session_interactions; update the set
comprehension to use the AgentInteraction attribute from_agent instead (i.e.,
compute len({i.from_agent for i in session_interactions})). Ensure any
references to sender_id in the metrics block (e.g., the unique_speakers key) are
replaced with from_agent so it matches the AgentInteraction definition.
🧹 Nitpick comments (3)
src/agentunit/dashboard/app.py (2)
8-9: Consider guarding pandas and plotly imports like streamlit.
The `streamlit` import is guarded with a try/except (lines 12-18) to provide a helpful error message if missing. However, `pandas` and `plotly.express` are imported unconditionally and will raise `ImportError` without context if not installed.
♻️ Suggested approach
-import pandas as pd
-import plotly.express as px
+try:
+    import pandas as pd
+    import plotly.express as px
+    HAS_PLOTLY = True
+except ImportError:
+    HAS_PLOTLY = False
+    pd = None
+    px = None
Then check `HAS_PLOTLY` before rendering charts in `_render_reports`, or include these in the existing streamlit check.
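As a rough illustration of that guard in use, a minimal sketch follows. It assumes `HAS_PLOTLY`, `pd`, and `px` come from the guarded import block suggested above, that `st` is the streamlit module already imported in this file, and that the helper name and warning text are hypothetical.

```python
def _render_report_charts(report_metrics: dict) -> None:
    """Sketch: render charts only when the optional plotting stack is available."""
    if not HAS_PLOTLY:
        # Hypothetical wording; the real dashboard may phrase this differently.
        st.warning("Install pandas and plotly to enable report charts.")
        return
    df = pd.DataFrame(
        {"Metric": list(report_metrics.keys()), "Value": list(report_metrics.values())}
    )
    st.plotly_chart(px.bar(df, x="Metric", y="Value"), use_container_width=True)
    st.dataframe(df)
```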
396-402: Pie chart may not be appropriate for these metrics.
The pie chart shows "Metric Contribution" for Success Rate (%), Avg Latency (s), and Total Cost ($). These metrics have different units and scales with no meaningful relationship as proportions of a whole. A pie chart implies parts summing to 100%, but 95% + 0.5s + $0.01 has no semantic meaning.
Consider either:
- Removing the pie chart, or
- Using it for a different dataset (e.g., pass/fail distribution, cost breakdown by category); a sketch of the pass/fail variant follows below
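As an illustration of the pass/fail option, here is a minimal sketch. The `passed`/`total` keys mirror the ones used in `report_metrics` elsewhere in this PR, but the exact data shape and the availability of `st`/`px` in scope are assumptions.

```python
# Sketch: a pie chart over a pass/fail split, where the slices genuinely sum to a whole.
passed = run_data.get("passed", 0)
failed = max(run_data.get("total", 0) - passed, 0)
fig_outcomes = px.pie(
    names=["Passed", "Failed"],
    values=[passed, failed],
    title="Scenario Outcomes",
)
st.plotly_chart(fig_outcomes, use_container_width=True)
```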
src/agentunit/adapters/autogen_ag2.py (1)
343-354: Consider filtering response times to actual replies.
The current logic calculates time differences between all consecutive messages, regardless of speaker. In multi-agent scenarios, this may include delays between messages from the same agent (follow-ups) rather than actual response times.
For more accurate metrics, consider only measuring time between messages where `from_agent` differs:
♻️ Suggested refinement
 response_times = []
 # Go through messages one by one
 for i in range(1, len(session_interactions)):
+    # Only measure response time when speaker changes
+    if session_interactions[i].from_agent == session_interactions[i - 1].from_agent:
+        continue
     previous_message_time = session_interactions[i - 1].timestamp
     current_message_time = session_interactions[i].timestamp
     time_difference = (current_message_time - previous_message_time).total_seconds()
     response_times.append(time_difference)
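For context, a small sketch of how `average_response_time` could then be derived from the filtered list; the empty-list fallback to `0.0` is an assumption about how single-message sessions are handled.

```python
# Average of the filtered response times; 0.0 when no speaker change occurred.
average_response_time = (
    sum(response_times) / len(response_times) if response_times else 0.0
)
```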
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/agentunit/adapters/autogen_ag2.py
src/agentunit/dashboard/app.py
src/agentunit/datasets/base.py
🧰 Additional context used
🧬 Code graph analysis (2)
src/agentunit/datasets/base.py (1)
src/agentunit/core/exceptions.py (1)
AgentUnitError(8-11)
src/agentunit/dashboard/app.py (1)
src/agentunit/comparison/statistics.py (1)
max(54-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test (Python 3.10)
- GitHub Check: Test (Python 3.12)
🔇 Additional comments (3)
src/agentunit/datasets/base.py (2)
82-93: LGTM! Clean helper function for parsing delimited fields.
The function handles edge cases well: empty/None values, empty delimiters, and whitespace trimming. The fallback behavior when delimiter is empty (returning the trimmed value as a single-item list) is a sensible defensive choice.
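For readers without the diff open, a minimal sketch of a helper with the behavior described above; the name and signature are assumptions rather than the exact function added in this PR.

```python
def _parse_delimited_field(value, delimiter: str) -> list[str]:
    """Sketch: split a CSV cell into trimmed items, tolerating empty input and delimiters."""
    if value is None or not value.strip():
        return []                       # empty/None cell -> no items
    if not delimiter:
        return [value.strip()]          # no delimiter configured -> single trimmed item
    return [item.strip() for item in value.split(delimiter) if item.strip()]
```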
96-133: Good improvement to CSV parsing robustness.
The configurable delimiters add flexibility, and the explicit metadata key exclusion is cleaner than the previous approach. The error handling with row context will make debugging CSV issues much easier.
One minor observation: the broad `except Exception` at line 131 will catch both malformed data and missing required fields (like `query`). This is acceptable since the error message includes the row content, but consider whether a more specific message for `KeyError` (missing required column) would be helpful.
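A minimal sketch of the more specific handling that observation suggests; `build_case` stands in for whatever constructs a case from the row, and the message wording is illustrative. `AgentUnitError` is the exception referenced in the code graph analysis above.

```python
from agentunit.core.exceptions import AgentUnitError

try:
    case = build_case(row)  # hypothetical stand-in for the row-to-case construction
except KeyError as exc:
    # Missing required column (e.g. "query") gets a targeted message.
    raise AgentUnitError(f"CSV row is missing required column {exc}: {row}") from exc
except Exception as exc:  # malformed values, bad delimiters, etc.
    raise AgentUnitError(f"Failed to parse CSV row: {row}") from exc
```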
src/agentunit/adapters/autogen_ag2.py (1)
321-321: Good fix: Duration is now calculated from actual timestamps.
The change from a hardcoded `0.0` to computing the actual duration using `(end_time - start_time).total_seconds() * 1000` provides meaningful metrics for session analysis.
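For completeness, a tiny worked example of that timestamp arithmetic; the sample timestamps are invented, and note the metrics dict below uses plain seconds (no `* 1000`).

```python
from datetime import datetime

start_time = datetime(2024, 1, 1, 12, 0, 0)
end_time = datetime(2024, 1, 1, 12, 0, 2, 500000)

duration_ms = (end_time - start_time).total_seconds() * 1000  # 2500.0
duration_s = (end_time - start_time).total_seconds()          # 2.5
```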
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
 # Basic metrics
 metrics = {
     "total_messages": len(session_interactions),
     "unique_speakers": len({i.sender_id for i in session_interactions}),
     "duration_seconds": (
         session.get("end_time", datetime.now()) - session["start_time"]
     ).total_seconds(),
-    "average_response_time": 0.0,  # Would need timing data
+    "average_response_time": average_response_time,
     "conversation_turns": session.get("message_count", 0),
 }
🧩 Analysis chain
🏁 Script executed:
cd src/agentunit/adapters && wc -l autogen_ag2.py
Repository: aviralgarg05/agentunit
Length of output: 84
🏁 Script executed:
# First, let's find the AgentInteraction class definition
grep -n "class AgentInteraction" src/agentunit/adapters/autogen_ag2.py
Repository: aviralgarg05/agentunit
Length of output: 48
🏁 Script executed:
# Read the relevant sections: AgentInteraction class definition and the usage
sed -n '1,100p' src/agentunit/adapters/autogen_ag2.py | head -80
Repository: aviralgarg05/agentunit
Length of output: 2349
🏁 Script executed:
# Check the AgentInteraction class definition more broadly
rg -n "class AgentInteraction" -A 20 src/agentunit/adapters/autogen_ag2.py
Repository: aviralgarg05/agentunit
Length of output: 48
🏁 Script executed:
find . -name "multiagent.py" -o -name "multiagent" -type d
Repository: aviralgarg05/agentunit
Length of output: 92
🏁 Script executed:
# Look for files in the multiagent directory
fd -type f ".*\.py" | grep -i multiagent | head -20
Repository: aviralgarg05/agentunit
Length of output: 236
🏁 Script executed:
# Check the structure
git ls-files | grep -i multiagent
Repository: aviralgarg05/agentunit
Length of output: 180
🏁 Script executed:
# Read the exact lines from the file to see the actual code
sed -n '356,365p' src/agentunit/adapters/autogen_ag2.py
Repository: aviralgarg05/agentunit
Length of output: 529
🏁 Script executed:
# Also check lines 265-274 for AgentInteraction creation
sed -n '265,274p' src/agentunit/adapters/autogen_ag2.py
Repository: aviralgarg05/agentunit
Length of output: 475
🏁 Script executed:
# Check lines 343-354 for response_times calculation mentioned in scratchpad
sed -n '343,354p' src/agentunit/adapters/autogen_ag2.py
Repository: aviralgarg05/agentunit
Length of output: 580
🏁 Script executed:
# Check the AgentInteraction class in the multiagent module
cat -n src/agentunit/multiagent/__init__.py | head -100
Repository: aviralgarg05/agentunit
Length of output: 4400
🏁 Script executed:
# Search for AgentInteraction class definition
rg -n "class AgentInteraction" -A 30 src/agentunit/multiagent/
Repository: aviralgarg05/agentunit
Length of output: 2434
Use correct attribute name: from_agent instead of sender_id.
Line 359 accesses i.sender_id, but AgentInteraction objects have a from_agent attribute (defined in src/agentunit/multiagent/__init__.py:179). This will cause an AttributeError at runtime when calculating unique_speakers.
🐛 Proposed fix
metrics = {
"total_messages": len(session_interactions),
- "unique_speakers": len({i.sender_id for i in session_interactions}),
+ "unique_speakers": len({i.from_agent for i in session_interactions}),
"duration_seconds": (
session.get("end_time", datetime.now()) - session["start_time"]
).total_seconds(),
"average_response_time": average_response_time,
"conversation_turns": session.get("message_count", 0),
 }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Basic metrics
metrics = {
    "total_messages": len(session_interactions),
    "unique_speakers": len({i.sender_id for i in session_interactions}),
    "duration_seconds": (
        session.get("end_time", datetime.now()) - session["start_time"]
    ).total_seconds(),
    "average_response_time": 0.0,  # Would need timing data
    "average_response_time": average_response_time,
    "conversation_turns": session.get("message_count", 0),
}
Suggested change:
# Basic metrics
metrics = {
    "total_messages": len(session_interactions),
    "unique_speakers": len({i.from_agent for i in session_interactions}),
    "duration_seconds": (
        session.get("end_time", datetime.now()) - session["start_time"]
    ).total_seconds(),
    "average_response_time": average_response_time,
    "conversation_turns": session.get("message_count", 0),
}
🤖 Prompt for AI Agents
In `@src/agentunit/adapters/autogen_ag2.py` around lines 356 - 365, The
unique_speakers calculation currently uses the nonexistent attribute sender_id
on items in session_interactions; update the set comprehension to use the
AgentInteraction attribute from_agent instead (i.e., compute len({i.from_agent
for i in session_interactions})). Ensure any references to sender_id in the
metrics block (e.g., the unique_speakers key) are replaced with from_agent so it
matches the AgentInteraction definition.
df = pd.DataFrame(
    {
        "Metric": list(report_metrics.keys()),
        "Value": list(report_metrics.values()),
    }
)

# Bar chart
fig_bar = px.bar(
    df,
    x="Metric",
    y="Value",
    title="Metric Breakdown",
    text="Value",
)
st.plotly_chart(fig_bar, use_container_width=True)
# Pie chart
fig_pie = px.pie(
    df,
    names="Metric",
    values="Value",
    title="Metric Contribution",
)
st.plotly_chart(fig_pie, use_container_width=True)
# Table view
st.dataframe(df)
Critical: Indentation error causes code to run outside the if selected_run: block.
The DataFrame creation and chart rendering (lines 379-404) are dedented and will execute even when selected_run is falsy. This will cause a NameError because report_metrics is only defined inside the if selected_run: block (lines 371-377).
🐛 Proposed fix - indent the visualization code
report_metrics = {
"Success Rate (%)": (
run_data.get("passed", 0) / max(run_data.get("total", 1), 1) * 100
),
"Avg Latency (s)": run_data.get("avg_latency", 0),
"Total Cost ($)": run_data.get("cost", 0),
}
- df = pd.DataFrame(
- {
- "Metric": list(report_metrics.keys()),
- "Value": list(report_metrics.values()),
- }
- )
-
- # Bar chart
- fig_bar = px.bar(
- df,
- x="Metric",
- y="Value",
- title="Metric Breakdown",
- text="Value",
- )
- st.plotly_chart(fig_bar, use_container_width=True)
- # Pie chart
- fig_pie = px.pie(
- df,
- names="Metric",
- values="Value",
- title="Metric Contribution",
- )
- st.plotly_chart(fig_pie, use_container_width=True)
- # Table view
- st.dataframe(df)
+ df = pd.DataFrame(
+ {
+ "Metric": list(report_metrics.keys()),
+ "Value": list(report_metrics.values()),
+ }
+ )
+
+ # Bar chart
+ fig_bar = px.bar(
+ df,
+ x="Metric",
+ y="Value",
+ title="Metric Breakdown",
+ text="Value",
+ )
+ st.plotly_chart(fig_bar, use_container_width=True)
+ # Pie chart
+ fig_pie = px.pie(
+ df,
+ names="Metric",
+ values="Value",
+ title="Metric Contribution",
+ )
+ st.plotly_chart(fig_pie, use_container_width=True)
+ # Table view
+ st.dataframe(df)
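Condensed, the corrected flow might read roughly like the sketch below; the selectbox label and surrounding function signature are assumptions, and the point is only that every use of `report_metrics` stays inside the `if selected_run:` branch.

```python
def _render_reports(history: dict) -> None:
    # Sketch of the corrected control flow; widget labels are illustrative.
    selected_run = st.selectbox("Select run", list(history.keys()))
    if selected_run:
        run_data = history[selected_run]
        report_metrics = {
            "Success Rate (%)": run_data.get("passed", 0) / max(run_data.get("total", 1), 1) * 100,
            "Avg Latency (s)": run_data.get("avg_latency", 0),
            "Total Cost ($)": run_data.get("cost", 0),
        }
        # Everything that reads report_metrics stays inside this branch.
        df = pd.DataFrame({"Metric": list(report_metrics.keys()), "Value": list(report_metrics.values())})
        st.plotly_chart(px.bar(df, x="Metric", y="Value", title="Metric Breakdown"), use_container_width=True)
        st.dataframe(df)
```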
Codecov Report
❌ Patch coverage is
aviralgarg05
left a comment
LGTM!

Summary
Added interactive Plotly visualizations to the Reports section.
Changes
Closes #57
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.