Summary
Exploratory issue to evaluate different chart types for visualizing how posterior beliefs evolve over time during experiments. The goal is to prototype multiple approaches so stakeholders can judge their usefulness firsthand.
Context
With the event log feature (#16), we'll have historical snapshot data available:
- `turns` and `rewards` per arm at various trial points
- `total_experiment_turns` for x-axis positioning
- `created` timestamp for optional date-based display
From this data we can compute:
- Posterior mean (estimated conversion rate)
- Credible intervals (uncertainty bounds)
- Full Beta distribution shape
- Probability each arm is "best"
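As a rough illustration of the first two items, here is a minimal sketch in Python. It assumes each arm is a Bernoulli arm with a uniform Beta(1, 1) prior, that `rewards` counts successes out of `turns`, and that an interval estimated from sorted posterior draws is accurate enough for charting. The function names are hypothetical, not part of the module.

```python
import random


def posterior_params(turns, rewards, prior_alpha=1.0, prior_beta=1.0):
    """Beta posterior parameters for one arm (assumed Beta(1, 1) prior by default)."""
    if rewards > turns:
        raise ValueError("rewards cannot exceed turns")
    return prior_alpha + rewards, prior_beta + (turns - rewards)


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def credible_interval(alpha, beta, level=0.95, draws=20000, seed=0):
    """Approximate equal-tailed credible interval from sorted posterior draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * (draws - 1))]
    upper = samples[int((1.0 - tail) * (draws - 1))]
    return lower, upper


# Example snapshot: 30 rewards in 200 turns (made-up numbers).
alpha, beta = posterior_params(turns=200, rewards=30)
mean = posterior_mean(alpha, beta)
low, high = credible_interval(alpha, beta)
```

If a statistics library is available server-side, an exact Beta quantile function would replace the sampling step.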
Experiment Types to Consider
The module serves two distinct use cases with different visualization needs:
| Type | Arms | Example | Visualization challenge |
|---|---|---|---|
| A/B tests | 2-5 | Landing page variants | Show detail per arm |
| Recommendations | 100-500+ | Blog post rankings | Summarize many arms |
Each chart should either work for both use cases, or we need a different default chart per experiment type.
Chart Types to Prototype
1. Line Chart with Credible Interval Bands
Best for: A/B tests (2-5 arms)
Rate
│ ╭──────────────────── Arm A (shaded CI)
0.15─│───╱─────────────────────────────
│ ╱ ╭─────────────── Arm B
0.10─│─╱───╱────────────────────────────
│╱ ╱
0.05─│───╱──────────────────────────────
└────────────────────────────────→ trials
- X-axis: total experiment turns (or datetime)
- Y-axis: conversion rate estimate
- Lines: one per arm with distinct colors
- Bands: 95% credible interval shaded around each line
Pros: Intuitive, shows estimate + uncertainty, familiar format
Cons: Gets cluttered with many arms, overlapping bands hard to read
Prototype tasks:
- Basic line chart with Chart.js or similar
- Add shaded credible-interval bands
- Test with 2, 5, and 10 arms
- Toggle between trials and datetime x-axis
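To make the data side of this prototype concrete, the sketch below turns snapshots into per-arm series (x, mean, lower, upper) that a client-side chart could consume as JSON. The snapshot shape (`arms` mapping to `turns`/`rewards`) is an assumption for illustration, since the event log format is defined in #16. The band uses a normal approximation to an assumed Beta(1, 1)-prior posterior, which is cheap but loose for small counts.

```python
import json
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% band


def band_point(turns, rewards):
    """Posterior mean and approximate 95% band for one arm at one snapshot."""
    a = 1.0 + rewards
    b = 1.0 + turns - rewards
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))
    # Clip to [0, 1] because the normal approximation can overshoot.
    return mean, max(0.0, mean - Z_95 * sd), min(1.0, mean + Z_95 * sd)


def build_series(snapshots):
    """Group snapshots into one series per arm, ordered by total experiment turns."""
    series = {}
    for snap in sorted(snapshots, key=lambda s: s["total_experiment_turns"]):
        for arm, counts in snap["arms"].items():
            mean, lower, upper = band_point(counts["turns"], counts["rewards"])
            entry = series.setdefault(arm, {"x": [], "mean": [], "lower": [], "upper": []})
            entry["x"].append(snap["total_experiment_turns"])
            entry["mean"].append(mean)
            entry["lower"].append(lower)
            entry["upper"].append(upper)
    return series


# Hypothetical snapshots, deliberately out of order.
snapshots = [
    {"total_experiment_turns": 200,
     "arms": {"A": {"turns": 100, "rewards": 15}, "B": {"turns": 100, "rewards": 9}}},
    {"total_experiment_turns": 20,
     "arms": {"A": {"turns": 10, "rewards": 2}, "B": {"turns": 10, "rewards": 1}}},
]
series = build_series(snapshots)
payload = json.dumps(series)
```

Swapping the x values for the `created` timestamp would give the datetime variant of the same series.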
2. Probability of Winning (Stacked Area)
Best for: A/B tests, decision-focused view
P(best)
1.0─│████████████████▓▓▓▓▓▓▓▓▓▓░░░░░░
│████████████████▓▓▓▓▓▓▓▓▓▓░░░░░░
0.5─│████ Arm A █████▓▓ Arm B ▓░░░░░░
│████████████████▓▓▓▓▓▓▓▓▓▓░ C ░░
0.0─│████████████████▓▓▓▓▓▓▓▓▓▓░░░░░░
└────────────────────────────────→ trials
- X-axis: trials or datetime
- Y-axis: probability (0-1, stacked to 100%)
- Each arm is a colored band
Pros: Answers "which should I pick?", always sums to 100%, intuitive competition view
Cons: Doesn't show actual conversion rates; P(best) has no simple closed form for more than two arms, so it is usually estimated by Monte Carlo sampling
Prototype tasks:
- Stacked area chart implementation
- P(best) calculation from Beta distributions
- Test with 2, 5, and 10 arms
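The P(best) calculation could be sketched as a simple Monte Carlo estimate: draw one sample from each arm's posterior, record which arm had the highest draw, and repeat. This assumes independent Beta(1, 1)-prior posteriors and a hypothetical input of `(turns, rewards)` pairs per arm. The number of draws is an arbitrary starting point, not a tuned value.

```python
import random


def probability_best(arms, draws=10000, seed=0):
    """Estimate P(arm is best) for each arm.

    arms: {name: (turns, rewards)}; assumes a Beta(1, 1) prior per arm.
    """
    rng = random.Random(seed)
    wins = dict.fromkeys(arms, 0)
    for _ in range(draws):
        # One joint draw: a sampled conversion rate for every arm.
        sampled = {
            name: rng.betavariate(1 + rewards, 1 + turns - rewards)
            for name, (turns, rewards) in arms.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Made-up counts for three arms at one snapshot.
p_best = probability_best({"A": (1000, 150), "B": (1000, 100), "C": (1000, 50)})
```

Because exactly one arm wins each draw, the estimates sum to 1, which is what lets the stacked area fill the full height at every x position.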
- Consider animation showing bands shifting
3. Heatmap (Arms × Time)
Best for: Large recommendation experiments (100+ arms)
Trials →
10 100 1000 10000
Arm 1 ░░ ▒▒ ▓▓ ██
Arm 2 ░░ ▒▒ ▓▓ ██
Arm 3 ░░ ░░ ▒▒ ▓▓
...
Arm 99 ░░ ░░ ░░ ▒▒
Color = conversion rate (darker = higher)
- X-axis: trial progression
- Y-axis: arms (sortable by current performance)
- Color intensity: conversion rate or P(best)
Pros: Scales to hundreds of arms, shows patterns across whole experiment
Cons: Less precise than line charts, requires good color scale design
Prototype tasks:
- Heatmap grid implementation
- Sortable rows (by name, current rate, total trials)
- Color scale selection (sequential vs diverging)
- Test with 50, 100, 500 arms
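A possible shape for the heatmap data is a row per arm and a column per trial checkpoint, with rows sorted by each arm's most recent rate. The sketch below assumes a hypothetical `history` mapping of arm to `{checkpoint: (turns, rewards)}`, uses the posterior mean under an assumed Beta(1, 1) prior as the cell value, and leaves cells empty (`None`) where an arm has no snapshot.

```python
def heatmap_matrix(history, checkpoints):
    """Return (row_order, matrix) with rows sorted by latest known rate, highest first."""

    def rate(counts):
        if counts is None:
            return None  # arm had no snapshot at this checkpoint
        turns, rewards = counts
        return (1 + rewards) / (2 + turns)  # posterior mean, Beta(1, 1) prior

    rows = {
        arm: [rate(by_checkpoint.get(cp)) for cp in checkpoints]
        for arm, by_checkpoint in history.items()
    }

    def latest(arm):
        known = [value for value in rows[arm] if value is not None]
        return known[-1] if known else 0.0

    order = sorted(rows, key=latest, reverse=True)
    return order, [rows[arm] for arm in order]


# Made-up history; arm3 joins the experiment late.
history = {
    "arm1": {10: (2, 0), 100: (20, 1)},
    "arm2": {10: (3, 1), 100: (30, 9)},
    "arm3": {100: (5, 1)},
}
order, matrix = heatmap_matrix(history, checkpoints=[10, 100])
```

Sorting by name or total trials would only change the `key` passed to `sorted`.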
4. Ranking Chart (Bump Chart)
Best for: Recommendations, seeing position changes
Rank
1─│ ╲ ╱───────── post1
2─│─────╲╱─────╲────── post2
3─│───────────╱─╲───── post3
4─│──────────────╲──── post4
└─────────────────────→ trials
- X-axis: trials or datetime
- Y-axis: rank position
- Lines: one per arm showing rank over time
Pros: Clear view of competition, works for many arms (show top N)
Cons: Doesn't show magnitude of differences, can get tangled
Prototype tasks:
- Bump chart implementation
- Show top N arms (configurable, default 10-20)
- Highlight lines on hover
- Click to see arm details
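The rank series behind a bump chart could be derived as follows. This sketch assumes every snapshot contains every arm, ranks by posterior mean under an assumed Beta(1, 1) prior, breaks ties by arm name, and keeps the top N arms by final rank. The input shape is hypothetical.

```python
def rank_series(snapshots, top_n=10):
    """Compute rank over time per arm.

    snapshots: list of (total_experiment_turns, {arm: (turns, rewards)}),
    already ordered by turns. Returns (x_values, {arm: [rank, ...]}).
    """
    xs = []
    ranks = {}
    for x, arms in snapshots:
        xs.append(x)

        def sort_key(arm):
            turns, rewards = arms[arm]
            # Negative rate so the highest rate sorts first; name breaks ties.
            return (-(1 + rewards) / (2 + turns), arm)

        for position, arm in enumerate(sorted(arms, key=sort_key), start=1):
            ranks.setdefault(arm, []).append(position)

    # Keep only the arms that finish in the top N.
    finalists = sorted(ranks, key=lambda arm: ranks[arm][-1])[:top_n]
    return xs, {arm: ranks[arm] for arm in finalists}


# Made-up snapshots for four posts.
snapshots = [
    (40, {"post1": (10, 1), "post2": (10, 3), "post3": (10, 2), "post4": (10, 0)}),
    (400, {"post1": (100, 20), "post2": (100, 15), "post3": (100, 12), "post4": (100, 2)}),
]
xs, top_ranks = rank_series(snapshots, top_n=3)
```

Selecting finalists by final rank hides arms that led early and then dropped out, so the prototype may want an alternative such as "ever in the top N".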
5. Distribution Evolution (Ridge/Joy Plot)
Best for: Educational view, single arm deep-dive
Trial 1000 ───────╱╲───────────────
Trial 500 ────╱──╲────────────────
Trial 100 ──╱────╲────────────────
Trial 10 ╱───────╲───────────────
0.0 0.1 0.2 Rate
- X-axis: conversion rate
- Y-axis: stacked by trial number
- Shape: actual Beta distribution PDF
Pros: Beautiful, shows full uncertainty evolution, educational
Cons: Only practical for 1-3 arms, complex to read
Prototype tasks:
- Ridge plot implementation
- Beta PDF calculation and rendering
- Animation option (morphing distribution)
- Use as detail view when clicking an arm
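For the Beta PDF task, one option is to evaluate the density in log space with `math.lgamma`, which avoids overflow for large counts. The sketch below assumes a Beta(1, 1) prior and treats the density at exactly 0 and 1 as zero, which is a simplification that only holds when both posterior parameters exceed 1.

```python
import math


def beta_pdf(x, alpha, beta):
    """Beta(alpha, beta) density at x, computed in log space."""
    if x <= 0.0 or x >= 1.0:
        return 0.0  # simplification: endpoints are not handled exactly
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_density = log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log1p(-x)
    return math.exp(log_density)


def ridge_curve(turns, rewards, points=201):
    """Sample the posterior density on an even grid over [0, 1] for one ridge."""
    alpha, beta = 1 + rewards, 1 + turns - rewards
    xs = [i / (points - 1) for i in range(points)]
    return xs, [beta_pdf(x, alpha, beta) for x in xs]


# One ridge for a made-up snapshot: 10 rewards in 100 turns.
xs, ys = ridge_curve(turns=100, rewards=10)
```

Each snapshot produces one such curve, and the ridge plot stacks them with a vertical offset per trial count. A fixed grid gets coarse once the posterior is very narrow, so a prototype might restrict the grid to the plotted x range.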
6. Convergence Indicator
Best for: Quick status check, dashboard widget
CI Width
│╲
0.3─│ ╲
│ ╲____
0.1─│ ╲________
└────────────────→ trials
[███████████░░░░] 78% confident
- Simple line showing uncertainty shrinking over time
- Or: progress bar showing "confidence level"
Pros: At-a-glance experiment maturity, answers "can we decide yet?"
Cons: Supplementary only, doesn't show which arm is winning
Prototype tasks:
- CI width over time line chart
- Confidence progress bar widget
- Threshold indicator (e.g., "95% confident A beats B")
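The threshold indicator could be prototyped as a pairwise check: estimate the probability that A's conversion rate exceeds B's and compare it against a threshold. This sketch assumes Beta(1, 1) priors and Monte Carlo estimation. The function names and the returned fields are hypothetical.

```python
import random


def prob_a_beats_b(a_counts, b_counts, draws=20000, seed=0):
    """Estimate P(rate_A > rate_B) from (turns, rewards) pairs."""
    rng = random.Random(seed)
    (turns_a, rewards_a), (turns_b, rewards_b) = a_counts, b_counts
    wins = sum(
        rng.betavariate(1 + rewards_a, 1 + turns_a - rewards_a)
        > rng.betavariate(1 + rewards_b, 1 + turns_b - rewards_b)
        for _ in range(draws)
    )
    return wins / draws


def convergence_status(a_counts, b_counts, threshold=0.95):
    """Summarize whether either arm clears the decision threshold."""
    p = prob_a_beats_b(a_counts, b_counts)
    confidence = max(p, 1.0 - p)  # confidence in whichever arm currently leads
    return {
        "leader": "A" if p >= 0.5 else "B",
        "confidence": confidence,
        "decided": confidence >= threshold,
    }


# Made-up early and late snapshots of the same experiment.
early = convergence_status((20, 3), (20, 2))
late = convergence_status((2000, 300), (2000, 200))
```

The `confidence` value could drive the progress bar directly, with the threshold drawn as a marker on it.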
Recommended Approach
Phase 1: Core Charts
- Line chart with CI bands (primary for A/B tests)
- Heatmap (primary for large experiments)
Phase 2: Decision Support
- P(best) stacked area
- Convergence indicator
Phase 3: Advanced
- Ranking chart
- Ridge plot for deep-dive
Technical Considerations
- Library: Chart.js, D3.js, or Apache ECharts
- Rendering: Client-side JavaScript, data via JSON endpoint
- Responsive: Charts should work on mobile
- Accessibility: Color-blind friendly palettes, screen reader support
- Performance: Lazy load charts, paginate large experiments
Deliverables
- Prototype each chart type with sample data
- Screenshot/demo of each for stakeholder review
- Recommendation for default chart per experiment type
- Performance testing with large datasets
Dependencies
- #16: Add event log for historical posterior visualization (source of the historical snapshot data)