Skip to content

Conversation

@bryancall
Copy link
Contributor

Summary

This PR modernizes traffic_top with a complete UI redesign featuring:

  • Responsive multi-column layouts that adapt to terminal size (80x24, 120x40, 160x40)
  • 9 dedicated pages: Overview, Responses, Connections, Cache, SSL/TLS, Errors, Performance, Graphs, Help
  • Real-time ASCII graphs showing request rates, bandwidth, cache hit rates, and connections
  • HTTP milestone timing on the Performance page showing request lifecycle timing
  • Color-coded statistics with btop-style visual design
  • Unicode box drawing with ASCII fallback for compatibility
  • Batch mode (-b/-B) for scripted monitoring with JSON output option

Code Quality Improvements

  • Replaced goto statements with clean loop control
  • Added explicit SIGWINCH handler for responsive terminal resize
  • Replaced ink_assert with graceful error handling
  • Defined magic numbers as named constants
  • Added stat lookup table validation at startup (debug builds)
  • Moved <chrono> include from header to implementation
  • Added comprehensive code documentation
  • Fixed duplicate stats appearing on the same page in larger layouts
  • Updated help page with all keyboard shortcuts

Test plan

  • Build with ASAN: ninja traffic_top
  • Run autest: autest -f traffic_top_batch passes
  • Manual testing of all pages at different terminal sizes (80x24, 120x40, 160x40)
  • Verify batch mode output: traffic_top -b -c 1
  • Test ASCII mode toggle with 'a' key
  • Test window resize handling

Major refactoring of traffic_top to provide a modern, responsive terminal UI:

- Add responsive layouts for 80x24, 120x40, and 160x40 terminals
- Use Unicode box-drawing characters with rounded corners (ASCII fallback)
- Add btop-style graph page with Unicode block characters (▁▂▃▄▅▆▇█)
- Color gradient for graphs: blue→cyan→green→yellow→red
- Add history tracking for key metrics (last 120 samples)
- Reorganize stats into logical groups: Cache, Client, Origin, etc.
- Add batch mode (-b) with JSON output support (-j)
- Improve color coding for values by magnitude (K/M/G/T suffixes)

New files:
- Display.cc/h: ANSI terminal rendering with boxes, stats, graphs
- Stats.cc/h: Metric collection with history tracking
- Output.cc/h: Batch mode text/JSON output
- StatType.h: Stat type definitions
- LAYOUT.md: Documentation of exact terminal layouts
- format_layout.py, format_graphs.py: Layout generation tools

Press 'g' or '7' for the new graphs page.
Tests for the new traffic_top batch mode functionality:
- JSON output format validation
- Required JSON fields (timestamp, host)
- Text output format with headers
- Multiple iterations output
- Help and version flags
…ching

- Add Performance page (7/p) showing HTTP milestones in chronological order
- Add RateNsToMs stat type to convert nanosecond milestones to milliseconds
- Fix transaction time calculations (values already in ms, don't multiply)
- Add consistent color scheme across all layouts (CLIENT=cyan, ORIGIN=blue, etc.)
- Pass border colors to drawStatPairRow for consistent vertical borders
- Add bright border colors (Border4-7: bright blue, yellow, red, green)
- Improve status bar with blue background and colored mode indicator
- Auto-switch from absolute to rate mode once rates can be calculated
- Show [ABS]/[RATE] indicator in status bar with colored badges
- Display values immediately on startup using absolute mode
- Use shorter initial timeout (100ms) for responsive first display
- Update help page with all 8 pages including Performance
- Update key hints to show 1-8 page range
The ssl_handshake_fail stat was incorrectly mapped to
total_handshake_time (a timing metric) instead of an actual
error metric. Changed to use ssl_error_ssl which counts SSL errors.

Also fixed the traffic_top_batch test to properly discover the
traffic_top binary from build directories instead of falling
back to the installed /opt/ats version.
Instead of crashing with ink_assert when an unknown stat key is
requested, return default values and log an error message. This
makes traffic_top more robust and prevents crashes when stat
definitions don't match.

Removed the now-unused ink_assert.h include.
Replace the goto statement with a boolean running flag for cleaner
control flow. This is a more idiomatic C++ pattern and eliminates
the goto/label construct.
Add a SIGWINCH signal handler to properly handle terminal window
resize events. When the terminal is resized, the handler sets a
flag that triggers ncurses to refresh its internal state, ensuring
the display updates correctly to the new dimensions.

The display already polls terminal size on each render, but this
explicit handler ensures ncurses is properly notified of the resize.
Define named constants for timeout values and retry counts to
improve code readability and maintainability:

- FIRST_DISPLAY_TIMEOUT_MS: Initial display timeout (1000ms)
- CONNECT_RETRY_TIMEOUT_MS: Connection retry interval (500ms)
- MAX_CONNECTION_RETRIES: Maximum retry attempts (10)
- MS_PER_SECOND: Milliseconds per second conversion factor
- RPC_TIMEOUT_MS: RPC call timeout (1000ms)
- RPC_RETRY_COUNT: RPC retry count (10)
Add validateLookupTable() method to check that derived stats
(Ratio, Percentage, Sum, SumBits, SumAbsolute, TimeRatio) have
valid references to their numerator and denominator keys.

The validation runs in debug builds and prints warnings for any
misconfigured stats, making it easier to catch configuration
errors during development.
Add missing keyboard shortcuts to the help page:
- Left/m for previous page
- Right/r for next page
- b/ESC to go back from help

Also clarified existing shortcuts and improved formatting.
The <chrono> header was included in Stats.h but only used in
Stats.cc. Move the include to where it's actually needed to
reduce unnecessary header dependencies.
Add detailed comments throughout the traffic_top codebase:

- File header: Added overview of features and capabilities
- Signal handlers: Documented purpose and usage of each signal
- run_interactive(): Detailed explanation of the main event loop,
  timeout strategy, display modes, and state variables
- run_batch(): Documented output formats and batch mode behavior
- Keyboard handling: Added reference comment block listing all keys
- main(): Added usage examples and argument format documentation
- Stats.h: Documented private members and their purposes
- Display.h: Documented page renderers, layout functions, and
  the responsive layout strategy

This improves code maintainability and helps new contributors
understand the codebase structure.
Fixed several issues with duplicate statistics appearing on the same page:

120x40 layout:
- HIT RATES: Replaced 'abort' (error stat) with 'ram_hit', 'ram_miss'
- BANDWIDTH: Replaced 'ka_total' with 'total_time'
- Row 4: Completely redesigned to show HTTP METHODS, RESPONSE TIMES,
  and HTTP CODES instead of duplicating stats from rows 1-3

160x40 layout:
- REQUESTS: Fixed row 6 to show 5xx and client_req_conn instead of
  duplicating client_req/server_req
- HIT RATES: Fixed to show ram_hit/ram_miss instead of entries
- RESPONSES: Changed to show specific HTTP codes (200-502) instead
  of duplicating aggregate stats (2xx-5xx)
- BANDWIDTH: Replaced fresh_time with total_time
- TOTALS: Completely redesigned to show request/response summary
- CACHE DETAIL: Fixed cache_writes appearing 5 times (copy-paste error)
- ORIGIN DETAIL: Changed to show SSL origin stats
- MISC STATS: Changed to show transaction errors and timing stats
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR comprehensively modernizes the traffic_top tool with a complete redesign featuring responsive multi-column layouts, real-time ASCII graphs, multiple dedicated pages for different metrics, and batch mode with JSON output. The changes represent a major architectural improvement from the original simple stats display to a feature-rich monitoring tool.

Key Changes

  • Replaced goto statements with structured control flow and signal handlers
  • Implemented responsive layouts adapting to terminal sizes (80x24, 120x40, 160x40+)
  • Added 9 dedicated pages for different stat categories with btop-style visualization
  • Introduced batch mode with JSON/text output for scripted monitoring

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/gold_tests/traffic_top/traffic_top_batch.test.py New test file for batch mode with JSON/text output validation
src/traffic_top/traffic_top.cc Refactored main with signal handlers, interactive/batch mode split
src/traffic_top/Stats.h New header with Stats class declaration and StatType enum
src/traffic_top/Stats.cc Stats implementation with RPC communication and metric calculations
src/traffic_top/StatType.h Enum defining stat types for calculation/display
src/traffic_top/Display.h Display class for curses-based TUI rendering
src/traffic_top/Display.cc Display implementation with responsive layouts and graphs
src/traffic_top/Output.h Output formatters for batch mode
src/traffic_top/Output.cc Text and JSON output implementations
src/traffic_top/README Updated documentation with usage examples
src/traffic_top/LAYOUT.md Detailed layout specifications for different terminal sizes
src/traffic_top/format_layout.py Python helper for generating layout documentation
src/traffic_top/format_graphs.py Python helper for graph formatting examples
src/traffic_top/CMakeLists.txt Updated build configuration for new source files

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +97 to +100
# Test 4: Help output (argparse returns 64 for --help)
tr4 = helper.add_test("traffic_top help")
tr4.Processes.Default.Command = f"{traffic_top_path} --help"
tr4.Processes.Default.ReturnCode = 64
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return code check for --help appears incorrect. The comment states "argparse returns 64 for --help", but argparse typically returns 0 for --help by default. Return code 64 (EX_USAGE from sysexits.h) usually indicates a usage error. This test may fail if the help output is successful.

Suggested change
# Test 4: Help output (argparse returns 64 for --help)
tr4 = helper.add_test("traffic_top help")
tr4.Processes.Default.Command = f"{traffic_top_path} --help"
tr4.Processes.Default.ReturnCode = 64
# Test 4: Help output (argparse returns 0 for --help)
tr4 = helper.add_test("traffic_top help")
tr4.Processes.Default.Command = f"{traffic_top_path} --help"
tr4.Processes.Default.ReturnCode = 0

Copilot uses AI. Check for mistakes.
Comment on lines +620 to +630
if (key == "total_time") {
value = value / 10000000;
}

// Calculate rate if needed
if ((type == StatType::Rate || type == StatType::RequestPct || type == StatType::TimeRatio || type == StatType::RateNsToMs) &&
_old_stats != nullptr && !_absolute) {
double old = getValue(item.name, _old_stats.get());
if (key == "total_time") {
old = old / 10000000;
}
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The special handling for "total_time" stat is duplicated and applies a hardcoded divisor of 10000000 without explanation. This divisor appears in both lines 621 and 629. If this is a unit conversion (e.g., nanoseconds to something else), it should be documented with a named constant. Additionally, the duplication suggests this logic should be refactored.

Copilot uses AI. Check for mistakes.
Comment on lines +233 to +235
if (_format == OutputFormat::Json) {
fprintf(_output, "{\"error\":\"%s\",\"timestamp\":\"%s\"}\n", message.c_str(), getCurrentTimestamp().c_str());
} else {
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON error output in printError escapes the error message but doesn't handle embedded quotes or newlines in the message string, which could produce invalid JSON. The message should be properly JSON-escaped to avoid breaking the output format for automated parsers.

Copilot uses AI. Check for mistakes.
Remove references to btop/btop++ from code comments and documentation.
Replace ncurses with standard POSIX terminal I/O:

- Use termios for raw terminal mode (no line buffering, no echo)
- Use select() with timeout for non-blocking keyboard input
- Parse ANSI escape sequences for arrow keys
- Remove ncurses include files and CURSES_LIBRARIES link

This eliminates the ncurses build dependency while maintaining
full functionality. The display output already used ANSI escape
codes, so ncurses was only needed for keyboard input handling.

Changes:
- Display::getInput() replaces ncurses getch()/timeout()
- Display stores/restores termios settings on init/shutdown
- CMakeLists.txt no longer links CURSES_LIBRARIES
- README updated to reflect POSIX terminal requirement
@bryancall bryancall self-assigned this Dec 15, 2025
@bryancall bryancall added this to the 10.2.0 milestone Dec 15, 2025
Add missing Apache 2.0 license headers to format_graphs.py and
format_layout.py to fix RAT (Release Audit Tool) check failure.
The setlocale() function and LC_ALL macro require <clocale> to be
explicitly included on FreeBSD. Other platforms may provide these
symbols through other headers, but FreeBSD requires the explicit
include.
Remove WIDTH_SMALL and HEIGHT_SMALL constants that were defined but
never used in the code. The layout breakpoint documentation is
preserved in the comments. This fixes the -Werror build failure on
Ubuntu where -Wunused-const-variable is enabled.
Use std::lround() instead of static_cast<int>() when formatting
stat values with SI suffixes. This ensures values like 1.9K display
as "2K" instead of "1K", providing more accurate information to users.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant