-
Notifications
You must be signed in to change notification settings - Fork 845
traffic_top: modernize UI with responsive layouts and performance monitoring #12758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
traffic_top: modernize UI with responsive layouts and performance monitoring #12758
Conversation
Major refactoring of traffic_top to provide a modern, responsive terminal UI: - Add responsive layouts for 80x24, 120x40, and 160x40 terminals - Use Unicode box-drawing characters with rounded corners (ASCII fallback) - Add btop-style graph page with Unicode block characters (▁▂▃▄▅▆▇█) - Color gradient for graphs: blue→cyan→green→yellow→red - Add history tracking for key metrics (last 120 samples) - Reorganize stats into logical groups: Cache, Client, Origin, etc. - Add batch mode (-b) with JSON output support (-j) - Improve color coding for values by magnitude (K/M/G/T suffixes) New files: - Display.cc/h: ANSI terminal rendering with boxes, stats, graphs - Stats.cc/h: Metric collection with history tracking - Output.cc/h: Batch mode text/JSON output - StatType.h: Stat type definitions - LAYOUT.md: Documentation of exact terminal layouts - format_layout.py, format_graphs.py: Layout generation tools Press 'g' or '7' for the new graphs page.
Tests for the new traffic_top batch mode functionality: - JSON output format validation - Required JSON fields (timestamp, host) - Text output format with headers - Multiple iterations output - Help and version flags
…ching - Add Performance page (7/p) showing HTTP milestones in chronological order - Add RateNsToMs stat type to convert nanosecond milestones to milliseconds - Fix transaction time calculations (values already in ms, don't multiply) - Add consistent color scheme across all layouts (CLIENT=cyan, ORIGIN=blue, etc.) - Pass border colors to drawStatPairRow for consistent vertical borders - Add bright border colors (Border4-7: bright blue, yellow, red, green) - Improve status bar with blue background and colored mode indicator - Auto-switch from absolute to rate mode once rates can be calculated - Show [ABS]/[RATE] indicator in status bar with colored badges - Display values immediately on startup using absolute mode - Use shorter initial timeout (100ms) for responsive first display - Update help page with all 8 pages including Performance - Update key hints to show 1-8 page range
The ssl_handshake_fail stat was incorrectly mapped to total_handshake_time (a timing metric) instead of an actual error metric. Changed to use ssl_error_ssl which counts SSL errors. Also fixed the traffic_top_batch test to properly discover the traffic_top binary from build directories instead of falling back to the installed /opt/ats version.
Instead of crashing with ink_assert when an unknown stat key is requested, return default values and log an error message. This makes traffic_top more robust and prevents crashes when stat definitions don't match. Removed the now-unused ink_assert.h include.
Replace the goto statement with a boolean running flag for cleaner control flow. This is a more idiomatic C++ pattern and eliminates the goto/label construct.
Add a SIGWINCH signal handler to properly handle terminal window resize events. When the terminal is resized, the handler sets a flag that triggers ncurses to refresh its internal state, ensuring the display updates correctly to the new dimensions. The display already polls terminal size on each render, but this explicit handler ensures ncurses is properly notified of the resize.
Define named constants for timeout values and retry counts to improve code readability and maintainability: - FIRST_DISPLAY_TIMEOUT_MS: Initial display timeout (1000ms) - CONNECT_RETRY_TIMEOUT_MS: Connection retry interval (500ms) - MAX_CONNECTION_RETRIES: Maximum retry attempts (10) - MS_PER_SECOND: Milliseconds per second conversion factor - RPC_TIMEOUT_MS: RPC call timeout (1000ms) - RPC_RETRY_COUNT: RPC retry count (10)
Add validateLookupTable() method to check that derived stats (Ratio, Percentage, Sum, SumBits, SumAbsolute, TimeRatio) have valid references to their numerator and denominator keys. The validation runs in debug builds and prints warnings for any misconfigured stats, making it easier to catch configuration errors during development.
Add missing keyboard shortcuts to the help page: - Left/m for previous page - Right/r for next page - b/ESC to go back from help Also clarified existing shortcuts and improved formatting.
The <chrono> header was included in Stats.h but only used in Stats.cc. Move the include to where it's actually needed to reduce unnecessary header dependencies.
Add detailed comments throughout the traffic_top codebase: - File header: Added overview of features and capabilities - Signal handlers: Documented purpose and usage of each signal - run_interactive(): Detailed explanation of the main event loop, timeout strategy, display modes, and state variables - run_batch(): Documented output formats and batch mode behavior - Keyboard handling: Added reference comment block listing all keys - main(): Added usage examples and argument format documentation - Stats.h: Documented private members and their purposes - Display.h: Documented page renderers, layout functions, and the responsive layout strategy This improves code maintainability and helps new contributors understand the codebase structure.
Fixed several issues with duplicate statistics appearing on the same page: 120x40 layout: - HIT RATES: Replaced 'abort' (error stat) with 'ram_hit', 'ram_miss' - BANDWIDTH: Replaced 'ka_total' with 'total_time' - Row 4: Completely redesigned to show HTTP METHODS, RESPONSE TIMES, and HTTP CODES instead of duplicating stats from rows 1-3 160x40 layout: - REQUESTS: Fixed row 6 to show 5xx and client_req_conn instead of duplicating client_req/server_req - HIT RATES: Fixed to show ram_hit/ram_miss instead of entries - RESPONSES: Changed to show specific HTTP codes (200-502) instead of duplicating aggregate stats (2xx-5xx) - BANDWIDTH: Replaced fresh_time with total_time - TOTALS: Completely redesigned to show request/response summary - CACHE DETAIL: Fixed cache_writes appearing 5 times (copy-paste error) - ORIGIN DETAIL: Changed to show SSL origin stats - MISC STATS: Changed to show transaction errors and timing stats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR comprehensively modernizes the traffic_top tool with a complete redesign featuring responsive multi-column layouts, real-time ASCII graphs, multiple dedicated pages for different metrics, and batch mode with JSON output. The changes represent a major architectural improvement from the original simple stats display to a feature-rich monitoring tool.
Key Changes
- Replaced goto statements with structured control flow and signal handlers
- Implemented responsive layouts adapting to terminal sizes (80x24, 120x40, 160x40+)
- Added 9 dedicated pages for different stat categories with btop-style visualization
- Introduced batch mode with JSON/text output for scripted monitoring
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/gold_tests/traffic_top/traffic_top_batch.test.py | New test file for batch mode with JSON/text output validation |
| src/traffic_top/traffic_top.cc | Refactored main with signal handlers, interactive/batch mode split |
| src/traffic_top/Stats.h | New header with Stats class declaration and StatType enum |
| src/traffic_top/Stats.cc | Stats implementation with RPC communication and metric calculations |
| src/traffic_top/StatType.h | Enum defining stat types for calculation/display |
| src/traffic_top/Display.h | Display class for curses-based TUI rendering |
| src/traffic_top/Display.cc | Display implementation with responsive layouts and graphs |
| src/traffic_top/Output.h | Output formatters for batch mode |
| src/traffic_top/Output.cc | Text and JSON output implementations |
| src/traffic_top/README | Updated documentation with usage examples |
| src/traffic_top/LAYOUT.md | Detailed layout specifications for different terminal sizes |
| src/traffic_top/format_layout.py | Python helper for generating layout documentation |
| src/traffic_top/format_graphs.py | Python helper for graph formatting examples |
| src/traffic_top/CMakeLists.txt | Updated build configuration for new source files |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Test 4: Help output (argparse returns 64 for --help) | ||
| tr4 = helper.add_test("traffic_top help") | ||
| tr4.Processes.Default.Command = f"{traffic_top_path} --help" | ||
| tr4.Processes.Default.ReturnCode = 64 |
Copilot
AI
Dec 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return code check for --help appears incorrect. The comment states "argparse returns 64 for --help", but argparse typically returns 0 for --help by default. Return code 64 (EX_USAGE from sysexits.h) usually indicates a usage error. This test may fail if the help output is successful.
| # Test 4: Help output (argparse returns 64 for --help) | |
| tr4 = helper.add_test("traffic_top help") | |
| tr4.Processes.Default.Command = f"{traffic_top_path} --help" | |
| tr4.Processes.Default.ReturnCode = 64 | |
| # Test 4: Help output (argparse returns 0 for --help) | |
| tr4 = helper.add_test("traffic_top help") | |
| tr4.Processes.Default.Command = f"{traffic_top_path} --help" | |
| tr4.Processes.Default.ReturnCode = 0 |
| if (key == "total_time") { | ||
| value = value / 10000000; | ||
| } | ||
|
|
||
| // Calculate rate if needed | ||
| if ((type == StatType::Rate || type == StatType::RequestPct || type == StatType::TimeRatio || type == StatType::RateNsToMs) && | ||
| _old_stats != nullptr && !_absolute) { | ||
| double old = getValue(item.name, _old_stats.get()); | ||
| if (key == "total_time") { | ||
| old = old / 10000000; | ||
| } |
Copilot
AI
Dec 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The special handling for "total_time" stat is duplicated and applies a hardcoded divisor of 10000000 without explanation. This divisor appears in both lines 621 and 629. If this is a unit conversion (e.g., nanoseconds to something else), it should be documented with a named constant. Additionally, the duplication suggests this logic should be refactored.
| if (_format == OutputFormat::Json) { | ||
| fprintf(_output, "{\"error\":\"%s\",\"timestamp\":\"%s\"}\n", message.c_str(), getCurrentTimestamp().c_str()); | ||
| } else { |
Copilot
AI
Dec 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JSON error output in printError escapes the error message but doesn't handle embedded quotes or newlines in the message string, which could produce invalid JSON. The message should be properly JSON-escaped to avoid breaking the output format for automated parsers.
Remove references to btop/btop++ from code comments and documentation.
Replace ncurses with standard POSIX terminal I/O: - Use termios for raw terminal mode (no line buffering, no echo) - Use select() with timeout for non-blocking keyboard input - Parse ANSI escape sequences for arrow keys - Remove ncurses include files and CURSES_LIBRARIES link This eliminates the ncurses build dependency while maintaining full functionality. The display output already used ANSI escape codes, so ncurses was only needed for keyboard input handling. Changes: - Display::getInput() replaces ncurses getch()/timeout() - Display stores/restores termios settings on init/shutdown - CMakeLists.txt no longer links CURSES_LIBRARIES - README updated to reflect POSIX terminal requirement
Add missing Apache 2.0 license headers to format_graphs.py and format_layout.py to fix RAT (Release Audit Tool) check failure.
The setlocale() function and LC_ALL macro require <clocale> to be explicitly included on FreeBSD. Other platforms may provide these symbols through other headers, but FreeBSD requires the explicit include.
Remove WIDTH_SMALL and HEIGHT_SMALL constants that were defined but never used in the code. The layout breakpoint documentation is preserved in the comments. This fixes the -Werror build failure on Ubuntu where -Wunused-const-variable is enabled.
Use std::lround() instead of static_cast<int>() when formatting stat values with SI suffixes. This ensures values like 1.9K display as "2K" instead of "1K", providing more accurate information to users.
Summary
This PR modernizes
traffic_topwith a complete UI redesign featuring:-b/-B) for scripted monitoring with JSON output optionCode Quality Improvements
gotostatements with clean loop controlSIGWINCHhandler for responsive terminal resizeink_assertwith graceful error handling<chrono>include from header to implementationTest plan
ninja traffic_topautest -f traffic_top_batchpassestraffic_top -b -c 1