Conversation
- Limits `_refresh_live` calls to ~20 FPS (0.05s interval) in `cli_interactive.py`.
- Prevents CPU spikes when using fast providers (e.g. Grok, Gemini).
- Verified with a reproduction script showing a >95% reduction in draw calls.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
🤖 Hi @milhy545, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
Summary of Changes

Hello @milhy545, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a performance optimization to the interactive command-line interface. By limiting the frequency of UI updates, it significantly reduces CPU load and rendering overhead, especially when processing rapid streams of data from large language models.
Code Review
This pull request introduces a performance optimization by throttling TUI refreshes to 20 FPS during LLM streaming, which effectively reduces CPU usage. The implementation is sound. I've provided a couple of suggestions to improve the robustness and readability of the throttling logic, and also pointed out a minor UI issue where the last part of a stream might not be rendered, with a recommendation on how to fix it.
    # Bolt Optimization: Throttle UI updates to ~20 FPS to prevent
    # high CPU usage when LLM streams very fast (e.g. 50+ tokens/sec).
    now = time.time()
    if now - last_refresh_time >= 0.05:
        self._refresh_live()
        last_refresh_time = now
This throttling is a great optimization for performance. However, it introduces a small risk: if the stream ends shortly after a refresh, the final chunks of text might not be rendered in the "Thinking" panel before it's cleared. This can make it seem like the stream was cut off.
To ensure the final streamed state is always visible to the user, I recommend adding a final `self._refresh_live()` call after the streaming process is complete but before the UI is cleaned up. A good place for this would be in the `finally` block that wraps the `await self.coder.process_request(...)` call, just before `self.activity_panel.clear_thinking()`.
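A minimal sketch of that recommendation, written as a fragment of the surrounding request-handling code; the `try`/`finally` shape, `self.coder.process_request(...)`, and `self.activity_panel.clear_thinking()` are taken from the comment above, and the argument list stays elided as it is there:

```python
try:
    # Stream the response; _stream_handler throttles intermediate refreshes.
    await self.coder.process_request(...)
finally:
    # Flush whatever arrived after the last throttled refresh so the
    # "Thinking" panel shows the complete stream before it is cleared.
    self._refresh_live()
    self.activity_panel.clear_thinking()
```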
    now = time.time()
    if now - last_refresh_time >= 0.05:
        self._refresh_live()
        last_refresh_time = now
This throttling logic is a great addition. Two small improvements could make it more robust and readable:

- Use `time.monotonic()`: for measuring time intervals, `time.monotonic()` is more reliable than `time.time()` because it is not affected by system clock changes.
- Define a constant for the interval: the value `0.05` is a magic number. Defining it as a named constant (e.g., `UI_REFRESH_INTERVAL`) makes the code's intent clearer (targeting ~20 FPS) and easier to modify. You could define this constant at the top of the `process_chat` method.
Suggested change:

    - now = time.time()
    + now = time.monotonic()
      if now - last_refresh_time >= 0.05:
          self._refresh_live()
          last_refresh_time = now
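Taken together, the two suggestions could look roughly like the sketch below; the constant name `UI_REFRESH_INTERVAL` comes from the comment above, while its placement and the surrounding variable names are illustrative rather than taken from the repository:

```python
import time

UI_REFRESH_INTERVAL = 0.05  # seconds between refreshes, i.e. ~20 FPS

# ... inside the stream handler, for each incoming chunk ...
now = time.monotonic()  # monotonic clock is unaffected by system clock changes
if now - last_refresh_time >= UI_REFRESH_INTERVAL:
    self._refresh_live()
    last_refresh_time = now
```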
💡 What: Implemented throttling in `InteractiveCLI`'s `_stream_handler` to limit UI updates to every 0.05 seconds (20 FPS).

🎯 Why: High-speed LLM streaming (50+ tokens/sec) caused 1:1 UI refreshes, leading to excessive CPU usage and rendering overhead.

📊 Impact: Reduces UI refresh calls by >95% during fast streaming (e.g., 50 calls -> 2 calls in 100ms test), significantly lowering CPU load.

🔬 Measurement: Verified with `tests/repro_throttling.py` (mocked stream test).

PR created automatically by Jules for task 7404175556088628426 started by @milhy545
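The reproduction script itself is not shown in this thread; the sketch below only illustrates how such a measurement could be set up, and its chunk count, timing, and function names are assumptions rather than the contents of `tests/repro_throttling.py`:

```python
import time

def run_stream(throttle: bool, chunks: int = 50, chunk_interval: float = 0.002) -> int:
    """Simulate a fast LLM stream and count how often the UI would be redrawn."""
    refresh_calls = 0
    last_refresh = 0.0
    for _ in range(chunks):
        time.sleep(chunk_interval)  # ~2 ms per chunk, i.e. a very fast stream
        now = time.monotonic()
        if not throttle or now - last_refresh >= 0.05:
            refresh_calls += 1      # stands in for self._refresh_live()
            last_refresh = now
    return refresh_calls

if __name__ == "__main__":
    # Unthrottled: one redraw per chunk. Throttled: only a handful of redraws
    # over the ~100 ms stream, illustrating the reduction reported above.
    print("unthrottled:", run_stream(throttle=False))
    print("throttled:  ", run_stream(throttle=True))
```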