Skip to content

Conversation

@bryancall
Copy link
Contributor

The xdebug probe-full-json transform was calling TSContCall(WRITE_READY) even when no data was available to consume (towrite == 0). This caused unnecessary callbacks to the transform, and under certain conditions could lead to a tight loop that would starve other transactions on the same ET_NET thread.

The Bug

When using X-Debug: probe-full-json=nobody (or other probe-full-json variants), if the origin server was slow to send body data, the transform would:

  1. Be called with WRITE_READY event
  2. See no data available (towrite == 0)
  3. Still call TSVIOReenable and TSContCall(WRITE_READY) back to upstream
  4. Get called again immediately (step 1)

This creates a tight loop that can cause:

  • High CPU usage on the affected ET_NET thread
  • Starvation of other transactions on the same thread
  • Requests appearing to "hang"

The Fix

Add a check to only call TSVIOReenable and TSContCall when towrite > 0, meaning we actually consumed data. If no data was available, we simply return and wait for the upstream to call us again when more data arrives.

Testing

Added a new autest (x_probe_full_json_slow_origin) that uses a slow origin server with chunked encoding to verify the transform handles delayed body data correctly.

The xdebug probe-full-json transform was calling TSContCall(WRITE_READY)
even when no data was available to consume (towrite == 0). This caused
unnecessary callbacks to the transform, and under certain conditions
could lead to a tight loop that would starve other transactions on the
same ET_NET thread.

The fix adds a check to only call TSVIOReenable and TSContCall when
towrite > 0, meaning we actually consumed data. If no data was available,
we simply return and wait for the upstream to call us again when more
data arrives.

Added a new autest (x_probe_full_json_slow_origin) that uses a slow
origin server with chunked encoding to verify the transform handles
delayed body data correctly.
@bryancall bryancall added this to the 10.2.0 milestone Dec 15, 2025
@bryancall bryancall self-assigned this Dec 15, 2025
@bryancall bryancall added Bug xdebug xdebug plugin labels Dec 15, 2025
@bryancall bryancall requested a review from Copilot December 15, 2025 20:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in the xdebug plugin's probe-full-json transform where it would create a tight loop when the origin server was slow to send body data. The transform was incorrectly calling TSContCall(WRITE_READY) even when no data was consumed (towrite == 0), causing high CPU usage and potential starvation of other transactions.

Key Changes:

  • Added a conditional check to only reenable and notify upstream when data has been consumed (towrite > 0)
  • Added comprehensive test coverage with a slow origin server simulation to verify the fix

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
plugins/xdebug/xdebug_transforms.cc Added conditional check to prevent unnecessary callbacks when no data is consumed
tests/gold_tests/pluginTest/xdebug/x_probe_full_json_slow_origin/x_probe_full_json_slow_origin.test.py New test to verify the transform handles delayed body data correctly
tests/gold_tests/pluginTest/xdebug/x_probe_full_json_slow_origin/slow-body-server.sh Custom slow server implementation to simulate delayed chunked response
tests/gold_tests/pluginTest/xdebug/x_probe_full_json_slow_origin/verify_no_loop.sh Log analysis script to detect infinite loop behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +58 to +60
if [[ $consumed_count -ne 2 ]]; then
echo "WARNING: Expected 2 consumed lines but found $consumed_count"
fi
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script issues a warning when consumed_count != 2 but still exits with success (exit 0). This could mask issues where the transform behavior deviates from expectations. Consider either failing the test when this condition occurs, or removing the check if it's not critical to the test's purpose.

Copilot uses AI. Check for mistakes.
Comment on lines +299 to +302
if (towrite > 0) {
// Only reenable and notify upstream if we consumed data.
// If no data was available, just return and wait for the
// upstream to call us again when more data arrives.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace this "what" comment with a "why" comment? This comment says what the code is doing (not calling reenable unless we have consumed data), but that is rather obvious. Can we reword this to explain that the reason we do this is to address the transform loop described in the PR comment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug xdebug xdebug plugin

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants