race condition with socket passing cluster #8
There is a race condition that can cause requests to get dropped. The fewer workers you have, the more likely it is to happen.
What happens is that between the connection event firing on the master server and the socket getting passed along, data might be buffered into the master server's socket stream, and that data then does not get passed with the socket. This is likely to happen if a socket has been passed to a particular worker recently, so you don't see it much if you have a big cluster and relatively few new connections. If you need a test case, create a one-worker cluster and fire off 4 or 5 requests to some simple route. You will see a few responses and a few requests left hanging.
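A minimal sketch of the client side of that test case, assuming the one-worker cluster is already listening locally. The helper name `fireRequests`, the `/` route, and the timeout used to decide that a request is "left hanging" are all placeholders of mine, not part of this project.

```javascript
const http = require('http');

// Fire `count` concurrent GET requests at 127.0.0.1:`port` and report how
// many were answered and how many were left hanging (no complete response
// within `timeoutMs`, or a connection error).
function fireRequests(port, count, timeoutMs) {
  const one = () => new Promise((resolve) => {
    // agent: false gives every request its own connection, so each one
    // goes through the master's connection handling separately.
    const req = http.get({ host: '127.0.0.1', port, path: '/', agent: false }, (res) => {
      res.resume(); // discard the body, we only care that it arrives
      res.on('end', () => { clearTimeout(timer); resolve('answered'); });
    });
    const timer = setTimeout(() => { req.destroy(); resolve('hanging'); }, timeoutMs);
    req.on('error', () => { clearTimeout(timer); resolve('hanging'); });
  });

  return Promise.all(Array.from({ length: count }, one)).then((results) => ({
    answered: results.filter((r) => r === 'answered').length,
    hanging: results.filter((r) => r === 'hanging').length,
  }));
}
```

For example, `fireRequests(3000, 5, 2000).then(console.log)` against a cluster on port 3000 (the port is an assumption). If the race is hit, `hanging` should be greater than zero.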
To fix it you have to dig into the "private" bits of net.Server and net.Socket, and essentially pass the file descriptor before Node gets a chance to wrap it in the streaming interface and start buffering data. Fortunately that can be done by assigning a function to Server._handle.onconnection after Server.listen is called. This lets you interrupt the regular net.Server process and pass the file descriptor down to the child process, which can then create the new net.Socket from that fd.
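For comparison, here is a sketch of the same idea using only public API. This is not the change in this pull request: instead of hooking `Server._handle.onconnection`, it relies on the `pauseOnConnect` option of `net.createServer`, which exists in newer Node versions and keeps the master from reading any data off the socket before it is handed over. The names `createHandoffServer`, `adoptSocket`, and `handOff` are hypothetical.

```javascript
const net = require('net');

// Master side: accept connections in a paused state, so no request bytes
// are read into the master's stream before the socket is handed off.
// `handOff` is a hypothetical callback; in a cluster it could be
// (socket) => worker.send('connection', socket).
function createHandoffServer(handOff) {
  return net.createServer({ pauseOnConnect: true }, handOff);
}

// Worker side: give the received socket to a server that is not itself
// listening (for example an http.Server), then resume the socket so the
// data still waiting on it starts flowing to that server.
function adoptSocket(server, socket) {
  server.emit('connection', socket);
  socket.resume();
}
```

In the worker, the receiving end would look something like `process.on('message', (msg, socket) => { if (msg === 'connection' && socket) adoptSocket(httpServer, socket); })`, where `httpServer` and the `'connection'` message name are again assumptions.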
I created this pull request to more easily show the code changes required.