@SQUARE-WAVES
There is a race condition that can cause requests to get dropped. The fewer workers you have, the more likely it is to happen.

Between the connection event firing on the master server and the socket being passed along, data can be buffered into the master server's socket stream, and that data does not get passed along with the socket. This is most likely to happen when a socket has recently been passed to the same worker, so you don't see it much on a big cluster with a relatively low rate of new connections. If you need a test case, create a 1-worker cluster and fire off 4 or 5 requests to some simple route: you will see a few responses and a few requests left hanging.
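A rough sketch of the client side of that test case, in case it helps. The function name `fireRequests`, the timeout value, and the URL in the usage note are placeholders of mine, not part of this project; it only counts which requests were answered and which were left hanging.

```javascript
'use strict';
const http = require('http');

// Fire `count` GET requests at `url` at the same time and resolve with one
// outcome per request: 'answered', 'hung' (no response within `timeoutMs`),
// or 'failed' (connection error). Hypothetical helper, for illustration only.
function fireRequests(url, count, timeoutMs = 2000) {
  const attempts = [];
  for (let i = 0; i < count; i++) {
    attempts.push(new Promise((resolve) => {
      // agent: false gives every request its own TCP connection,
      // so each one goes through the master's connection handling.
      const req = http.get(url, { agent: false }, (res) => {
        res.resume(); // discard the body
        res.on('end', () => resolve('answered'));
      });
      req.setTimeout(timeoutMs, () => {
        resolve('hung');
        req.destroy();
      });
      // Resolving twice is harmless; the first outcome wins.
      req.on('error', () => resolve('failed'));
    }));
  }
  return Promise.all(attempts);
}
```

Against a 1-worker cluster listening on, say, port 3000 (an assumed port), something like `fireRequests('http://127.0.0.1:3000/', 5).then(console.log)` should show a mix of answered and hung requests if the race is being hit.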

To fix it you have to dig into the "private" bits of net.Server and net.Socket, and essentially pass the file descriptor along before node gets a chance to wrap it in the streaming interface and start buffering data. Fortunately, that can be done by assigning a function to Server._handle.onconnection after Server.listen is called. This lets you interrupt the regular net.Server process and pass the file descriptor down to the child process, which can then create the new net.Socket from that fd.
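The idea can be sketched roughly as follows. This is not the code from the pull request: `interceptConnections` and `adoptHandle` are hypothetical names, and the sketch assumes the `(err, clientHandle)` callback signature and the undocumented `handle` option of `net.Socket` found in recent node versions. Both are private and may differ in the version you run.

```javascript
'use strict';
const net = require('net');

// Master side: replace the listening handle's connection callback so each new
// connection arrives as a raw handle, before net.Server wraps it in a
// net.Socket and starts reading from it.
// ASSUMPTION: `server._handle.onconnection` is private API and is called as
// (err, clientHandle); older node versions may use a different signature.
function interceptConnections(server, onRawHandle) {
  if (!server._handle) {
    throw new Error('server is not listening yet; call listen() first');
  }
  const original = server._handle.onconnection;
  server._handle.onconnection = function (err, clientHandle) {
    if (err || !clientHandle) {
      // Leave error handling to the regular net.Server code path.
      return original.call(this, err, clientHandle);
    }
    onRawHandle(clientHandle); // nothing has been read from it yet
  };
  return original; // returned so the caller can restore it
}

// Worker side: wrap a received raw handle in a net.Socket and inject it into
// an HTTP server by emitting 'connection'.
// ASSUMPTION: the `handle` constructor option is undocumented.
function adoptHandle(httpServer, handle) {
  const socket = new net.Socket({ handle, readable: true, writable: true });
  httpServer.emit('connection', socket);
  return socket;
}
```

In the master, `onRawHandle` would forward the handle to the chosen worker over the IPC channel, and the worker would call `adoptHandle` when it arrives. Because nothing reads from the connection until the worker wraps it, no request data can be buffered in the master.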

I created this pull request to more easily show the code changes required.
