XD-3751 Fix gpfdist processor shutdown#1901
jvalkeal wants to merge 1 commit into spring-attic:master
- Adds a workaround for a problem in Reactor 2.0.x where onNext and onComplete deadlock if the ring buffer is full.
- We now try to let the gpdb load session drain the stream, and detect whether that succeeded by checking the buffer size against its remaining capacity. If we can't drain, the last resort is to force the processor to shut down.
boolean drained = false;
if (greenplumLoad != null) {
    // xd waits 30s to shutdown module, so lets wait 25 to drain
Can you explain this a bit more? It sounds like some sort of race condition. Shouldn't we let the entire buffer drain, since the messages in the buffer have already been acked (say, in Rabbit)?
This deadlock within Reactor exists in 2.0.x, although it is fixed in 2.5. Effectively, when we try to shut down a processor, a signal is sent downstream indicating completion, but if there are still messages in the ring buffer, that terminate signal never reaches the right component in Reactor, because we have already stopped draining. The module shutdown timeout is, afaik, hardcoded to 30 seconds in XD, and after that things go a bit haywire if the module has not been closed properly.
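To make the failure mode concrete, here is a rough analogy using a plain `ArrayBlockingQueue` rather than Reactor's actual ring buffer. The class name `FullBufferTerminalSignal`, the `COMPLETE` marker and the `signalComplete` helper are all made up for illustration; this is not how Reactor 2.0.x is implemented, only a sketch of why a terminal signal cannot get through a full buffer that nobody is draining.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a bounded ring buffer; not Reactor code.
final class FullBufferTerminalSignal {

    // Marker object playing the role of the "complete" signal.
    static final Object COMPLETE = new Object();

    // Tries to enqueue the terminal marker. Returns false if the buffer
    // stayed full for the whole wait, i.e. the signal could not be delivered.
    static boolean signalComplete(BlockingQueue<Object> buffer, long waitMillis) {
        try {
            return buffer.offer(COMPLETE, waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(2);
        buffer.add("msg-1");
        buffer.add("msg-2");

        // Buffer is full and nothing is consuming: the signal is stuck.
        System.out.println("delivered while full: " + signalComplete(buffer, 50));

        // Once a consumer takes one element, the signal fits.
        buffer.poll();
        System.out.println("delivered after drain: " + signalComplete(buffer, 50));
    }
}
```

In this analogy a timed `offer` gives up and returns; the situation described above is worse, because the signalling side does not give up on its own.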
This is a workaround I came up with in discussion with Stephane. It relies on keeping the load operations running for slightly less time than XD waits before it starts throwing errors that it is unable to shut down a module. We're hoping that these load operations will eventually drain the buffers and allow the terminate signal to go downstream, letting the processor shut down and therefore the module shut down cleanly.
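The shape of the workaround can be sketched roughly as below. This is not the code from the PR: the class `DrainThenForceShutdown`, the `pending` supplier (standing in for "buffer size versus remaining capacity") and the `complete` and `force` callbacks are hypothetical names chosen for illustration. The idea is only to poll until the buffer looks empty or a deadline shorter than XD's 30 s module timeout passes, and to force shutdown in the second case.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch of "wait for drain, otherwise force shutdown".
final class DrainThenForceShutdown {

    // Polls until pending.getAsLong() reaches zero or the timeout elapses.
    // Returns true if the buffer drained in time.
    static boolean awaitDrain(LongSupplier pending, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (pending.getAsLong() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // still not drained when the deadline passed
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    // Runs the normal completion path if drained, else the forced path.
    static boolean shutdown(LongSupplier pending, long timeoutMillis,
                            Runnable complete, Runnable force) {
        boolean drained = awaitDrain(pending, timeoutMillis, 10);
        if (drained) {
            complete.run();
        } else {
            force.run();
        }
        return drained;
    }

    public static void main(String[] args) {
        // Simulated buffer that empties a little on every poll.
        AtomicLong remaining = new AtomicLong(3);
        // The PR waits 25 s against XD's 30 s; a short timeout keeps the demo quick.
        boolean drained = shutdown(remaining::getAndDecrement, 1_000,
                () -> System.out.println("clean shutdown"),
                () -> System.out.println("forced shutdown"));
        System.out.println("drained = " + drained);
    }
}
```

Keeping the drain deadline below the module timeout leaves a few seconds for the forced path to run before XD gives up on the module.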