XD-3751 Fix gpfdist processor shutdown by jvalkeal · Pull Request #1901 · spring-attic/spring-xd

jvalkeal · 2016-03-03T14:06:32Z

Adding a workaround for a problem in reactor 2.0.x where
onNext and onComplete deadlock if the ringbuffer
is full.
We now try to let the gpdb load session drain the stream, and detect
whether that succeeded by comparing the buffer size with the remaining
capacity. If we can't drain, the last resort is to
force processor shutdown.

- spring-attic/spring-xd#1901

markpollack · 2016-03-30T15:26:08Z

...on-gpfdist/src/main/java/org/springframework/xd/greenplum/gpfdist/GPFDistMessageHandler.java

+		boolean drained = false;
 		if (greenplumLoad != null) {
+
+			// xd waits 30s to shutdown module, so lets wait 25 to drain


Can you explain this a bit more? It sounds like some sort of race condition. Shouldn't we let the entire buffer drain, since the messages in the buffer have already been ack'd (say, in rabbit)?

jvalkeal · 2016-03-30

This deadlock in reactor exists in 2.0.x, although it is fixed in 2.5. Effectively, when we try to shut down a processor, a signal is sent downstream indicating completion, but if there are still messages in the ringbuffer, that terminate signal never reaches the right component in reactor because we have already stopped draining. The module shutdown timeout is, afaik, hardcoded to 30 secs in XD, and after that things go a bit haywire if the module is not actually properly closed.

This was a workaround I came up with in discussion with stephane. It relies on keeping the load operations running for slightly less time than XD waits before it throws errors that it's unable to shut down a module. We're hoping that these load operations will eventually drain the buffers and allow the terminate signal to go downstream, thus allowing the processor to shut down and the module to shut down cleanly.
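
For readers following along, the idea can be sketched roughly as below. This is not the code from the PR: the class name `DrainBeforeShutdown`, its methods, and the polling interval are hypothetical, and the only assumption taken from the discussion is that "drained" means the remaining capacity has climbed back to the full buffer size, and that the wait is kept shorter than XD's 30s module shutdown timeout (the diff comment mentions 25s).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: wait a bounded time for a buffer to drain,
// and fall back to a forced shutdown if it does not.
class DrainBeforeShutdown {

    // Assumption: the buffer is empty once its remaining capacity
    // equals (or exceeds) its total size.
    static boolean isDrained(int bufferSize, int remainingCapacity) {
        return remainingCapacity >= bufferSize;
    }

    // Polls the drained check until it passes or the timeout elapses.
    // Returns false on timeout or if the waiting thread is interrupted.
    static boolean awaitDrained(BooleanSupplier drained, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (!drained.getAsBoolean()) {
                if (System.nanoTime() - deadline >= 0) {
                    return false;
                }
                Thread.sleep(pollMillis);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Tries the graceful path first; forces shutdown only as a last resort.
    static String shutdown(BooleanSupplier drained, Runnable forceShutdown,
                           long timeoutMillis, long pollMillis) {
        if (awaitDrained(drained, timeoutMillis, pollMillis)) {
            return "drained";
        }
        forceShutdown.run();
        return "forced";
    }
}
```

In a module with a 30s shutdown limit, one would presumably call something like `shutdown(check, force, 25_000, 100)`, where `check` compares the processor's buffer size with its remaining capacity and `force` is whatever hard-stop the processor offers.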

caxqueiroz pushed a commit to caxqueiroz/jdbcgpfdist that referenced this pull request Mar 28, 2016

apply fix 1901

6524e72

- spring-attic/spring-xd#1901

markpollack self-assigned this Mar 29, 2016

markpollack reviewed Mar 30, 2016

jvalkeal wants to merge 1 commit into spring-attic:master from jvalkeal:XD-3751