Fix prepare timeout problem which will exit the caller and some other changes #13

ericliang · 2015-05-18T06:46:14Z

Ekaf stores each prepare request in a dict and replies only when it receives worker-up messages. In my production setup, with many Kafka partitions, the wait for those messages is long enough to hit the timeout, which exits the caller process. There is also a small chance of missing the worker-up messages altogether, since the ekaf_server state-change logic is separate from the handling of worker-up messages.

First of all, I've changed prepare to reply immediately, since producing sync messages to a non-prepared topic already performs a pick operation.

Then I've added three small changes: one for operational friendliness, which purges messages when too many are buffered in memory; one for fast recovery after a Kafka cluster restart or network problem, which adds a timeout on connection attempts; and one bug fix for worker restart, which previously led to two reconnections on each connection failure.

We've run this version in our production environment for about a month, and I guess it's time to contribute the changes back. HTH.


ghoseb · 2015-05-18T08:12:35Z

@ericliang This looks like a great patch! Could you please split them into different PRs? This would really help in reviewing the code. Thanks.

bosky101 · 2015-05-18T09:39:28Z

Appreciate the contribution, @ericliang. I've created a new branch where I can review and merge this: https://github.com/helpshift/ekaf/tree/feature/prepare-timeout

If you push to the new branch prepare-timeout I can:

1) remove some logging you've added
2) fix some formatting/indenting 
3) add tests for new features
4) add documentation to readme about new flags

Once I've merged & made the above changes, it will be better prepared to merge back into master.

Thanks again. By the way, I'd love it if you could describe the scale and where you've used it in production.

~B

ericliang · 2015-05-18T13:37:26Z

Thanks @ghoseb @bosky101 for your comments. I've opened a new pull request against the prepare-timeout branch; please check #14.

For your information, we are an IM cloud service and use Kafka for offline message push and other user status change cases: 3 topics with 30 partitions each. With a configuration of 10 workers per partition, that means ekaf manages 900 Kafka connections.
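As a quick check of that figure, here is a minimal sketch of the arithmetic. The function name `total_connections` and its parameter names are hypothetical and are not part of ekaf; the sketch only assumes one connection per worker, as the comment above implies.

```python
def total_connections(topics: int, partitions_per_topic: int, workers_per_partition: int) -> int:
    """Return the number of worker connections, assuming one connection per worker.

    Hypothetical helper for illustration only; not an ekaf function.
    """
    return topics * partitions_per_topic * workers_per_partition


# Figures quoted in the comment: 3 topics, 30 partitions each, 10 workers per partition.
print(total_connections(3, 30, 10))  # prints 900
```

The count grows multiplicatively, which is why waiting for every worker-up message before replying to prepare can take long enough to hit the caller's timeout.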

ericliang added 7 commits April 9, 2015 14:18

prepare will run in an idempotent mode

7be453e

will reply never but later since no worker-up message after server ge…

bac46d5

…ts ready

no need to reply to prepares during downtime

02fa0ab

purge messages in case too many messages buffered in memory

756b733

should not start new timer if connect message was sent

19eda2f

add connect timeout to avoid reconnect action overflow

2b2082e

fix bug on return type of info

8ec0619

ericliang mentioned this pull request May 18, 2015

Fix prepare timeout problem which will exit the caller and some other changes #14

Merged

ericliang added 3 commits February 27, 2016 00:14

add new process group module

63f4600

change process module from pg2 to pg2l

a227e5f

give more time to broker starter

ecd5ea5
