@ericliang

Ekaf stores the prepare request in a dict and replies to it only once the worker-up messages arrive. In my production setup, with many Kafka partitions, waiting for those worker-up messages takes so long that the call hits its timeout and the caller process exits. There is also a small chance of missing the worker-up messages altogether, since the ekaf_server state-change logic is separate from the handling of the worker-up message.

First of all, I've changed prepare to return immediately, since producing a sync message on a non-prepared topic already performs a pick operation.
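To make the difference between the two prepare styles concrete, here is a toy model in Python. It is only a sketch and not ekaf's code: `ToyServer`, `prepare_deferred`, `prepare_instant`, and the delays are all hypothetical names and numbers, and it assumes workers come up one after another.

```python
import threading
import time


class ToyServer:
    """Toy stand-in for a topic server; hypothetical, not ekaf's code."""

    def __init__(self, workers_needed, worker_start_delay):
        self.workers_needed = workers_needed
        self.worker_start_delay = worker_start_delay
        self.all_up = threading.Event()

    def start_workers(self):
        # Assumption: workers start sequentially, each taking a fixed delay.
        def boot():
            for _ in range(self.workers_needed):
                time.sleep(self.worker_start_delay)
            self.all_up.set()  # plays the role of the last worker-up message

        threading.Thread(target=boot, daemon=True).start()

    def prepare_deferred(self, timeout):
        # Deferred style: the reply is held back until every worker is up,
        # so a slow startup turns into a caller-side timeout.
        if not self.all_up.wait(timeout):
            raise TimeoutError("prepare timed out waiting for workers")
        return "ok"

    def prepare_instant(self):
        # Instant style: reply right away and let workers come up later.
        return "ok"
```

With, say, 50 workers at 10 ms each, `prepare_deferred(timeout=0.05)` raises `TimeoutError` while `prepare_instant()` returns at once. The more partitions and workers there are, the longer the deferred reply is held back.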

Then I've added three small changes: one for operational convenience, which can purge messages when too many are buffered in memory; one for fast recovery after a Kafka cluster restart or a network problem, which times out on connection; and one bug fix for worker restart, which previously led to two reconnections on each connection failure.

We've run this version in our production environment for about a month, and I think it's time to contribute these changes back. HTH.

@ghoseb

ghoseb commented May 18, 2015

@ericliang This looks like a great patch! Could you please split it into separate PRs? That would really help in reviewing the code. Thanks.

@bosky101
Contributor

Appreciate the contribution, @ericliang. I've created a new branch where I can review and merge this: https://github.com/helpshift/ekaf/tree/feature/prepare-timeout

If you push to the new branch prepare-timeout I can:

1) remove some logging you've added
2) fix some formatting/indenting 
3) add tests for new features
4) add documentation to readme about new flags

Once I've merged and made the above changes, it will be better prepared to merge back into master.

Thanks again. By the way, I'd love it if you could describe the scale and where you've used it in production.

~B

@ericliang
Author

Thanks @ghoseb and @bosky101 for your comments. I've created a new pull request against the prepare-timeout branch; please check #14.

For your information, we are an IM cloud service and use Kafka for offline message push and other cases such as user status changes: 3 topics with 30 partitions each. With 10 workers per partition configured, that means ekaf manages 900 Kafka connections.
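The connection count is just the product of the three numbers above. Here is a quick check in Python, assuming each worker holds exactly one connection (the variable names are only illustrative):

```python
topics = 3
partitions_per_topic = 30
workers_per_partition = 10  # assumption: one Kafka connection per worker

connections = topics * partitions_per_topic * workers_per_partition
print(connections)  # 900
```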
