Aggressive compaction strategy #32
Hello all,
We have been using riak_search for 3 years, and merge_index suffers from very poor performance in high-write-load environments. This is mainly due to its compaction strategy, which is not sufficient to keep the number of segments stable under high throughput, especially when you need to decrease buffer_rollover_size in order to keep the memory of the buffers' ETS tables stable. When the number of segments grows too much, the merge_index servers all become unusable, because each one has to keep track of locks per segment.

So this weekend I rewrote the merge_index compaction strategy. It started as just a test, but given the performance I have observed since, the result is much better than the previous merge_index. All the issues we faced with riak_search are solved, because they were direct or indirect consequences of merge_index's poor write performance.

To do that so quickly, I simply copied Cassandra's compaction strategy.
I gave the new parameters the names of their Cassandra equivalents:

- segment_similarity_ratio defines the size ratio used to group segments of similar size for compaction (default 50%, so a segment belongs to a group if 0.5 * avg_group_seg_size < seg_size < 1.5 * avg_group_seg_size).
- min_segment_size defines the minimum segment size targeted by segment grouping for compaction.
- compaction_throughput_mb_per_sec defines the throughput used to adjust compaction throttling.

I understand Riak Search is no longer maintained because of the upcoming Riak 2.0. But Riak Search fits a particular use case well: when you need a simple full-text engine with term-based distribution, for instance as a building block for another kind of search. This is the case at the company I founded. Even if a SOLR/Yokozuna migration could be an option, it is not an easy one for us, and I expect the same may be true for other companies.
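To make the three parameters above more concrete, here is a minimal Python sketch of size-tiered segment grouping and throughput throttling. It is not the Erlang code of this pull request. Only the 50% similarity ratio comes from the description; the "both small" rule for min_segment_size, the MIN_THRESHOLD group size, the selection policy, the throttle formula and all function names are hypothetical, modelled on my understanding of Cassandra's size-tiered compaction.

```python
from typing import List

MB = 1024 * 1024

# From the PR text: default segment_similarity_ratio is 50%.
SEGMENT_SIMILARITY_RATIO = 0.5
# Assumed values for illustration only, not taken from merge_index.
MIN_SEGMENT_SIZE = 1 * MB   # bytes
MIN_THRESHOLD = 4           # smallest group worth compacting


def group_segments(sizes: List[int],
                   ratio: float = SEGMENT_SIMILARITY_RATIO,
                   min_segment_size: int = MIN_SEGMENT_SIZE) -> List[List[int]]:
    """Bucket segment sizes (bytes) into groups of similar size.

    A segment joins the first group where
        (1 - ratio) * avg < size < (1 + ratio) * avg,
    or (assumed, as in Cassandra) where both the segment and the group
    average are below min_segment_size. Otherwise it starts a new group.
    """
    groups: List[List[int]] = []
    for size in sorted(sizes):
        for group in groups:
            avg = sum(group) / len(group)
            similar = (1 - ratio) * avg < size < (1 + ratio) * avg
            both_small = size < min_segment_size and avg < min_segment_size
            if similar or both_small:
                group.append(size)
                break
        else:
            groups.append([size])
    return groups


def pick_compaction(groups: List[List[int]],
                    min_threshold: int = MIN_THRESHOLD) -> List[int]:
    """Return the group to compact next, or [] if none is large enough.

    Assumed policy: most segments first, then smallest total size.
    """
    candidates = [g for g in groups if len(g) >= min_threshold]
    if not candidates:
        return []
    return min(candidates, key=lambda g: (-len(g), sum(g)))


def throttle_delay(bytes_written: int, elapsed_s: float, mb_per_sec: float) -> float:
    """Seconds to sleep so compaction stays under mb_per_sec (MB = 1024 * 1024 bytes).

    A value <= 0 disables throttling (assumed, mirroring Cassandra).
    """
    if mb_per_sec <= 0:
        return 0.0
    target_s = bytes_written / (mb_per_sec * MB)
    return max(0.0, target_s - elapsed_s)


if __name__ == "__main__":
    sizes = [200_000, 300_000, 500_000, 700_000,
             9 * MB, 10 * MB, 11 * MB, 12 * MB, 100 * MB]
    groups = group_segments(sizes)
    print(groups)
    print(pick_compaction(groups))
    print(throttle_delay(32 * MB, 1.0, 16))
```

Grouping by relative size means a burst of small segments produced by a low buffer_rollover_size gets merged together quickly, instead of waiting to be merged into much larger segments, which is what keeps the segment count stable under heavy writes.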
So please consider my pull request: we cannot use Riak Search without this improvement, and I expect many of your users have had the same issues and that this fix would help them. SOLR/Yokozuna are great but very different from Riak Search, which is great in its own field. And we would love to keep the code upstream even if merge_index is no longer maintained.
Best regards.
Arnaud.