HDDS-14239. Simplify Prepare Batch looping for DeleteRange Op processing in RDBBatchOperation #9553
+1,103
−251
What changes were proposed in this pull request?
The prepare-batch loop in the HDDS-13415 (#8774) implementation was very convoluted. It also misses a case: a putKey/deleteKey can still be added to the batch even though a deleteRange covering that key is executed in a later batch, after the immediately following continuousDeleteRange batch. The following example illustrates the scenario.
1. Put Key1
2. DeleteRange Key2 - Key5
3. Put Key2
4. DeleteRange Key1 - Key4
Here, Op1 should ideally be cancelled by Op4. Currently both Op1 and Op4 are executed, but Op1 is redundant: it would be more efficient to execute only Op4.
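The cancellation described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `BatchPruneSketch`, `Op`, and `prune` are hypothetical names, keys are plain strings, and ranges are assumed to be half-open `[begin, end)` as in RocksDB's deleteRange. The sketch walks the ops from last to first and drops any point op covered by a deleteRange that comes later in the batch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; none of these names come from RDBBatchOperation. */
final class BatchPruneSketch {

  /** A point put/delete when endKey is null, otherwise a deleteRange over [key, endKey). */
  static final class Op {
    final String name;
    final String key;
    final String endKey;

    Op(String name, String key, String endKey) {
      this.name = name;
      this.key = key;
      this.endKey = endKey;
    }

    boolean isRange() {
      return endKey != null;
    }

    /** Assumption: begin is inclusive and end is exclusive. */
    boolean covers(String k) {
      return key.compareTo(k) <= 0 && k.compareTo(endKey) < 0;
    }
  }

  /** Returns the ops that still need to run, in their original order. */
  static List<Op> prune(List<Op> ops) {
    List<Op> laterRanges = new ArrayList<>();
    List<Op> kept = new ArrayList<>();
    // Walk backwards so laterRanges only holds ranges issued after the current op.
    for (int i = ops.size() - 1; i >= 0; i--) {
      Op op = ops.get(i);
      if (op.isRange()) {
        laterRanges.add(op);
        kept.add(op);
      } else if (laterRanges.stream().noneMatch(r -> r.covers(op.key))) {
        kept.add(op);
      }
      // Otherwise a later deleteRange removes this key anyway, so the op is skipped.
    }
    Collections.reverse(kept);
    return kept;
  }
}
```

Applied to the four ops above, this sketch keeps only the two deleteRange ops. Under the half-open assumption Op4 also covers Key2, so the sketch drops Op3 as well as Op1. The linear scan over `laterRanges` is deliberately simple.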
While implementing deleteRangeWithBatch, I initially planned to optimize the deleteRange search by sorting the continuous ranges and doing a binary search to find the range matching an entry, reducing the overall complexity. However, the code was getting more complex without giving us much in return, since deleteRange was expected to be an infrequent op used only by snapshot create.
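For reference, the sorted-range lookup idea could look something like the sketch below. It is not what this PR implements: `RangeIndexSketch`, `add`, and `contains` are hypothetical names, keys are plain strings, and ranges are assumed half-open `[begin, end)`. Overlapping or touching ranges are merged on insert, so a lookup is a single `TreeMap.floorEntry` call, which is a logarithmic search over the sorted begin keys.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a sorted index over disjoint [begin, end) ranges. */
final class RangeIndexSketch {

  // begin key -> exclusive end key; entries never overlap after add().
  private final TreeMap<String, String> ranges = new TreeMap<>();

  /** Adds [begin, end), merging it with any ranges it overlaps or touches. */
  void add(String begin, String end) {
    if (begin.compareTo(end) >= 0) {
      return; // empty range, nothing to record
    }
    // Extend to the left if an existing range starts earlier and reaches begin.
    Map.Entry<String, String> prev = ranges.floorEntry(begin);
    if (prev != null && prev.getValue().compareTo(begin) >= 0) {
      begin = prev.getKey();
      if (prev.getValue().compareTo(end) > 0) {
        end = prev.getValue();
      }
    }
    // Absorb every existing range that starts inside the (possibly extended) range.
    Map.Entry<String, String> next = ranges.ceilingEntry(begin);
    while (next != null && next.getKey().compareTo(end) <= 0) {
      if (next.getValue().compareTo(end) > 0) {
        end = next.getValue();
      }
      ranges.remove(next.getKey());
      next = ranges.ceilingEntry(begin);
    }
    ranges.put(begin, end);
  }

  /** True if key falls inside any recorded range. */
  boolean contains(String key) {
    Map.Entry<String, String> candidate = ranges.floorEntry(key);
    return candidate != null && key.compareTo(candidate.getValue()) < 0;
  }
}
```

Note that this index only answers whether a key is covered by any recorded range. It does not track operation order, so a caller would still have to make sure it only contains ranges issued after the op being checked.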
However, deleteRange is also going to be used by DirectoryPurgeRequest to remove entries from the file table and directory table. This optimization would help reduce the complexity of RocksDB compactions when a key is committed and also deleted within the same double buffer batch.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14239
How was this patch tested?
Updated unit tests. I intend to add another unit test suggested in this comment: #8774 (comment)