statedb: Improve performance for large number of registered tables #136
Conversation
Force-pushed from 95498c3 to f8a2b95
```diff
  opts      options
- prevTxn   atomic.Pointer[Txn[T]]  // the previous txn for reusing the allocation
  prevTxnID uint64                  // the transaction ID that produced this tree
+ prevTxn   *atomic.Pointer[Txn[T]] // the previous txn for reusing the allocation
```
This is surprising to me. I think this has three states: a nil pointer, a non-nil pointer to a nil atomic pointer, or a non-nil pointer to a non-nil atomic pointer. Why do we need three?
It's not about states; it's about making sure the atomic pointer is heap-allocated and shared by all the Tree timelines. prevTxn would also never be nil. The only reason we have atomic.Pointer here is that part.Map and part.Set might "fork", e.g. we might do multiple transactions against the same original tree. In those cases it's important that only one branch gets to reuse the txn (and since Tree is by value we might've copied it, so it's not enough that (*Tree).Txn nils it in the current copy). However, I think we care more about txn reuse in StateDB and less in part.Map and part.Set, so I ended up dropping the atomic.Pointer and instead adding an explicit ReuseTxn option to part.New to string along the *Txn[T].
Could you take a look and see if this makes sense?
This did make part.Map ~20% slower, so I'll need to think about whether that's a reasonable trade-off or not...
EDIT: oh wait, failing tests... didn't push yet
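To illustrate the forking concern described above, here is a minimal Go sketch with stand-in Tree/Txn types (not the actual part code): the reuse slot is a heap-allocated *atomic.Pointer shared by every by-value copy of the tree, and CompareAndSwap ensures that only one forked transaction gets to reuse the cached Txn.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type Txn struct{ scratch []byte } // stand-in for the real Txn[T]

type Tree struct {
	// Shared by every copy of this Tree value; an embedded (non-pointer)
	// atomic field would instead be duplicated whenever the struct is copied.
	prevTxn *atomic.Pointer[Txn]
}

func New() Tree {
	return Tree{prevTxn: new(atomic.Pointer[Txn])}
}

// Txn starts a transaction, reusing the cached Txn allocation only if this
// copy of the tree is the first one to claim it.
func (t Tree) Txn() *Txn {
	if prev := t.prevTxn.Load(); prev != nil && t.prevTxn.CompareAndSwap(prev, nil) {
		return prev // won the race: reuse the previous allocation
	}
	return &Txn{scratch: make([]byte, 0, 64)} // lost, or nothing cached: allocate fresh
}

func main() {
	tree := New()
	tree.prevTxn.Store(&Txn{}) // pretend a previous commit cached its Txn

	a, b := tree, tree // two "forks" of the same tree value
	t1, t2 := a.Txn(), b.Txn()
	fmt.Println(t1 != t2) // true: only one branch reused the cached Txn
}
```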
Argh, I don't like how error-prone this makes it, and I didn't like the perf regression on part.Map either. I also couldn't quite figure out why the change caused a bunch of StateDB tests to fail while none of the part tests did (even after changing it to default to reusing). I think the heap-allocated atomic pointer still makes the most sense and it's much safer and easier to reason about, e.g. it's essentially a "sync.Pool" of size 1.
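A rough sketch of the "sync.Pool of size 1" framing, with illustrative names (pool1 is not a real library type): the shared atomic pointer acts as a single-slot free list, taken with Swap(nil) and refilled with Store.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pool1 is a single-slot free list: at most one cached value at a time.
type pool1[T any] struct{ slot atomic.Pointer[T] }

// Get takes the cached value, if any, leaving the slot empty.
func (p *pool1[T]) Get() *T { return p.slot.Swap(nil) }

// Put caches v for the next Get; any older cached value is simply dropped.
func (p *pool1[T]) Put(v *T) { p.slot.Store(v) }

func main() {
	var p pool1[int]
	fmt.Println(p.Get() == nil) // true: nothing cached yet
	x := new(int)
	p.Put(x)
	fmt.Println(p.Get() == x)   // true: got the cached allocation back
	fmt.Println(p.Get() == nil) // true: the slot is empty again until the next Put
}
```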
I'm still seeing this change so did you end up pushing the ReuseTxn?
No I didn't push the ReuseTxn idea. The original *atomic.Pointer[Txn[T]] change is the way to go.
Add 1 and 100 table commit-only benchmarks to test the impact of lots of registered tables on the cost of dbRoot cloning.

Signed-off-by: Jussi Maki <jussi.maki@isovalent.com>
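The following is a self-contained sketch of what such a commit-only benchmark measures, not the benchmark added in this commit; the tableEntry type and commit helper are stand-ins for the statedb internals. The point it demonstrates: every commit clones the root slice of table entries, so the commit-only cost grows with the number of registered tables.

```go
// commit_bench_test.go — run with: go test -bench=CommitOnly
package clonebench

import "testing"

// tableEntry is a stand-in for statedb's per-table root entry.
type tableEntry struct {
	name    string
	indexes [8]uintptr // padding so the entry is not trivially small
}

// commit clones the root and swaps in the modified entry, which is roughly
// the per-commit work whose cost scales with the number of tables.
func commit(root []tableEntry, i int, e tableEntry) []tableEntry {
	next := make([]tableEntry, len(root))
	copy(next, root)
	next[i] = e
	return next
}

func benchmarkCommitOnly(b *testing.B, numTables int) {
	root := make([]tableEntry, numTables)
	b.ReportAllocs()
	for n := 0; n < b.N; n++ {
		root = commit(root, 0, tableEntry{name: "t0"})
	}
}

func BenchmarkCommitOnly_1Table(b *testing.B)    { benchmarkCommitOnly(b, 1) }
func BenchmarkCommitOnly_100Tables(b *testing.B) { benchmarkCommitOnly(b, 100) }
```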
Force-pushed from f8a2b95 to b20de17
This avoids allocating part.Tree on the heap and saves an allocation on the write path.

Signed-off-by: Jussi Maki <jussi.maki@isovalent.com>
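A hedged sketch of the by-value vs by-pointer idea, using a stand-in Tree type rather than the real part.Tree: returning a pointer forces the value to escape to the heap (one allocation per call), while returning by value lets the caller keep the tree on its stack or embed it directly in a larger struct.

```go
// escape.go — inspect escape analysis with: go build -gcflags=-m
package main

import "fmt"

// Tree is a stand-in for part.Tree, not the real type.
type Tree struct {
	size int
	root *int
}

// newByPointer returns a pointer, so the Tree is reported as escaping to the
// heap: one allocation per call.
func newByPointer() *Tree { return &Tree{} }

// newByValue returns the struct itself; the caller receives a copy that can
// stay on its stack or be stored inline, saving the allocation.
func newByValue() Tree { return Tree{} }

func main() {
	p := newByPointer()
	v := newByValue()
	fmt.Println(p.size, v.size)
}
```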
As StateDB usage grows in Cilium we start to reach the point where cloning the dbRoot is becoming increasingly expensive. At the cost of an additional heap allocation, change to storing the tableEntry by pointer.

Before:

BenchmarkDB_WriteTxn_1-8                      1222287    981.1 ns/op   1019299 objects/sec    944 B/op   15 allocs/op
BenchmarkDB_WriteTxn_10-8                     2851748    418.0 ns/op   2392425 objects/sec    499 B/op    8 allocs/op
BenchmarkDB_WriteTxn_100-8                    3561373    334.3 ns/op   2991648 objects/sec    485 B/op    7 allocs/op
BenchmarkDB_WriteTxn_1000-8                   3045235    393.8 ns/op   2539482 objects/sec    437 B/op    7 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-8     1502962    796.5 ns/op   1255539 objects/sec   1004 B/op   20 allocs/op
BenchmarkDB_WriteTxn_CommitOnly_100Tables-8    542452   2213 ns/op                           8332 B/op    4 allocs/op
BenchmarkDB_WriteTxn_CommitOnly_1Table-8      2411512    498.2 ns/op                          216 B/op    4 allocs/op

After:

BenchmarkDB_WriteTxn_1-8                      1205449    994.8 ns/op   1005270 objects/sec    952 B/op   16 allocs/op
BenchmarkDB_WriteTxn_10-8                     2862157    416.8 ns/op   2399386 objects/sec    500 B/op    8 allocs/op
BenchmarkDB_WriteTxn_100-8                    3564074    333.1 ns/op   3001994 objects/sec    485 B/op    7 allocs/op
BenchmarkDB_WriteTxn_1000-8                   2716454    392.7 ns/op   2546569 objects/sec    437 B/op    7 allocs/op
BenchmarkDB_WriteTxn_100_SecondaryIndex-8     1507063    798.2 ns/op   1252773 objects/sec   1004 B/op   20 allocs/op
BenchmarkDB_WriteTxn_CommitOnly_100Tables-8   1427040    846.3 ns/op                         1112 B/op    5 allocs/op
BenchmarkDB_WriteTxn_CommitOnly_1Table-8      2386636    505.4 ns/op                          224 B/op    5 allocs/op

Signed-off-by: Jussi Maki <jussi.maki@isovalent.com>
Force-pushed from b20de17 to b128250
As there's starting to be a large number of StateDB tables in Cilium, it now makes sense to reduce the cost of the dbRoot clone by storing the tableEntry by pointer and making cloning of the root cheaper:

- part to return the tree by value instead of by pointer to save a heap allocation
- []*tableEntry instead of []tableEntry to avoid a large copy with many tables

Before:

After:

Changing part to return the tree by value shaved off an allocation in the WriteTxn_1 case. Storing tableEntry by pointer reduced the WriteTxn+Commit cost from 2170 ns/op to 840 ns/op without impacting write throughput.
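A sketch of the trade-off described above, using stand-in types rather than the real statedb internals: with []*tableEntry the per-commit root clone copies one machine word per table instead of the whole struct, at the cost of one extra heap allocation for the entry that actually changed.

```go
package main

import "fmt"

// tableEntry is a stand-in for statedb's per-table root entry.
type tableEntry struct {
	name    string
	indexes [8]uintptr // mimic a non-trivial entry; copied wholesale below
}

// cloneByValue copies sizeof(tableEntry) bytes for every registered table.
func cloneByValue(root []tableEntry, i int, e tableEntry) []tableEntry {
	next := make([]tableEntry, len(root))
	copy(next, root)
	next[i] = e
	return next
}

// cloneByPointer copies only a pointer per table, plus one allocation for
// the modified entry (the extra alloc visible in the benchmarks above).
func cloneByPointer(root []*tableEntry, i int, e *tableEntry) []*tableEntry {
	next := make([]*tableEntry, len(root))
	copy(next, root)
	next[i] = e
	return next
}

func main() {
	byValue := make([]tableEntry, 100)
	byPointer := make([]*tableEntry, 100)
	for i := range byPointer {
		byPointer[i] = &tableEntry{}
	}
	_ = cloneByValue(byValue, 0, tableEntry{name: "t0"})
	_ = cloneByPointer(byPointer, 0, &tableEntry{name: "t0"})
	fmt.Println("cloned 100-table roots by value and by pointer")
}
```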