During v2.0, Jon Flynn reported a set of operations that DLIO performs that do not appear to be required by the benchmark and that consume a lot of CPU cycles on the client. He said that with his changes he got roughly 3x higher benchmark scores on the same client hardware.
He reported the following progression, all in buffered mode:
Base code: ~7 H100s passing
CRC removed: ~11.5 H100s passing
Final code: 24 passing at 95%
The O_DIRECT pathway avoids the CRC check as well, which is one reason it's faster.
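
To get a rough feel for how much client CPU a per-sample CRC pass can cost, here is a minimal sketch that times a CRC-32 over an in-memory buffer. It assumes the checksum is a CRC-32 like the one in Python's zlib module; the note above does not say which checksum is involved or where in DLIO it is computed, so this is not DLIO's code. The function name crc_throughput and the 64 MiB sample size are hypothetical.

```python
import time
import zlib


def crc_throughput(buf: bytes, repeats: int = 5) -> float:
    """Return the best observed CRC-32 throughput in bytes per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.crc32(buf)  # one full checksum pass over the buffer
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a coarse timer on tiny buffers.
    return len(buf) / best if best > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical sample size (assumption): 64 MiB of zero bytes.
    sample = bytes(64 * 1024 * 1024)
    rate = crc_throughput(sample)
    print(f"CRC-32 throughput: {rate / 1e9:.2f} GB/s on one core")
```

Comparing that single-core rate with the read bandwidth needed per accelerator gives a quick estimate of how many cores a checksum pass alone would tie up.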
