Hi, great work! How many SMs are we expected to use to achieve the 300 GB/s compression throughput shown in the figure in the README? Thanks!