An attempt to implement and test the CVPR 2019 paper "" in PyTorch.
It seems that the split op noticeably slows down execution, and the serial processing that follows it makes the block difficult to accelerate.
The authors may include dedicated acceleration techniques in the official code they release later.
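
To make the slowdown concern concrete, here is a minimal sketch of a split-then-serial pattern in PyTorch. It is not the paper's architecture: the class name `SerialSplitBlock`, the group count, and the rule that each group adds the previous group's output are all hypothetical, chosen only to show why such a loop is hard to parallelize.

```python
import torch
import torch.nn as nn


class SerialSplitBlock(nn.Module):
    """Hypothetical block: split channels, then process the groups in sequence."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        assert channels % groups == 0, "channels must be divisible by groups"
        self.width = channels // groups
        # One small 3x3 conv per channel group (assumed layout, for illustration).
        self.convs = nn.ModuleList(
            [nn.Conv2d(self.width, self.width, kernel_size=3, padding=1)
             for _ in range(groups)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The split op: cut the channel dimension into equal-width chunks.
        chunks = torch.split(x, self.width, dim=1)
        outputs = []
        prev = None
        for chunk, conv in zip(chunks, self.convs):
            # Each step needs the previous step's output, so the convs
            # cannot be launched in parallel or fused into one grouped conv.
            inp = chunk if prev is None else chunk + prev
            prev = conv(inp)
            outputs.append(prev)
        return torch.cat(outputs, dim=1)


block = SerialSplitBlock(channels=16, groups=4)
x = torch.randn(2, 16, 8, 8)
y = block(x)
```

The loop issues one small convolution per group instead of a single large one, and the data dependency between iterations forces them to run one after another. Timing this block against a plain `nn.Conv2d(16, 16, 3, padding=1)` on the target hardware is one way to check how large the overhead actually is.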