Copying the 300 MB of weight parameters for one expert of mixtral-8x7b from CPU to GPU takes 50 ms, which implies an effective PCIe bandwidth of only 0.3 GB / 50 ms = 6 GB/s. That is much slower than the PCIe bandwidth reported for the L4 GPU (PCIe Gen4 x16, 64 GB/s) at https://www.nvidia.com/en-us/data-center/l4/ . Is there any explanation for this gap? Thanks.
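For reference, the arithmetic behind the 6 GB/s figure can be sketched as below. This is only a back-of-the-envelope check, not a measurement: the function and constant names are made up for illustration, decimal units are assumed (1 GB = 1000 MB), and the 64 GB/s value is simply the number quoted on the linked spec page.

```python
def effective_bandwidth_gb_per_s(size_mb: float, time_ms: float) -> float:
    """Effective transfer bandwidth in GB/s.

    With decimal units, MB per millisecond is numerically equal to
    GB per second, so no further conversion is needed.
    """
    return size_mb / time_ms


# Numbers taken from the question above (hypothetical constant names).
EXPERT_SIZE_MB = 300.0        # one expert of mixtral-8x7b
COPY_TIME_MS = 50.0           # observed CPU-to-GPU copy time
QUOTED_PCIE_GB_PER_S = 64.0   # figure quoted on the L4 product page

measured = effective_bandwidth_gb_per_s(EXPERT_SIZE_MB, COPY_TIME_MS)
fraction_of_quoted = measured / QUOTED_PCIE_GB_PER_S

print(f"effective bandwidth: {measured:.1f} GB/s")
print(f"fraction of quoted bandwidth: {fraction_of_quoted:.1%}")
```

So the observed copy reaches a bit under a tenth of the quoted figure, which is the gap being asked about.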