Two things to do:
1. Prefetch as much data as possible into CPU memory.
2. Launch a separate CPU thread that prefetches the next batch in the background, so loading overlaps with computation on the current batch.
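Below is a minimal Python sketch of step 2. It uses only the standard library and assumes the batches come from any Python iterable, for example data already held in CPU memory as in step 1. `BackgroundPrefetcher`, `depth`, and `load_batches` are hypothetical names chosen for illustration. They are not part of any specific framework.

```python
import queue
import threading
import time
from typing import Any, Iterable


class _Done:
    """Marker the worker thread enqueues after the last batch."""


class _Failure:
    """Carries an exception raised by the loader across the thread boundary."""

    def __init__(self, exc: Exception) -> None:
        self.exc = exc


class BackgroundPrefetcher:
    """Hypothetical helper: loads upcoming batches on a separate CPU thread
    while the caller works on the current one."""

    def __init__(self, batches: Iterable[Any], depth: int = 2) -> None:
        # Bounded queue: the worker runs at most `depth` batches ahead,
        # which caps how much prefetched data is held in CPU memory.
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=depth)
        self._source = iter(batches)
        self._finished = False
        # Daemon thread, so an abandoned prefetcher does not block interpreter exit.
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def _worker(self) -> None:
        try:
            for batch in self._source:
                self._queue.put(batch)  # blocks while the queue is full
        except Exception as exc:
            # Forward loader errors so the consumer sees them on its next call.
            self._queue.put(_Failure(exc))
            return
        self._queue.put(_Done())

    def __iter__(self) -> "BackgroundPrefetcher":
        return self

    def __next__(self) -> Any:
        if self._finished:
            raise StopIteration
        item = self._queue.get()  # blocks until the worker has a batch ready
        if isinstance(item, _Done):
            self._finished = True
            raise StopIteration
        if isinstance(item, _Failure):
            self._finished = True
            raise item.exc
        return item


def load_batches(n: int):
    """Hypothetical loader; the sleep stands in for slow disk or network reads."""
    for i in range(n):
        time.sleep(0.01)
        yield [i] * 4


if __name__ == "__main__":
    for batch in BackgroundPrefetcher(load_batches(5), depth=2):
        print(batch)  # a training step on `batch` would run here
```

The bounded queue is the main design choice. A larger `depth` hides more loading latency but holds more batches in CPU memory at once, so it trades memory for throughput in the same way step 1 does.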