We are training a very deep PolyNet on 10 GPUs with a batch size of 30. It converges very slowly (there is almost no progress at all). Is there any solution?