Hi @CrreeL, we do not necessarily think it should be faster than Thrust, but it shouldn't be slower. Both should use the same constructs under the hood, though there are cases where I would expect it to be the fastest.

As far as benchmarking goes, `make_tensor` uses managed memory by default, and the first iteration of your loop will page the memory in and out of the GPU, which adds significantly to the runtime. There are other reasons for the slowness on the first loop too, and in general you should never include the first iteration in your measurements. For simple element-wise computations like the ones you're doing, there are likely only two sources of long time penalties:

  1. Managed m…
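
To illustrate the advice about excluding the first iteration, here is a minimal sketch of a timing loop that runs a few untimed warm-up iterations before measuring. `time_after_warmup` is a hypothetical helper written for this example, not part of any library mentioned here. It uses host-side wall-clock timing, so for GPU work the callable is assumed to block until the device has finished (for example by calling `cudaDeviceSynchronize()` at the end).

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): runs `work` for `warmup` untimed
// iterations, then returns the wall-clock time in milliseconds of each of the
// following `timed` iterations.
template <typename Work>
std::vector<double> time_after_warmup(Work&& work, std::size_t warmup, std::size_t timed) {
    using clock = std::chrono::steady_clock;

    // Warm-up iterations absorb one-time costs such as managed-memory
    // page migration, so they are deliberately not measured.
    for (std::size_t i = 0; i < warmup; ++i) {
        work();
    }

    std::vector<double> ms;
    ms.reserve(timed);
    for (std::size_t i = 0; i < timed; ++i) {
        const auto start = clock::now();
        work();  // Assumption: for GPU work, this blocks until the device is done.
        const auto stop = clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}
```

A call such as `time_after_warmup(run_once, 1, 100)`, where `run_once` is a placeholder for your own callable, would discard one iteration and report the next 100. Averaging or taking the median of the returned values is left to the caller.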

Replies: 2 comments 3 replies

Answer selected by CrreeL
Category
Q&A
Labels
None yet
2 participants