
Add scripts for benchmarking opt_einsum as well as manyinds benchmarks #3

Open

kshyatt wants to merge 1 commit into under-Peter:master from kshyatt:ksh/manyinds

Conversation

@kshyatt

@kshyatt kshyatt commented Sep 21, 2021

Currently we don't benchmark contractions with many indices. This PR adds benchmarks for them (see also the corresponding PR to OMEinsum.jl) as well as scripts for benchmarking the Python library opt_einsum.
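A minimal sketch of what such a benchmark script might look like on the Python side. The equation, shapes, and repeat count below are placeholders, not the actual cases from the PR; NumPy's `optimize=True` routes `einsum` through opt_einsum-style contraction-path optimization, so this approximates what a dedicated `opt_einsum.contract` benchmark would measure:

```python
import time
import numpy as np

def bench_einsum(eq, shapes, repeats=5):
    """Best-of-N wall time for a single einsum contraction.

    With optimize=True, NumPy picks an optimized contraction path
    (the same idea opt_einsum implements), so this is a rough stand-in
    for timing opt_einsum.contract on the same equation.
    """
    arrays = [np.random.rand(*s) for s in shapes]
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.einsum(eq, *arrays, optimize=True)
        best = min(best, time.perf_counter() - t0)
    return best

# Hypothetical many-index case with uniform bond dimension 2:
eq = "abcdefg,defghij->abcij"
shapes = [(2,) * 7, (2,) * 7]
print(f"{bench_einsum(eq, shapes) * 1e6:.1f} us")
```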

@GiggleLiu
Collaborator

GiggleLiu commented Sep 21, 2021

Thanks for the PR!
Regarding the following contraction pattern, on which OMEinsum performs badly:

julia> code = ein"abcdefghijklmnop,flnqrcipstujvgamdwxyz->bcdeghkmnopqrstuvwxyz"
abcdefghijklmnop, flnqrcipstujvgamdwxyz -> bcdeghkmnopqrstuvwxyz

julia> OMEinsum.timespace_complexity(code, uniformsize(code, 2))
(26.0, 21.0)

julia> @btime code(x, y);
  24.924 ms (111 allocations: 48.51 MiB)
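The two numbers reported by `timespace_complexity` are the base-2 logarithms of the contraction's flop count and of the largest tensor. For a single pairwise contraction at uniform size 2 they reduce to counting distinct indices and output indices, which a few lines of pure Python can cross-check against the quoted `(26.0, 21.0)` (my own sketch, not OMEinsum code):

```python
from math import log2

def timespace_complexity(inputs, output, size=2):
    """Log2 time/space complexity of a single pairwise einsum
    contraction with uniform bond dimension `size`."""
    all_indices = set().union(*map(set, inputs))
    tc = len(all_indices) * log2(size)  # loop over every distinct index
    sc = len(set(output)) * log2(size)  # size of the output tensor
    return tc, sc

inputs = ["abcdefghijklmnop", "flnqrcipstujvgamdwxyz"]
output = "bcdeghkmnopqrstuvwxyz"
print(timespace_complexity(inputs, output))  # (26.0, 21.0)
```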

This is because OMEinsum performs tensor contraction as permutedims + reshape + matmul. In this pattern the permutedims step takes about 90% of the time, because the time and space complexities are similar. If we switch to TensorOperations.tensorcopy, the runtime is roughly halved:

julia> using LinearAlgebra, TensorOperations

julia> LinearAlgebra.permutedims(a::Array{T,N}, perm::NTuple{N}) where {T,N} = (TensorOperations.tensorcopy(a, collect(1:ndims(a)), perm))

julia> LinearAlgebra.permutedims!(o::Array{T}, a::Array{T}, perm::Vector) where T = (TensorOperations.tensorcopy!(a, collect(1:ndims(a)), o, perm))

julia> @btime code(x, y);
  13.353 ms (398 allocations: 48.54 MiB)

I think the best way to solve this issue is to remove the permutedims step completely.
However, it is not easy to write a general contraction function with BLAS-level performance. I wonder what the performance gap is between OMEinsum and the other packages in this benchmark case?

@GiggleLiu
Collaborator

Great job! I'm wondering how the julia-gpu data was generated; I cannot find the corresponding script anywhere.

Also, unfortunately, I do not have write access to this repo. @under-Peter, can you please give me write access or help merge this PR?

@under-Peter
Owner

@GiggleLiu is it enough to add you as a collaborator, or do I have to give you write access separately? Thanks so much, and sorry for being short on time at the moment.

@GiggleLiu
Collaborator

I now have write access; thanks for reacting so quickly, @under-Peter.

@kshyatt
Author

kshyatt commented Sep 22, 2021

Is this good to be merged?

@GiggleLiu
Collaborator

GiggleLiu commented Sep 22, 2021

It looks good, but I want to make sure the GPU benchmark is correct (especially the many-index case); it differs from what I get running the same case on my own host. Can you please show me the script that generated the results?

After that, it should be good to merge.

