Conversation
|
Your PR requires formatting changes to meet the project's style guidelines. Click here to view the suggested changes.diff --git a/test/intrinsics.jl b/test/intrinsics.jl
index 68fa9e48..d27de5e9 100644
--- a/test/intrinsics.jl
+++ b/test/intrinsics.jl
@@ -36,10 +36,10 @@ function shfl_down_test_kernel(a, b)
value = temp[idx]
value = value + KI.shfl_down(value, 16)
- value = value + KI.shfl_down(value, 8)
- value = value + KI.shfl_down(value, 4)
- value = value + KI.shfl_down(value, 2)
- value = value + KI.shfl_down(value, 1)
+ value = value + KI.shfl_down(value, 8)
+ value = value + KI.shfl_down(value, 4)
+ value = value + KI.shfl_down(value, 2)
+ value = value + KI.shfl_down(value, 1)
b[idx] = value
end
@@ -152,7 +152,7 @@ function intrinsics_testsuite(backend, AT)
dev_a = AT(a)
dev_b = AT(zeros(T, 32))
- KI.@kernel backend() workgroupsize=32 shfl_down_test_kernel(dev_a, dev_b)
+ KI.@kernel backend() workgroupsize = 32 shfl_down_test_kernel(dev_a, dev_b)
b = Array(dev_b)
@test sum(a) ≈ b[1] |
17c7234 to
ac11a2f
Compare
|
So the backends that I am worried about is Metal and to a lesser extend Intel. |
test/intrinsics.jl
Outdated
| # This is not valid | ||
| idx = KI.get_local_id().x | ||
|
|
||
| temp = KI.localmemory(eltype(b), 32) |
There was a problem hiding this comment.
So we need a query function to find the subgroup size? Then pass that to a Val?
There was a problem hiding this comment.
Currently this is like #559 where it assumes that subgroup size is always 32.
There was a problem hiding this comment.
The "This is not valid" is because it's using the local_id but we could do like #559 and modulo 32 to find subgroup position and stuff
There was a problem hiding this comment.
So I think AMD has some chips where subgroup size is 64. So we should have some way for the use to query this (even if it is just on the host)
There was a problem hiding this comment.
| GPU Backend | Host-Side Method | Device-Side Method (Intrinsic) |
|---|---|---|
| Metal | thread_execution_width property of MTLComputePipelineState (need compiled kernel) |
[[threads_per_simdgroup() |
| AMDGPU | wavefrontsize(dev::HIPDevice) |
wavefrontsize() |
| CUDA | warpsize(dev::CuDevice) |
warpsize() |
| OpenCL | get_sub_group_size() |
|
| oneAPI | get_sub_group_size()? |
There was a problem hiding this comment.
Is Metal the only backend that currently lacks dynamic local memory?
There was a problem hiding this comment.
On OpenCL and oneAPI, the host side methods are probably CL_DEVICE_SUB_GROUP_SIZES_INTEL + clDeviceInfo and subGroupSizes + zeDeviceGetComputeProperties, respectively
refs:
OpenCL extension doc
Intel levelZero docs
pocl cuda driver
What are your worries? |
|
For the sake of maintaining sanity, lest undocumented behaviour run amok, should this PR eventually be merged then would it be possible to explicitly specify whether the behaviour is synchronising or not? I understand that this is still very much the early stages of some work-in-progress but this code segment from the test suite indicates assumed synchronicity -- otherwise, it is perfectly legal behaviour for this reduction to induce a race condition -- but the juxtaposition with vendor nomenclature suggests the contrary. value = value + KI.shfl_down(value, 16)
value = value + KI.shfl_down(value, 8)
value = value + KI.shfl_down(value, 4)
value = value + KI.shfl_down(value, 2)
value = value + KI.shfl_down(value, 1) |
|
@Hamiltonian-Action Thanks for the comment, I’ll make sure to fix the tests when I return to this. |
|
The KernelAbstractions tests are not the principal concern, given that they are internal to the package itself. Rather, my comment was more towards explicitly specifying the assumed synchronicity or lack thereof at the level of the API rather than leaving it up to the individual backends. In essence, one of the following additions to the docstring and specification would be encouraged depending on whether |
|
Noted. The idea is that it should follow the backend conventions so I’ll make sure to go through the documentation and ensure that behaviour is consistent and documented |
530821a to
6852410
Compare
This reverts commit 956dc2e.
Co-Authored-By: Anton Smirnov <tonysmn97@gmail.com>
6852410 to
84730d2
Compare
|
Closing as #668 is required to do right so I integrated this with the PR |
Requires #668
This may be all that's needed?Could maybe add simdgroup (warps, subgroups) indexing intrinsics but I'd have to check if every backend supports this (I assume they would?)