Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #612 +/- ##
==========================================
- Coverage 80.76% 75.78% -4.99%
==========================================
Files 61 68 +7
Lines 2693 2907 +214
==========================================
+ Hits 2175 2203 +28
- Misses 518 704 +186 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| #default_storage = "private" | ||
|
|
||
| # when false, always use metal 3 even on a metal 4 platform | ||
| #force_metal3 = false |
There was a problem hiding this comment.
what about a api_version preference, determined automatically if unset, but overridable by setting "3" or "4"?
I'd also use string values; literals have given me issues in the past (if there's a typo in the user-provided TOML, stuff breaks).
There was a problem hiding this comment.
Noted about the string values.
Metal 3 can be used alongside Metal 4, so maybe we should call it default_api_version to make it clear that all it's doing is selecting the default API?
I think maybe that this preference should only affect whether global_queue returns an MTLCommandQueue or an MTL4CommandQueue, and everything else that takes in a queue should dispatch on the correct implementation based on which type was provided?
| # command allocator | ||
| # | ||
|
|
||
| export MTL4CommandAllocator |
There was a problem hiding this comment.
How large is the Metal 4 API? Should we put it in a mtl4 top-level folder?
There was a problem hiding this comment.
A lot of the Metal 4 API lives as new methods of Metal 3 Objects so maybe a mtl4 subfolder in lib/mtl would be more appropriate?
| # Copy Operations (Blit functionality integrated into compute encoder in Metal 4) | ||
| # function append_copy!(cce::MTL4ComputeCommandEncoder, dst::MTLBuffer, dstOffset::Integer, | ||
| # src::MTLBuffer, srcOffset::Integer, size::Integer) | ||
| # @objc [cce::id{MTL4ComputeCommandEncoder} copyFromBuffer:src::id{MTLBuffer} | ||
| # sourceOffset:srcOffset::NSUInteger | ||
| # toBuffer:dst::id{MTLBuffer} | ||
| # destinationOffset:dstOffset::NSUInteger | ||
| # size:size::NSUInteger]::Nothing | ||
| # end | ||
|
|
||
| # function append_copy!(cce::MTL4ComputeCommandEncoder, dst::MTLTexture, dstSlice::Integer, dstLevel::Integer, dstOrigin::MTLOrigin, | ||
| # src::MTLBuffer, srcOffset::Integer, srcBytesPerRow::Integer, srcBytesPerImage::Integer, | ||
| # size::MTLSize) | ||
| # @objc [cce::id{MTL4ComputeCommandEncoder} copyFromBuffer:src::id{MTLBuffer} | ||
| # sourceOffset:srcOffset::NSUInteger | ||
| # sourceBytesPerRow:srcBytesPerRow::NSUInteger | ||
| # sourceBytesPerImage:srcBytesPerImage::NSUInteger | ||
| # sourceSize:size::MTLSize | ||
| # toTexture:dst::id{MTLTexture} | ||
| # destinationSlice:dstSlice::NSUInteger | ||
| # destinationLevel:dstLevel::NSUInteger | ||
| # destinationOrigin:dstOrigin::MTLOrigin]::Nothing | ||
| # end |
There was a problem hiding this comment.
No they exist with the new MTL4ComputerCommandEncoder. I just wanted to lay the groundwork for custom kernels and then one piece of functionality using Metal 4 (unsafe_fill! is what I landed on). This code hasn't been tested or anything but once synchronization is figured out I'll work on the rest of the Metal 4 MtlArray functionality.
|
|
||
| @autoreleasepool function (kernel::HostKernel)(args...; groups=1, threads=1, | ||
| queue=global_queue(device())) | ||
| queue=use_metal4() ? global_queue4(device()) : global_queue(device())) |
There was a problem hiding this comment.
global_queue4 is weird, and I'd rather not have to do this dance everywhere. I'd rather we can use these regardless of the API version, so what about having global_queue(::MTLDevice) return an appropriate object given the selected API? Given that preferences are determined at compile time, that should also be type stable.
src/compiler/execution.jl
Outdated
| if !use_mtl4 | ||
| roots = [kernel.f, args] | ||
| MTL.on_completed(cmdbuf) do buf | ||
| empty!(roots) | ||
| foreach(free, argument_buffers) | ||
|
|
||
| # Check for errors | ||
| # XXX: we cannot do this nicely, e.g. throwing an `error` or reporting with `@error` | ||
| # because we're not allowed to switch tasks from this contexts. | ||
| if buf.status == MTL.MTLCommandBufferStatusError | ||
| Core.println("ERROR: Failed to submit command buffer: $(buf.error.localizedDescription)") | ||
| end | ||
| end | ||
| end |
There was a problem hiding this comment.
We'll need an equivalent for this, or Julia can early-free resources.
Related:
Unlike the default behavior of MTLCommandBuffer, you may need to consider a resource’s retain count because each MTL4CommandBuffer instance doesn’t create strong references to resources. This is similar to creating an MTLCommandBuffer with the makeCommandBufferWithUnretainedReferences() method of an MTLCommandQueue.
| cce = MTLComputeCommandEncoder(cmdbuf) | ||
| if use_mtl4 | ||
| allocator = MTL4CommandAllocator(device()) | ||
| cmdbuf = MTL4CommandBuffer(device()) |
There was a problem hiding this comment.
You can reuse and repurpose each command buffer indefinitely by starting over, encoding new commands, and committing it again, instead of allocating a new buffer.
As a future TODO, we could probably use a task-local command buffer just like we have task-local queues now.
| const use_metal4 = OncePerProcess{Bool}() do | ||
| dev = device() | ||
| force_metal3 = @load_preference("force_metal3", false) | ||
| return functional() && !force_metal3 && supports_family(dev, MTL.MTLGPUFamilyMetal4) | ||
| end |
There was a problem hiding this comment.
I'd make this api_version::Int then, parsing the api_version String preferences here.
src/state.jl
Outdated
| device!(dev::MTLDevice) = task_local_storage(:MTLDevice, dev) | ||
|
|
||
| const global_queues = WeakKeyDict{MTLCommandQueue,Nothing}() | ||
| const global_queues4 = WeakKeyDict{MTL4CommandQueue,Nothing}() |
There was a problem hiding this comment.
With the preference being compile-time determined, we don't need multiple of these.
If necessary (e.g. for type definitions or macro expansions), you can use @static if. Otherwise a plain top-level if api_version == ... should work.
Update as of 07/29 (also added to todo):
MTLSharedEvents (Synchronize usingMTLSharedEvents #633).To actually test the Metal 4 stuff, you need to be on the macOS 26 developer beta, and be using ObjectiveC.jl >
v3.4.2For 8-bit integer fills, it works for smaller vectors, but not for bigger ones. I think it's because I can't figure out how to do synchronization like we do with Metal3.
For custom kernels, I don't know if the only issue is synchronization or if there's something else that's broken.
TODO:
MTLSharedEvents (Synchronize usingMTLSharedEvents #633)I'm kind of leaving this here for feedback/suggestions/help.