Metal 4 by christiangnrd · Pull Request #612 · JuliaGPU/Metal.jl

christiangnrd · 2025-06-23T23:32:52Z

Update as of 07/29 (also added to todo):

I believe synchronization must be done using MTLSharedEvents (Synchronize using MTLSharedEvents #633).
Kernels return zeros, I don't know when I'll have the chance to return to this, but for anyone interested in contributing, my next attempt would be looking into residency sets. To make it easier, this could be done by first using residency sets in Metal 3 on macOS > 15, and then converting to Metal 4 could be easier.

To actually test the Metal 4 stuff, you need to be on the macOS 26 developer beta, and be using ObjectiveC.jl > v3.4.2

For 8-bit integer fills, it works for smaller vectors, but not for bigger ones. I think it's because I can't figure out how to do synchronization like we do with Metal3.

julia> using Metal; Metal.fill(UInt8(4), 10000)
[ Info: 4
10000-element MtlVector{UInt8, Metal.PrivateStorage}:
 0x04
 0x04
    ⋮
 0x04
 0x04

julia> using Metal; Metal.fill(UInt8(4), 100000)
[ Info: 4
100000-element MtlVector{UInt8, Metal.PrivateStorage}:
 0x00
 0x00
    ⋮
 0x00
 0x00

For custom kernels, I don't know if the only issue is synchronization or if there's something else that's broken.

julia> using Metal; a = mtl([1,2,3,4]); a .+ 1
4-element MtlVector{Int64, Metal.PrivateStorage}:
 0
 0
 0
 0

TODO:

Synchronization with MTLSharedEvents (Synchronize using MTLSharedEvents #633)
Get kernels working. (maybe residency sets?)
Tests

I'm kind of leaving this here for feedback/suggestions/help.

codecov · 2025-06-23T23:44:27Z

Codecov Report

Attention: Patch coverage is 22.31405% with 188 lines in your changes missing coverage. Please review.

Project coverage is 75.78%. Comparing base (1e1e193) to head (34e5dea).

Files with missing lines	Patch %	Lines
lib/mtl/command_queue4.jl	0.00%	39 Missing ⚠️
lib/mtl/command_buf4.jl	0.00%	28 Missing ⚠️
lib/mtl/command_alloc4.jl	0.00%	27 Missing ⚠️
lib/mtl/command_enc/compute4.jl	0.00%	27 Missing ⚠️
src/state.jl	27.58%	21 Missing ⚠️
lib/mtl/arg_table.jl	0.00%	16 Missing ⚠️
src/compiler/execution.jl	61.76%	13 Missing ⚠️
lib/mtl/command_enc4.jl	0.00%	8 Missing ⚠️
lib/mtl/compute_pipeline4.jl	0.00%	5 Missing ⚠️
src/memory.jl	78.94%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #612      +/-   ##
==========================================
- Coverage   80.76%   75.78%   -4.99%     
==========================================
  Files          61       68       +7     
  Lines        2693     2907     +214     
==========================================
+ Hits         2175     2203      +28     
- Misses        518      704     +186

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

maleadt · 2025-06-24T06:36:45Z

LocalPreferences.toml

 #default_storage = "private"
+
+# when false, always use metal 3 even on a metal 4 platform
+#force_metal3 = false


what about a api_version preference, determined automatically if unset, but overridable by setting "3" or "4"?

I'd also use string values; literals have given me issues in the past (if there's a typo in the user-provided TOML, stuff breaks).

Noted about the string values.

Metal 3 can be used alongside Metal 4, so maybe we should call it default_api_version to make it clear that all it's doing is selecting the default API?

I think maybe that this preference should only affect whether global_queue returns an MTLCommandQueue or an MTL4CommandQueue, and everything else that takes in a queue should dispatch on the correct implementation based on which type was provided?

maleadt · 2025-06-24T06:39:15Z

lib/mtl/command_alloc4.jl

+# command allocator
+#
+
+export MTL4CommandAllocator


How large is the Metal 4 API? Should we put it in a mtl4 top-level folder?

A lot of the Metal 4 API lives as new methods of Metal 3 Objects so maybe a mtl4 subfolder in lib/mtl would be more appropriate?

maleadt · 2025-06-24T06:40:14Z

lib/mtl/command_enc/compute4.jl

+# Copy Operations (Blit functionality integrated into compute encoder in Metal 4)
+# function append_copy!(cce::MTL4ComputeCommandEncoder, dst::MTLBuffer, dstOffset::Integer,
+#                       src::MTLBuffer, srcOffset::Integer, size::Integer)
+#     @objc [cce::id{MTL4ComputeCommandEncoder} copyFromBuffer:src::id{MTLBuffer}
+#                                              sourceOffset:srcOffset::NSUInteger
+#                                              toBuffer:dst::id{MTLBuffer}
+#                                              destinationOffset:dstOffset::NSUInteger
+#                                              size:size::NSUInteger]::Nothing
+# end
+
+# function append_copy!(cce::MTL4ComputeCommandEncoder, dst::MTLTexture, dstSlice::Integer, dstLevel::Integer, dstOrigin::MTLOrigin,
+#                       src::MTLBuffer, srcOffset::Integer, srcBytesPerRow::Integer, srcBytesPerImage::Integer,
+#                       size::MTLSize)
+#     @objc [cce::id{MTL4ComputeCommandEncoder} copyFromBuffer:src::id{MTLBuffer}
+#                                              sourceOffset:srcOffset::NSUInteger
+#                                              sourceBytesPerRow:srcBytesPerRow::NSUInteger
+#                                              sourceBytesPerImage:srcBytesPerImage::NSUInteger
+#                                              sourceSize:size::MTLSize
+#                                              toTexture:dst::id{MTLTexture}
+#                                              destinationSlice:dstSlice::NSUInteger
+#                                              destinationLevel:dstLevel::NSUInteger
+#                                              destinationOrigin:dstOrigin::MTLOrigin]::Nothing
+# end


So these can be removed?

No they exist with the new MTL4ComputerCommandEncoder. I just wanted to lay the groundwork for custom kernels and then one piece of functionality using Metal 4 (unsafe_fill! is what I landed on). This code hasn't been tested or anything but once synchronization is figured out I'll work on the rest of the Metal 4 MtlArray functionality.

maleadt · 2025-06-24T06:41:27Z

src/compiler/execution.jl


 @autoreleasepool function (kernel::HostKernel)(args...; groups=1, threads=1,
-                                               queue=global_queue(device()))
+                                               queue=use_metal4() ? global_queue4(device()) : global_queue(device()))


global_queue4 is weird, and I'd rather not have to do this dance everywhere. I'd rather we can use these regardless of the API version, so what about having global_queue(::MTLDevice) return an appropriate object given the selected API? Given that preferences are determined at compile time, that should also be type stable.

maleadt · 2025-06-24T06:51:41Z

src/compiler/execution.jl

+    if !use_mtl4
+        roots = [kernel.f, args]
+        MTL.on_completed(cmdbuf) do buf
+            empty!(roots)
+            foreach(free, argument_buffers)
+
+            # Check for errors
+            # XXX: we cannot do this nicely, e.g. throwing an `error` or reporting with `@error`
+            #      because we're not allowed to switch tasks from this contexts.
+            if buf.status == MTL.MTLCommandBufferStatusError
+                Core.println("ERROR: Failed to submit command buffer: $(buf.error.localizedDescription)")
+            end
        end
+    end


We'll need an equivalent for this, or Julia can early-free resources.

Related:

Unlike the default behavior of MTLCommandBuffer, you may need to consider a resource’s retain count because each MTL4CommandBuffer instance doesn’t create strong references to resources. This is similar to creating an MTLCommandBuffer with the makeCommandBufferWithUnretainedReferences() method of an MTLCommandQueue.

maleadt · 2025-06-24T06:52:34Z

src/compiler/execution.jl

-    cce = MTLComputeCommandEncoder(cmdbuf)
+    if use_mtl4
+        allocator = MTL4CommandAllocator(device())
+        cmdbuf = MTL4CommandBuffer(device())


You can reuse and repurpose each command buffer indefinitely by starting over, encoding new commands, and committing it again, instead of allocating a new buffer.

As a future TODO, we could probably use a task-local command buffer just like we have task-local queues now.

maleadt · 2025-06-24T06:53:22Z

src/initialization.jl

+    const use_metal4 = OncePerProcess{Bool}() do
+        dev = device()
+        force_metal3 = @load_preference("force_metal3", false)
+        return functional() && !force_metal3 && supports_family(dev, MTL.MTLGPUFamilyMetal4)
+    end


I'd make this api_version::Int then, parsing the api_version String preferences here.

maleadt · 2025-06-24T06:54:33Z

src/state.jl

 device!(dev::MTLDevice) = task_local_storage(:MTLDevice, dev)

 const global_queues = WeakKeyDict{MTLCommandQueue,Nothing}()
+const global_queues4 = WeakKeyDict{MTL4CommandQueue,Nothing}()


With the preference being compile-time determined, we don't need multiple of these.

If necessary (e.g. for type definitions or macro expansions), you can use @static if. Otherwise a plain top-level if api_version == ... should work.

maleadt reviewed Jun 24, 2025

View reviewed changes

christiangnrd force-pushed the mtl4 branch from 73e48f3 to 34e5dea Compare July 9, 2025 04:19

christiangnrd added 15 commits July 23, 2025 17:44

Update wrappers for Metal 4

820987c

Add basic Metal4 elements

4cd871e

Add preference to use Metal 3 even with a Metal 4 device

d07df7d

(Not working) Use Metal 4

4c25119

Testing

80d32ab

Update libmtl.jl for beta 3 changes

c0fef38

Rename for consistency

9df7891

Partial buffer feedback handling

6cf080f

uytf

0079d3a

More barrier functionality

faa176f

Whitespace

049e632

Progress? towards synchronization

6fca8e2

Oops

7cf55e6

More code

ffc5b41

More api

2529a91

christiangnrd force-pushed the mtl4 branch from 34e5dea to 2529a91 Compare July 23, 2025 20:44

christiangnrd mentioned this pull request Jul 29, 2025

Synchronize using MTLSharedEvents #633

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Metal 4#612

Metal 4#612
christiangnrd wants to merge 15 commits intomainfrom
mtl4

christiangnrd commented Jun 23, 2025 •

edited

Loading

Uh oh!

codecov bot commented Jun 23, 2025 •

edited

Loading

Uh oh!

maleadt Jun 24, 2025

Uh oh!

christiangnrd Jun 24, 2025

Uh oh!

maleadt Jun 24, 2025

Uh oh!

christiangnrd Jun 24, 2025

Uh oh!

maleadt Jun 24, 2025

Uh oh!

christiangnrd Jun 24, 2025

Uh oh!

maleadt Jun 24, 2025

Uh oh!

maleadt Jun 24, 2025

Uh oh!

maleadt Jun 24, 2025

Uh oh!

maleadt Jun 24, 2025

Uh oh!

maleadt Jun 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

christiangnrd commented Jun 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Jun 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

christiangnrd commented Jun 23, 2025 •

edited

Loading

codecov bot commented Jun 23, 2025 •

edited

Loading