[Do not merge] Switch to GPUArrays.jl reduction implementation by christiangnrd · Pull Request #628 · JuliaGPU/Metal.jl

christiangnrd · 2025-07-20T19:32:35Z

Don't remove the file yet to avoid merge conflict with #627

github-actions · 2025-07-20T19:33:03Z

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Suggested changes:

diff --git a/perf/runbenchmarks.jl b/perf/runbenchmarks.jl
index ba5e0d40..1d7901c5 100644
--- a/perf/runbenchmarks.jl
+++ b/perf/runbenchmarks.jl
@@ -1,6 +1,6 @@
 # benchmark suite execution and codespeed submission
 using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="akreduce")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "akreduce")
 
 using Metal
 
diff --git a/test/runtests.jl b/test/runtests.jl
index 4ee51134..fb376e4f 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -6,7 +6,7 @@ import REPL
 using Test
 
 using Pkg
-Pkg.add(url="https://github.com/christiangnrd/GPUArrays.jl", rev="akreduce")
+Pkg.add(url = "https://github.com/christiangnrd/GPUArrays.jl", rev = "akreduce")
 
 # Quit without erroring if Metal loaded without issues on unsupported platforms
 if !Sys.isapple()

christiangnrd · 2025-07-20T19:43:19Z

With the current mapreducedim! implementation left in place, we can transition in two parts: first switch over once AK supports broadcasted reductions, then remove the implementations from this repo once AK supports >1 input dims.
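For context, here is a CPU-only sketch in plain Julia of the two kinds of reduction mentioned above: a reduction over a lazy broadcast expression, and a reduction along the dimensions of a multi-dimensional array. It uses only Base functions and does not call AK, GPUArrays.jl, or Metal.jl. Reading ">1 input dims" as multi-dimensional inputs is an assumption, and the variable names are illustrative.

```julia
# Assumption: "broadcasted reduction" means reducing a lazy Broadcasted
# object instead of a materialized array.
x = [1, 2, 3]
y = [4, 5, 6]

# Build the lazy equivalent of x .* y; no intermediate array is allocated.
bc = Broadcast.instantiate(Broadcast.broadcasted(*, x, y))
dot_xy = sum(bc)  # reduces the lazy expression directly

# Assumption: ">1 input dims" refers to inputs like this 2x3 matrix.
A = reshape(collect(1:6), 2, 3)
col_sums = sum(A; dims = 1)                    # reduce along the first dimension
total = mapreduce(abs2, +, A; dims = (1, 2))   # reduce along both dimensions

println(dot_xy)
println(col_sums)
println(total)
```

On a GPU back-end the same calls would be expected to dispatch to the back-end's reduction kernel, which is the part this PR is swapping out.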

codecov · 2025-07-21T06:21:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.63%. Comparing base (1942968) to head (c0eddd1).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #628   +/-   ##
=======================================
  Coverage   80.63%   80.63%           
=======================================
  Files          61       61           
  Lines        2722     2722           
=======================================
  Hits         2195     2195           
  Misses        527      527


github-actions

Metal Benchmarks


Benchmark suite	Current: `c0eddd1`	Previous: `1942968`	Ratio
`latency/precompile`	`9830015416` ns	`9844653958` ns	`1.00`
`latency/ttfp`	`3989128875` ns	`3972040229` ns	`1.00`
`latency/import`	`1281988208` ns	`1275530958.5` ns	`1.01`
`integration/metaldevrt`	`830312.5` ns	`828500` ns	`1.00`
`integration/byval/slices=1`	`1532291.5` ns	`1536750` ns	`1.00`
`integration/byval/slices=3`	`8864917` ns	`9632625` ns	`0.92`
`integration/byval/reference`	`1535333` ns	`1543583` ns	`0.99`
`integration/byval/slices=2`	`2554083` ns	`2621958.5` ns	`0.97`
`kernel/indexing`	`582792` ns	`567792` ns	`1.03`
`kernel/indexing_checked`	`577208` ns	`569292` ns	`1.01`
`kernel/launch`	`9042` ns	`9208` ns	`0.98`
`array/construct`	`6125` ns	`6625` ns	`0.92`
`array/broadcast`	`579250` ns	`583375` ns	`0.99`
`array/random/randn/Float32`	`821167` ns	`784333` ns	`1.05`
`array/random/randn!/Float32`	`622625` ns	`623250` ns	`1.00`
`array/random/rand!/Int64`	`555395.5` ns	`547458` ns	`1.01`
`array/random/rand!/Float32`	`584125` ns	`585291` ns	`1.00`
`array/random/rand/Int64`	`777375` ns	`771250` ns	`1.01`
`array/random/rand/Float32`	`628375` ns	`622687` ns	`1.01`
`array/accumulate/Int64/1d`	`1261292` ns	`1277104.5` ns	`0.99`
`array/accumulate/Int64/dims=1`	`1800500` ns	`1868333` ns	`0.96`
`array/accumulate/Int64/dims=2`	`2165958.5` ns	`2183625` ns	`0.99`
`array/accumulate/Int64/dims=1L`	`11643104` ns	`11737104` ns	`0.99`
`array/accumulate/Int64/dims=2L`	`9718917` ns	`9771416.5` ns	`0.99`
`array/accumulate/Float32/1d`	`1141375` ns	`1142833` ns	`1.00`
`array/accumulate/Float32/dims=1`	`1562333.5` ns	`1570458` ns	`0.99`
`array/accumulate/Float32/dims=2`	`1865875` ns	`1931625` ns	`0.97`
`array/accumulate/Float32/dims=1L`	`9890916.5` ns	`9864375` ns	`1.00`
`array/accumulate/Float32/dims=2L`	`7298500` ns	`7308021` ns	`1.00`
`array/reductions/reduce/Int64/1d`	`1077583` ns	`1373353.5` ns	`0.78`
`array/reductions/reduce/Int64/dims=1`	`987500` ns	`1069291.5` ns	`0.92`
`array/reductions/reduce/Int64/dims=2`	`935145.5` ns	`1193292` ns	`0.78`
`array/reductions/reduce/Int64/dims=1L`	`2350750` ns	`2113062.5` ns	`1.11`
`array/reductions/reduce/Int64/dims=2L`	`2815291` ns	`3456458` ns	`0.81`
`array/reductions/reduce/Float32/1d`	`1029750` ns	`971625` ns	`1.06`
`array/reductions/reduce/Float32/dims=1`	`956125` ns	`808458` ns	`1.18`
`array/reductions/reduce/Float32/dims=2`	`870375` ns	`768979` ns	`1.13`
`array/reductions/reduce/Float32/dims=1L`	`1659354.5` ns	`1739041` ns	`0.95`
`array/reductions/reduce/Float32/dims=2L`	`2781167` ns	`1772125` ns	`1.57`
`array/reductions/mapreduce/Int64/1d`	`1000375` ns	`1456146` ns	`0.69`
`array/reductions/mapreduce/Int64/dims=1`	`936083` ns	`1074875` ns	`0.87`
`array/reductions/mapreduce/Int64/dims=2`	`873500` ns	`1206417` ns	`0.72`
`array/reductions/mapreduce/Int64/dims=1L`	`2346562.5` ns	`2119292` ns	`1.11`
`array/reductions/mapreduce/Int64/dims=2L`	`2844729` ns	`3444375` ns	`0.83`
`array/reductions/mapreduce/Float32/1d`	`1045959` ns	`990792` ns	`1.06`
`array/reductions/mapreduce/Float32/dims=1`	`947959` ns	`810062.5` ns	`1.17`
`array/reductions/mapreduce/Float32/dims=2`	`868041.5` ns	`761104` ns	`1.14`
`array/reductions/mapreduce/Float32/dims=1L`	`1668167` ns	`1740812.5` ns	`0.96`
`array/reductions/mapreduce/Float32/dims=2L`	`2815354.5` ns	`1781292` ns	`1.58`
`array/private/copyto!/gpu_to_gpu`	`636791` ns	`651375` ns	`0.98`
`array/private/copyto!/cpu_to_gpu`	`795791` ns	`805542` ns	`0.99`
`array/private/copyto!/gpu_to_cpu`	`811292` ns	`817667` ns	`0.99`
`array/private/iteration/findall/int`	`1657000` ns	`1646500` ns	`1.01`
`array/private/iteration/findall/bool`	`1451937.5` ns	`1444584` ns	`1.01`
`array/private/iteration/findfirst/int`	`2074750` ns	`1754958.5` ns	`1.18`
`array/private/iteration/findfirst/bool`	`1635145.5` ns	`1703625` ns	`0.96`
`array/private/iteration/scalar`	`5542583.5` ns	`4772500` ns	`1.16`
`array/private/iteration/logical`	`2734958` ns	`2536917` ns	`1.08`
`array/private/iteration/findmin/1d`	`1870167` ns	`1815666` ns	`1.03`
`array/private/iteration/findmin/2d`	`1891583.5` ns	`1431750` ns	`1.32`
`array/private/copy`	`573791.5` ns	`538167` ns	`1.07`
`array/shared/copyto!/gpu_to_gpu`	`83750` ns	`86375` ns	`0.97`
`array/shared/copyto!/cpu_to_gpu`	`82625` ns	`86583` ns	`0.95`
`array/shared/copyto!/gpu_to_cpu`	`91458` ns	`84833` ns	`1.08`
`array/shared/iteration/findall/int`	`1643437.5` ns	`1609874.5` ns	`1.02`
`array/shared/iteration/findall/bool`	`1471812.5` ns	`1464354` ns	`1.01`
`array/shared/iteration/findfirst/int`	`1830375` ns	`1377750` ns	`1.33`
`array/shared/iteration/findfirst/bool`	`1385917` ns	`1319166` ns	`1.05`
`array/shared/iteration/scalar`	`206917` ns	`217500` ns	`0.95`
`array/shared/iteration/logical`	`2750042` ns	`2288708.5` ns	`1.20`
`array/shared/iteration/findmin/1d`	`1607895.5` ns	`1421750` ns	`1.13`
`array/shared/iteration/findmin/2d`	`1917291.5` ns	`1430854.5` ns	`1.34`
`array/shared/copy`	`251042` ns	`248666` ns	`1.01`
`array/permutedims/4d`	`2442208` ns	`2438438` ns	`1.00`
`array/permutedims/2d`	`1184291.5` ns	`1193250` ns	`0.99`
`array/permutedims/3d`	`1737625` ns	`1768458` ns	`0.98`
`metal/synchronization/stream`	`19667` ns	`19916` ns	`0.99`
`metal/synchronization/context`	`20292` ns	`20375` ns	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.
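As a reading aid for the table above: the Ratio column is consistent with the current time divided by the previous time, so values above 1.00 mean this branch is slower. A minimal Julia sketch that recomputes a few ratios from rows copied out of the table (the `rows` vector and the `ratio` helper are illustrative names, not part of the benchmark workflow):

```julia
# Rows copied from the table above: (name, current ns, previous ns).
rows = [
    ("array/reductions/reduce/Int64/1d", 1077583.0, 1373353.5),
    ("array/reductions/reduce/Float32/dims=2L", 2781167.0, 1772125.0),
    ("array/reductions/mapreduce/Int64/1d", 1000375.0, 1456146.0),
]

# Hypothetical helper. Assumption: Ratio = current / previous, two decimals.
ratio(current, previous) = round(current / previous; digits = 2)

for (name, current, previous) in rows
    r = ratio(current, previous)
    verdict = r > 1 ? "slower" : (r < 1 ? "faster" : "unchanged")
    println(rpad(name, 42), r, "  (", verdict, ")")
end
```

The three recomputed values match the table's 0.78, 1.57, and 0.69.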

maleadt · 2025-07-29T16:17:49Z

> With the current mapreducedim! implementation left in place, we can transition in two parts: first switch over once AK supports broadcasted reductions, then remove the implementations from this repo once AK supports >1 input dims.

I think I'd rather we do it in one pass, because the change needs to be made across back-ends.

maleadt · 2025-07-30T06:52:14Z

In any case, despite some regressions the overall performance seems better here than over in CUDA.jl.

github-actions bot reviewed Jul 21, 2025


christiangnrd force-pushed the noreduce branch from b7b4e5b to 4359b51 Compare July 21, 2025 19:44

christiangnrd changed the title ~~Switch to GPUArrays.jl reduction implementation~~ [Do not merge] Switch to GPUArrays.jl reduction implementation Jul 23, 2025

christiangnrd force-pushed the noreduce branch from 2654538 to 3f5cb6d Compare July 29, 2025 23:02

Test/bench AK mapreduce integration

c0eddd1

christiangnrd force-pushed the noreduce branch from 3f5cb6d to c0eddd1 Compare August 1, 2025 16:23

christiangnrd wants to merge 1 commit into main from noreduce


2 participants
