
Enable Ahead-of-Time Compilation by hiding the runtime functions in the GLOBAL_METHOD_TABLE #749

Merged
michel2323 merged 9 commits into JuliaGPU:master from apozharski:ap/aot-compilation on Feb 4, 2026

Conversation

@apozharski
Contributor

As discussed in JuliaGPU/CUDA.jl#2998 and #611, GPUCompiler.jl currently leaks nonexistent gpu_* LLVM functions into the CPU cache, making ahead-of-time compilation impossible for any package that uses it.

I am currently fixing this by moving these runtime methods into the method table defined in the GPUCompiler module and having the CPU versions throw errors, as is done in CUDA.jl. This feels somewhat like a hack; however, it seems to work, and without a better understanding of what it might break, it seems to be the simplest solution.
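For illustration, here is a minimal sketch of the overlay pattern in question, in the spirit of what CUDA.jl does (the names OverlaySketch, my_table, and gpu_malloc are placeholders for illustration, not the PR's actual code):

module OverlaySketch

# An overlay method table: during GPU compilation, method lookup consults
# this table first and falls back to the regular method table otherwise.
Base.Experimental.@MethodTable my_table

# CPU definition: calling the runtime function on the host throws an error,
# mirroring the CUDA.jl runtime stubs.
gpu_malloc(sz::Csize_t) = error("gpu_malloc is a device-only runtime function")

# Device overlay: on the GPU, the call lowers to an external LLVM function
# provided by the runtime library.
Base.Experimental.@overlay my_table @inline gpu_malloc(sz::Csize_t) =
    ccall("extern gpu_malloc", llvmcall, Ptr{Cvoid}, (Csize_t,), sz)

end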

@github-actions
Contributor

github-actions bot commented Dec 22, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/src/runtime.jl b/src/runtime.jl
index 2f7312e..55affb1 100644
--- a/src/runtime.jl
+++ b/src/runtime.jl
@@ -83,10 +83,11 @@ function compile(def, return_type, types, llvm_return_type=nothing, llvm_types=n
     #        using the new nonrecursive codegen to handle function lookup ourselves?
     if def isa Symbol
         args = [gensym() for typ in types]
-        @eval GPUCompiler.@device_function($return_type,
-                                           @inline $def($(args...)) =
-                                               ccall($("extern $llvm_name"), llvmcall, $return_type, ($(types...),), $(args...))
-                                           )
+        @eval GPUCompiler.@device_function(
+            $return_type,
+            @inline $def($(args...)) =
+                ccall($("extern $llvm_name"), llvmcall, $return_type, ($(types...),), $(args...))
+        )
     end
 
     return
diff --git a/src/utils.jl b/src/utils.jl
index 8be6a4b..ae26a0c 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -255,12 +255,14 @@ macro device_function(rt, ex)
         $rt(1)
     end
 
-    esc(quote
-        $(combinedef(def))
+    return esc(
+        quote
+            $(combinedef(def))
 
-        # NOTE: no use of `@consistent_overlay` here because the regular function errors
-        Base.Experimental.@overlay($(GPUCompiler).GLOBAL_METHOD_TABLE, $ex)
-    end)
+            # NOTE: no use of `@consistent_overlay` here because the regular function errors
+            Base.Experimental.@overlay($(GPUCompiler).GLOBAL_METHOD_TABLE, $ex)
+        end
+    )
 end
 
 
diff --git a/test/utils.jl b/test/utils.jl
index 26c189e..8e8e1b8 100644
--- a/test/utils.jl
+++ b/test/utils.jl
@@ -202,17 +202,19 @@ end
 
     # Create a test module to contain the device functions
     test_mod = @eval module $(gensym("DeviceFunctionTest"))
-        using GPUCompiler
-
-        # Test with Ptr return type (common for runtime functions)
-        GPUCompiler.@device_function(Ptr{Nothing},
-            @inline test_device_ptr() = ccall("extern gpu_test", llvmcall, Ptr{Nothing}, ())
-        )
-
-        # Test with primitive return type
-        GPUCompiler.@device_function(Nothing,
-            @inline test_device_nothing() = ccall("extern gpu_test2", llvmcall, Nothing, ())
-        )
+    using GPUCompiler
+
+    # Test with Ptr return type (common for runtime functions)
+    GPUCompiler.@device_function(
+        Ptr{Nothing},
+        @inline test_device_ptr() = ccall("extern gpu_test", llvmcall, Ptr{Nothing}, ())
+    )
+
+    # Test with primitive return type
+    GPUCompiler.@device_function(
+        Nothing,
+        @inline test_device_nothing() = ccall("extern gpu_test2", llvmcall, Nothing, ())
+    )
     end
 
     # Verify the functions are defined in the test module

@KSepetanc

KSepetanc commented Dec 23, 2025

I loaded both the forked CUDA.jl and this PR and tried to compile my full code, and got an error.

The stack trace is massive, so I copied the first several lines:

ERROR: LoadError: Invalid return type for runtime function 'box_bool': expected LLVM.PointerType(ptr addrspace(10)), got LLVM.VoidType(void)
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:44
  [2] emit_function!(mod::LLVM.Module, config::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, f::Type, method::GPUCompiler.Runtime.RuntimeMethodInstance)
    @ GPUCompiler C:\Users\karlo\.julia\packages\GPUCompiler\vRm9U\src\rtlib.jl:81
  [3] build_runtime(job::GPUCompiler.CompilerJob)
    @ GPUCompiler C:\Users\karlo\.julia\packages\GPUCompiler\vRm9U\src\rtlib.jl:117
  [4] (::GPUCompiler.var"#load_runtime##0#load_runtime##1"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})()
    @ GPUCompiler C:\Users\karlo\.julia\packages\GPUCompiler\vRm9U\src\rtlib.jl:159
  [5] lock(f::GPUCompiler.var"#load_runtime##0#load_runtime##1"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}, l::ReentrantLock)
    @ Base .\lock.jl:335

Otherwise, I could compile GPUCompiler into the image.

Do you have an idea where this could come from?

@apozharski
Contributor Author

@KSepetanc yep, I caught that in the tests for the PR as well (somehow they were passing for me locally, but I suspect that was just poor environment management on my part).

Unsurprisingly, my hack seems to break things in GPUCompiler at runtime. I have some ideas, namely that I am replacing the stub LLVM call in the CPU cache with an exception, which simply returns void (sketched below). It has taken me a while to get something I can test with (my machine at home has a quite broken CUDA installation 😅), but it seems I am able to test with OpenCL, so hopefully I can get something a bit less hacky working soon, depending on how much time I have over the holidays.

I will turn this PR into a draft until then.
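For context, a rough sketch of the suspected return-type mismatch (box_bool_stub is a hypothetical name for illustration, not the PR's code):

# A CPU-side stub that always throws never returns a value, so inference
# gives Union{}:
box_bool_stub(x::Bool) = error("box_bool is a device-only runtime function")
Base.return_types(box_bool_stub, (Bool,))  # -> [Union{}]

# If the runtime library were accidentally built from this stub instead of
# the device overlay, codegen would emit an LLVM `void` return type where a
# boxed pointer (`ptr addrspace(10)`) was expected -- matching the
# 'Invalid return type for runtime function' error reported above.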

@apozharski apozharski marked this pull request as draft December 23, 2025 11:49
@KSepetanc

KSepetanc commented Dec 23, 2025

@apozharski are you using the CUDA 590 driver branch (i.e., CUDA 13.1)? I have seen that the maintainers are preparing support for it, but when I last checked a few days ago it still had not been released. Without knowing more about your system, I presume you just need to downgrade to the 580-series driver that comes with CUDA 13.0. I had this issue too.

I will soon have more questions, as it seems that more fixes than just GPUCompiler.jl and CUDA.jl are needed to AOT-compile MadNLPGPU, which I need, but it is still a WIP so I will wait a bit.

@codecov

codecov bot commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 74.80%. Comparing base (d762273) to head (0c60f0c).
⚠️ Report is 1 commit behind head on master.

Files with missing lines   Patch %   Lines
src/runtime.jl             50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #749      +/-   ##
==========================================
+ Coverage   74.79%   74.80%   +0.01%     
==========================================
  Files          24       24              
  Lines        3773     3779       +6     
==========================================
+ Hits         2822     2827       +5     
- Misses        951      952       +1     


@apozharski apozharski marked this pull request as ready for review January 4, 2026 14:12
@apozharski
Contributor Author

Marking ready. I seem to have fixed the issues I can reproduce on my own machine (it turns out that including @warn in the CPU cache was breaking some things, though in truth I do not understand why).

@apozharski
Copy link
Contributor Author

apozharski commented Jan 5, 2026

(it turns out that including @warn in the CPU cache was breaking some things, though in truth I do not understand why)

To clarify: this was causing a bunch of ccalls to CPU (Julia) functions to be emitted in the IR for the compile job. Digging through them pointed to all the components necessary for the @warn to execute, even though the gpu_* runtime method was being correctly pulled only from the GLOBAL_METHOD_TABLE overlay.

I did some more digging this morning and I still don't understand why this was happening, but my debugging is fairly primitive.
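To make the contrast concrete, a sketch of the two stub styles discussed (hypothetical names for illustration, not the PR's code):

# Stub style that works: throwing keeps the host-side method trivial, so
# nothing extra leaks into the compile job's IR.
gpu_runtime_stub_error() = error("device-only runtime function")

# Stub style that broke things: @warn pulls Base's logging machinery
# (task-local state, dynamic dispatch, ccalls into the Julia runtime) into
# the method body, and those ccalls surfaced in the emitted IR even though
# the overlay method was the one being resolved.
gpu_runtime_stub_warn() = (@warn "device-only runtime function"; nothing)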

@KSepetanc

It works! I loaded both the CUDA.jl PR and this one, and my full code case passes (I tried using exe files and all is good). @apozharski thank you!!

@KSepetanc

I came across one hurdle after all. Compiling with JuliaC worked as stated above, but I got an error when compiling with PackageCompiler instead. @apozharski could you run your debug setup with PackageCompiler and your code and check whether you are getting it too?

PackageCompiler has three stages:

  1. creating the compiler sysimage (completes)
  2. compiling the fresh sysimage (completes)
  3. precompilation at startup (fails here)

Most errors are of this type (a small part of the stack trace is below; a minimal repro sketch follows the trace).

ERROR: LoadError: Failed to precompile GPUCompiler [61eb1bfa-7361-4325-ad38-22787b887f55] to "C:\\Users\\karlo\\.julia\\compiled\\v1.12\\GPUCompiler\\jl_11DD.tmp".
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:44
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, keep_loaded_modules::Bool; flags::Cmd, cacheflags::Base.CacheFlags, reasons::Dict{String, Int64}, loadable_exts::Nothing)
    @ Base .\loading.jl:3311
  [3] (::Base.var"#__require_prelocked##0#__require_prelocked##1"{Base.PkgId, String, Dict{String, Int64}})()
    @ Base .\loading.jl:2679
  [4] maybe_cachefile_lock(f::Base.var"#__require_prelocked##0#__require_prelocked##1"{Base.PkgId, String, Dict{String, Int64}}, pkg::Base.PkgId, srcpath::String; stale_age::Int64)
    @ Base .\loading.jl:3898
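For reference, a minimal way to drive the PackageCompiler path described above (the package list, output path, and precompile script are placeholders for whatever the real project uses):

using PackageCompiler

# Stages 1-2 build the sysimage; stage 3 is the precompilation that runs
# while this executes and is where the failure above occurs.
create_sysimage(
    [:GPUCompiler, :CUDA];
    sysimage_path = "sys_gpu.dll",  # Windows paths in this report
    precompile_execution_file = "precompile_script.jl",
)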

@michel2323 michel2323 self-requested a review January 30, 2026 16:09
@michel2323
Member

CUDA instructions still leak to the CPU at the LLVM level, but this unblocks the immediate AOT use case.

@christiangnrd christiangnrd removed their request for review January 30, 2026 17:28
@christiangnrd
Member

christiangnrd commented Jan 30, 2026

I'm not familiar enough with this part of Julia/GPUCompiler to meaningfully review.

@michel2323 michel2323 merged commit 1976f0d into JuliaGPU:master Feb 4, 2026
33 of 38 checks passed