Enable Ahead-of-Time Compilation by hiding the runtime functions in the GLOBAL_METHOD_TABLE #749
Conversation
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:

```diff
diff --git a/src/runtime.jl b/src/runtime.jl
index 2f7312e..55affb1 100644
--- a/src/runtime.jl
+++ b/src/runtime.jl
@@ -83,10 +83,11 @@ function compile(def, return_type, types, llvm_return_type=nothing, llvm_types=n
# using the new nonrecursive codegen to handle function lookup ourselves?
if def isa Symbol
args = [gensym() for typ in types]
- @eval GPUCompiler.@device_function($return_type,
- @inline $def($(args...)) =
- ccall($("extern $llvm_name"), llvmcall, $return_type, ($(types...),), $(args...))
- )
+ @eval GPUCompiler.@device_function(
+ $return_type,
+ @inline $def($(args...)) =
+ ccall($("extern $llvm_name"), llvmcall, $return_type, ($(types...),), $(args...))
+ )
end
return
diff --git a/src/utils.jl b/src/utils.jl
index 8be6a4b..ae26a0c 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -255,12 +255,14 @@ macro device_function(rt, ex)
$rt(1)
end
- esc(quote
- $(combinedef(def))
+ return esc(
+ quote
+ $(combinedef(def))
- # NOTE: no use of `@consistent_overlay` here because the regular function errors
- Base.Experimental.@overlay($(GPUCompiler).GLOBAL_METHOD_TABLE, $ex)
- end)
+ # NOTE: no use of `@consistent_overlay` here because the regular function errors
+ Base.Experimental.@overlay($(GPUCompiler).GLOBAL_METHOD_TABLE, $ex)
+ end
+ )
end
diff --git a/test/utils.jl b/test/utils.jl
index 26c189e..8e8e1b8 100644
--- a/test/utils.jl
+++ b/test/utils.jl
@@ -202,17 +202,19 @@ end
# Create a test module to contain the device functions
test_mod = @eval module $(gensym("DeviceFunctionTest"))
- using GPUCompiler
-
- # Test with Ptr return type (common for runtime functions)
- GPUCompiler.@device_function(Ptr{Nothing},
- @inline test_device_ptr() = ccall("extern gpu_test", llvmcall, Ptr{Nothing}, ())
- )
-
- # Test with primitive return type
- GPUCompiler.@device_function(Nothing,
- @inline test_device_nothing() = ccall("extern gpu_test2", llvmcall, Nothing, ())
- )
+ using GPUCompiler
+
+ # Test with Ptr return type (common for runtime functions)
+ GPUCompiler.@device_function(
+ Ptr{Nothing},
+ @inline test_device_ptr() = ccall("extern gpu_test", llvmcall, Ptr{Nothing}, ())
+ )
+
+ # Test with primitive return type
+ GPUCompiler.@device_function(
+ Nothing,
+ @inline test_device_nothing() = ccall("extern gpu_test2", llvmcall, Nothing, ())
+ )
end
# Verify the functions are defined in the test module
```
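For context on the diff above: a hedged sketch of roughly what a `@device_function` invocation provides, based on the PR description and the `utils.jl` hunk. The error message and the exact expansion are illustrative, not GPUCompiler's literal output:

```julia
using GPUCompiler

# Approximately what
#   GPUCompiler.@device_function(Ptr{Nothing},
#       @inline test_device_ptr() = ccall("extern gpu_test", llvmcall, Ptr{Nothing}, ()))
# boils down to.

# CPU-visible definition: throws instead of referencing the nonexistent
# "extern gpu_test" symbol, so nothing GPU-only lands in the CPU code cache.
@inline test_device_ptr() = error("test_device_ptr is only callable on the device")

# Device-side definition, hidden behind GPUCompiler's overlay method table so
# that only GPU codegen (which is configured to use the table) resolves it.
Base.Experimental.@overlay GPUCompiler.GLOBAL_METHOD_TABLE @inline test_device_ptr() =
    ccall("extern gpu_test", llvmcall, Ptr{Nothing}, ())
```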
Loaded both the forked CUDA.jl and this PR and tried to compile my full code, and got an error. The stack trace is massive so I copied only the first several lines: […] Otherwise, I could compile GPUCompiler into the image. Do you have an idea where this could come from?
@KSepetanc yep, I caught that in the tests for the PR as well (somehow they were passing for me locally, but I suspect that was just poor environment management on my part). Unsurprisingly, my hack seems to break things in […]. I will turn this PR into a draft until then.
@apozharski are you using the CUDA 590 driver branch (i.e. CUDA 13.1)? I have seen that the maintainers are preparing support for it, but as of a few days ago it still was not released. Without knowing more about your system, I presume you just need to downgrade to the 580-series driver that comes with CUDA 13.0; I had this issue too. I will soon have more questions, as it seems that more fixes are needed beyond GPUCompiler.jl and CUDA.jl to AOT-compile MadNLPGPU, which I need, but it is still WIP so I will wait a bit.
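(For anyone checking which driver and toolkit they are actually on before downgrading, CUDA.jl can report it directly; a minimal snippet:)

```julia
# Report the CUDA driver version, toolkit, and devices that CUDA.jl sees.
using CUDA
CUDA.versioninfo()
```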
Codecov Report

❌ Patch coverage is […].

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master     #749      +/-   ##
==========================================
+ Coverage   74.79%   74.80%   +0.01%
==========================================
  Files          24       24
  Lines        3773     3779       +6
==========================================
+ Hits         2822     2827       +5
- Misses        951      952       +1
```
Marking ready. I seem to have fixed the issues I can repro on my own machine (it turns out including […]).

To clarify, this was causing a bunch of […]. I did some more digging this morning and I still don't understand why this was happening, but my debugging is fairly primitive.
It works! Loaded both the CUDA.jl PR and this one, and my full code case passes (tried using the exe files and all good). @apozharski thank you!!
Ran into one hurdle after all. Compiling with JuliaC worked as stated above, but I got an error when compiling with PackageCompiler instead. @apozharski could you run your debugging with PackageCompiler and your code and check if you are getting it too? PackageCompiler has three stages: […]

Most errors are of this type (a small part of the stack trace): […]
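A minimal reproduction sketch for the PackageCompiler path, assuming an app-style project (`MyGPUApp` and the output path are placeholders, not the actual project):

```julia
# Build the same project as a standalone app with PackageCompiler rather than
# JuliaC, to compare where the two pipelines diverge. Paths are placeholders.
using PackageCompiler

create_app(
    "MyGPUApp",           # project directory that depends on CUDA.jl/GPUCompiler.jl
    "MyGPUAppCompiled";   # output directory for the compiled bundle
    force = true,         # overwrite any previous build in the output directory
)
```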
michel2323 left a comment:

CUDA instructions still leak to the CPU at the LLVM level, but this unblocks the immediate AOT use case.
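One hedged way to observe that leakage from the host side is to scan a function's CPU LLVM IR for stray `gpu_*` declarations; `host_entry` below is a hypothetical function that (indirectly) reaches the runtime stubs, not part of GPUCompiler's API:

```julia
# Dump the host-side LLVM IR for a function and print any lines that
# mention gpu_* symbols, which should not appear in CPU-only code.
using InteractiveUtils

io = IOBuffer()
code_llvm(io, host_entry, Tuple{}; raw = true)
ir = String(take!(io))
foreach(println, filter(contains("gpu_"), split(ir, '\n')))
```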
I'm not familiar enough with this part of Julia/GPUCompiler to meaningfully review.
As discussed in JuliaGPU/CUDA.jl#2998 and #611, `GPUCompiler.jl` currently leaks nonexistent `gpu_*` LLVM functions into the CPU cache, making ahead-of-time compilation impossible for any package that uses it.

I am fixing this by moving these runtime methods into the method table defined in the GPUCompiler module and having the CPU versions throw errors, as is done in `CUDA.jl`. This feels like somewhat of a hack; however, it seems to work, and without a better understanding of what this might break, it seems to be the simplest solution.
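For readers unfamiliar with overlay method tables, a minimal self-contained sketch of the mechanism this PR leans on. The table name here is made up for illustration; GPUCompiler's real table is `GLOBAL_METHOD_TABLE`:

```julia
module OverlayDemo

# Create a standalone method table; a GPU compiler can be pointed at this
# table during codegen, while ordinary CPU compilation never consults it.
Base.Experimental.@MethodTable DEMO_TABLE

# CPU-visible method: errors loudly, so no reference to an undefined
# "extern gpu_*" symbol ever enters the CPU code cache.
gpu_report_exception() = error("gpu_report_exception must be called from device code")

# Overlay method: shadows the CPU one only for compilers configured to use
# DEMO_TABLE, where the "extern" ccall is resolved against the GPU runtime.
Base.Experimental.@overlay DEMO_TABLE gpu_report_exception() =
    ccall("extern gpu_report_exception", llvmcall, Nothing, ())

end # module
```

Because the device method lives only in the overlay table, the CPU method instance that gets cached during precompilation contains a plain `error` call rather than an unresolvable `extern gpu_*` reference, which is what previously broke AOT builds.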