Julia 1.12: Mismatched codeinfo hits assertion error. #754

@wsmoses

Specifically we hit the following error (which occurs only on 1.12, not 1.10 or 1.11):

ERROR: LoadError: AssertionError: Static compilation failed
Stacktrace:
  [1] compile_method_instance(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/j4HFa/src/jlgen.jl:848
  [2] irgen(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/j4HFa/src/irgen.jl:4
  [3] emit_llvm(job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/j4HFa/src/driver.jl:200
  [4] emit_llvm
    @ ~/.julia/packages/GPUCompiler/j4HFa/src/driver.jl:182 [inlined]

After deeper investigation, here is what is happening. Our interpreter can dispatch certain methods to the native interpreter. As a consequence, the MethodInstance alone is not sufficient to identify which compiled CodeInstance we are dealing with.

The failure shows up inside the runtime call that GPUCompiler uses here to look up the LLVM function indices:

extern "C" JL_DLLEXPORT_CODEGEN
void jl_get_function_id_impl(void *native_code, jl_code_instance_t *codeinst,
        int32_t *func_idx, int32_t *specfunc_idx)
{
    jl_native_code_desc_t *data = (jl_native_code_desc_t*)native_code;
    if (data) {
        // get the function index in the fvar lookup table
        auto it = data->jl_fvar_map.find(codeinst);
        if (it != data->jl_fvar_map.end()) {
            std::tie(*func_idx, *specfunc_idx) = it->second;
        }
    }
}

The CodeInstance we pass in is not found in jl_fvar_map, so func_idx and specfunc_idx are never written.

The corresponding CodeInstance we had was:

(rr) p codeinst
$15 = (jl_code_instance_t *) 0x7672abb5ce20
(rr) p jl_(codeinst->def)
(::Type{ArgumentError})(String) from (::Type{ArgumentError})(AbstractString)

However the contents of jl_fvar_map were:

(rr) p jl_((data->jl_fvar_map.begin())->first.def)
similar(Reactant.TracedRArray{Float64, 2}, Type{Reactant.TracedRNumber{Float64}}, Tuple{Int64, Int64}) from similar(Reactant.TracedRArray{T, N} where N where T, Type{T}, Tuple{Vararg{Int64, N}}) where {T, N}
(rr) p jl_((++data->jl_fvar_map.begin())->first.def)
kwcall(NamedTuple{(:location,), Tuple{Reactant.MLIR.IR.Location}}, typeof(Reactant.Ops.fill), Float64, Array{Int64, 1}) from kwcall(NamedTuple{names, T} where T<:Tuple where names, typeof(Reactant.Ops.fill), Float64, Array{Int64, 1})
$11 = void
(rr) p jl_((++++data->jl_fvar_map.begin())->first.def)
throw_boundserror(Array{Int64, 1}, Tuple{Int64}) from throw_boundserror(Any, Any)
$12 = void
(rr) p jl_((++++++data->jl_fvar_map.begin())->first.def)
(::Type{ArgumentError})(String) from (::Type{ArgumentError})(AbstractString)
$13 = void
(rr) p (++++++data->jl_fvar_map.begin())->first
$14 = (_jl_code_instance_t * const) 0x767402428b90 <jl_system_image_data+59594448>

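So the map contains the right MethodInstance, but under a different CodeInstance: 0x767402428b90 (inside the system image) rather than the 0x7672abb5ce20 we passed in.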
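To make the failure mode concrete, here is a minimal standalone sketch of a lookup table keyed by CodeInstance pointer. It is only a model of the situation in the gdb session above, not Julia's actual data structures: `MethodInstance`, `CodeInstance`, `FvarMap`, `get_function_id`, and `lookup_misses` are hypothetical stand-ins, and it assumes the map is keyed purely by pointer identity, as the `find(codeinst)` call suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical stand-ins for jl_method_instance_t and jl_code_instance_t.
struct MethodInstance { const char *name; };
struct CodeInstance { const MethodInstance *def; };

// Modeled on jl_fvar_map: the key is the CodeInstance pointer, so two
// CodeInstances that share the same MethodInstance are distinct keys.
using FvarMap = std::map<const CodeInstance *, std::pair<int32_t, int32_t>>;

// Same shape as jl_get_function_id_impl: the outputs are written only on a hit.
void get_function_id(const FvarMap &fvars, const CodeInstance *codeinst,
                     int32_t *func_idx, int32_t *specfunc_idx)
{
    assert(func_idx != nullptr && specfunc_idx != nullptr);
    auto it = fvars.find(codeinst);
    if (it != fvars.end()) {
        std::tie(*func_idx, *specfunc_idx) = it->second;
    }
}

// Returns true when the lookup leaves both indices at the -1 sentinel,
// which is the condition the "Static compilation failed" assertion rejects.
bool lookup_misses(const FvarMap &fvars, const CodeInstance *codeinst)
{
    int32_t func_idx = -1, specfunc_idx = -1;
    get_function_id(fvars, codeinst, &func_idx, &specfunc_idx);
    return func_idx == -1 && specfunc_idx == -1;
}
```

In this model, registering one CodeInstance for a MethodInstance and then querying with a second CodeInstance whose `def` points at the same MethodInstance makes `lookup_misses` return true, even though both print identically through their `def`.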

The code instances are found as follows:

    if VERSION >= v"1.13.0-DEV.1120"
        # on sufficiently recent versions of Julia, we can query the CIs compiled.
        # this is required after the move to `invoke(::CodeInstance)`, because our
        # lookup function (used to populate method_instances) isn't always called then.

        num_cis = Ref{Csize_t}(0)
        @ccall jl_get_llvm_cis(native_code::Ptr{Cvoid}, num_cis::Ptr{Csize_t},
                               C_NULL::Ptr{Cvoid})::Nothing
        resize!(method_instances, num_cis[])
        @ccall jl_get_llvm_cis(native_code::Ptr{Cvoid}, num_cis::Ptr{Csize_t},
                               method_instances::Ptr{Cvoid})::Nothing

        for (i, ci) in enumerate(method_instances)
            method_instances[i] = ci.def::MethodInstance 
        end                     
    
    elseif VERSION >= v"1.12.0-DEV.1703"
        # slightly older versions of Julia used MIs directly
    
        num_mis = Ref{Csize_t}(0)
        @ccall jl_get_llvm_mis(native_code::Ptr{Cvoid}, num_mis::Ptr{Csize_t},
                               C_NULL::Ptr{Cvoid})::Nothing
        resize!(method_instances, num_mis[])
        @ccall jl_get_llvm_mis(native_code::Ptr{Cvoid}, num_mis::Ptr{Csize_t},
                               method_instances::Ptr{Cvoid})::Nothing
    end

    # process all compiled method instances
    compiled = Dict()
    for mi in method_instances
        ci = ci_cache_lookup(cache, mi, job.world, job.world)
        ci === nothing && continue
        @show ci, ci.owner, mi, job.world
        # get the function index
        llvm_func_idx = Ref{Int32}(-1)
        llvm_specfunc_idx = Ref{Int32}(-1)
        ccall(:jl_get_function_id, Nothing,
              (Ptr{Cvoid}, Any, Ptr{Int32}, Ptr{Int32}),
              native_code, ci, llvm_func_idx, llvm_specfunc_idx)
        @assert llvm_func_idx[] != -1 || llvm_specfunc_idx[] != -1 "Static compilation failed"

Specifically, on 1.12 we call jl_get_llvm_mis, which internally performs a mapping down to MethodInstances and so loses track of which CodeInstance was actually compiled. ci_cache_lookup then returns a different CodeInstance for the same MethodInstance. jl_get_llvm_cis avoids this issue by returning the compiled CodeInstances directly.

I've started a 1.12 backport PR: JuliaLang/julia#60725 to leverage the right function on 1.12.

In the meantime, on 1.12 I'm going to loosen the check after jl_get_function_id so that we continue when the CodeInstance is not found, just as we already do when ci === nothing.
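The shape of that workaround can be sketched in a small standalone C++ model. This is an illustration under assumptions, not the GPUCompiler change itself (which lives in the Julia loop quoted earlier): `CodeInstance`, `FvarMap`, and `collect_function_ids` are hypothetical names, and a null pointer stands in for `ci === nothing`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for jl_code_instance_t.
struct CodeInstance { int id; };

// Modeled on jl_fvar_map: CodeInstance pointer -> (func_idx, specfunc_idx).
using FvarMap = std::map<const CodeInstance *, std::pair<int32_t, int32_t>>;

// Collect the indices of every candidate that is present in the map.
// A candidate that is missing from the map is skipped rather than
// treated as a fatal error.
std::vector<std::pair<int32_t, int32_t>>
collect_function_ids(const FvarMap &fvars,
                     const std::vector<const CodeInstance *> &candidates)
{
    std::vector<std::pair<int32_t, int32_t>> compiled;
    for (const CodeInstance *ci : candidates) {
        if (ci == nullptr)
            continue;  // mirrors `ci === nothing && continue`
        auto it = fvars.find(ci);
        if (it == fvars.end())
            continue;  // loosened: a miss no longer aborts compilation
        // An entry that is present should still carry at least one valid index.
        assert(it->second.first != -1 || it->second.second != -1);
        compiled.push_back(it->second);
    }
    return compiled;
}
```

The trade-off in this sketch is that a mismatched CodeInstance is silently dropped from the result, which is why it is only a stopgap until the lookup returns the compiled CodeInstances themselves.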

cc @gbaraldi @vchuravy @glou-nes
