AMDGPU does not load anymore without error when installation not functional #882

@omlins

Over the years, it has proven very useful that GPU packages load without error even when the installation is not functional. Now the following non-fatal error is thrown when no installation is present:

[8](https://github.com/omlins/ParallelStencil.jl/actions/runs/21431112594/job/61710441235#step:6:431)
┌ AMDGPU → AMDGPUEnzymeCoreExt
│  ┌ Error: ROCm discovery failed!
│  │ Discovered ROCm path: .
│  │ Use `ROCM_PATH` env variable to specify ROCm directory.
│  │ 
│  │   exception =
│  │    AssertionError: isdir(libdir)
│  │    Stacktrace:
│  │      [1] find_rocm_library(lib::String; rocm_path::String, ext::String)
│  │        @ AMDGPU.ROCmDiscovery ~/.julia/packages/AMDGPU/IDGfT/src/discovery/utils.jl:113
│  │      [2] find_rocm_library
│  │        @ ~/.julia/packages/AMDGPU/IDGfT/src/discovery/utils.jl:111 [inlined]
│  │      [3] __init__()
│  │        @ AMDGPU.ROCmDiscovery ~/.julia/packages/AMDGPU/IDGfT/src/discovery/discovery.jl:87
│  │      [4] run_module_init(mod::Module, i::Int64)
│  │        @ Base ./loading.jl:1443
│  │      [5] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
│  │        @ Base ./loading.jl:1431
│  │      [6] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any}; register::Bool)
│  │        @ Base ./loading.jl:1319
│  │      [7] _include_from_serialized
│  │        @ ./loading.jl:1274 [inlined]
│  │      [8] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128, stalecheck::Bool; reasons::Dict{String, Int64}, DEPOT_PATH::Vector{String})
│  │        @ Base ./loading.jl:2115
│  │      [9] _require_search_from_serialized
│  │        @ ./loading.jl:2009 [inlined]
│  │     [10] __require_prelocked(pkg::Base.PkgId, env::String)
│  │        @ Base ./loading.jl:2627
│  │     [11] _require_prelocked(uuidkey::Base.PkgId, env::String)
│  │        @ Base ./loading.jl:2493
│  │     [12] macro expansion
│  │        @ ./loading.jl:2421 [inlined]
│  │     [13] macro expansion
│  │        @ ./lock.jl:376 [inlined]
│  │     [14] __require(into::Module, mod::Symbol)
│  │        @ Base ./loading.jl:2386
│  │     [15] require(into::Module, mod::Symbol)
│  │        @ Base ./loading.jl:2362
│  │     [16] top-level scope
│  │        @ ~/.julia/packages/AMDGPU/IDGfT/ext/AMDGPUEnzymeCoreExt/AMDGPUEnzymeCoreExt.jl:3
│  │     [17] include(mod::Module, _path::String)
│  │        @ Base ./Base.jl:306
│  │     [18] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
│  │        @ Base ./loading.jl:3024
│  │     [19] top-level scope
│  │        @ stdin:5
│  │     [20] eval(m::Module, e::Any)
│  │        @ Core ./boot.jl:489
│  │     [21] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
│  │        @ Base ./loading.jl:2870
│  │     [22] include_string
│  │        @ ./loading.jl:2880 [inlined]
│  │     [23] exec_options(opts::Base.JLOptions)
│  │        @ Base ./client.jl:315
│  │     [24] _start()
│  │        @ Base ./client.jl:550
│  └ @ AMDGPU.ROCmDiscovery ~/.julia/packages/AMDGPU/IDGfT/src/discovery/discovery.jl:101
│  ┌ Warning: Device libraries are unavailable, device intrinsics will be disabled.
│  └ @ AMDGPU ~/.julia/packages/AMDGPU/IDGfT/src/AMDGPU.jl:190
│  ┌ Warning: HIP library is unavailable, HIP integration will be disabled.
│  └ @ AMDGPU ~/.julia/packages/AMDGPU/IDGfT/src/AMDGPU.jl:200
│  ┌ Warning: rocBLAS is unavailable, functionality will be disabled.
│  └ @ AMDGPU ~/.julia/packages/AMDGPU/IDGfT/src/AMDGPU.jl:211
│  ┌ Warning: rocSPARSE is unavailable, functionality will be disabled.
│  └ @ AMDGPU ~/.julia/packages/AMDGPU/IDGfT/src/AMDGPU.jl:211
│  ┌ Warning: rocSOLVER is unavailable, functionality will be disabled.
│  └ @ AMDGPU ~/.julia/packages/AMDGPU/IDGfT/src/AMDGPU.jl:211
│  ┌ Warning: rocRAND is unavailable, functionality will be disabled.
│  └ @ AMDGPU ~/.julia/packages/AMDGPU/IDGfT/src/AMDGPU.jl:211
│  ┌ Warning: rocFFT is unavailable, functionality will be disabled.
│  └ @ AMDGPU ~/.julia/packages/AMDGPU/IDGfT/src/AMDGPU.jl:211
│  ┌ Warning: MIOpen is unavailable, functionality will be disabled.
│  └ @ AMDGPU ~/.julia/packages/AMDGPU/IDGfT/src/AMDGPU.jl:211

A similar issue was reported and fixed a while ago, see #685.

It can be observed for example here in the ParallelStencil CI:
https://github.com/omlins/ParallelStencil.jl/actions/runs/21431112594/job/61710441241#step:6:348

It would be nice if this just produced a warning message without a stack trace, because the trace adds an enormous number of lines to our CI logs for no reason.
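
To illustrate the requested behavior, here is a minimal sketch using Julia's standard `Logging` library. It is not AMDGPU.jl's actual discovery code: `find_library_dir` and `try_discover` are hypothetical names made up for this example. The point is only that passing the bare exception as `exception = err`, rather than an `(err, backtrace)` tuple, makes the console logger print the error message without a stack trace.

```julia
using Logging

# Hypothetical stand-in for a library discovery step (not AMDGPU.jl's code).
function find_library_dir(root::AbstractString)
    libdir = joinpath(root, "lib")
    isdir(libdir) || error("library directory not found: $libdir")
    return libdir
end

# Attempt discovery; on failure, log a short warning and return `nothing`.
function try_discover(root::AbstractString)
    try
        return find_library_dir(root)
    catch err
        # Only the exception is passed, not `(err, catch_backtrace())`,
        # so the logger shows the error message without a stack trace.
        @warn "Library discovery failed; functionality will be disabled." root exception = err
        return nothing
    end
end

# Capture the log output to check that it stays short.
buf = IOBuffer()
result = with_logger(ConsoleLogger(buf, Logging.Warn)) do
    try_discover(mktempdir())  # fresh empty directory, so no `lib` subdirectory exists
end
log_text = String(take!(buf))
print(log_text)
```

With this pattern the log holds a single warning record carrying the error message, and the full trace could still be made available on request, for example at debug level.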

Thanks!!
