Skip to content

GC and multithreading. Strange behaviour. #54247

@sgaure

Description

@sgaure

While investigating some strange slowdowns I've disovered a behaviour of the GC I do not understand.

Here is a MWE. It's a bit complicated. I have 500000 short vectors which I take a large sample from (50 million, with replacement). Then there's an alternative where the individual vectors are replaced by a struct containing two references to the vector.

When I do an operation on the vector of vectors, it's all like I would expect. The GC is exercised, but nothing too extreme. When it's a vector of structs, and in addition the actual work is done in a single @spawn which is immediately fetched, I sometimes get up to 20 seconds of GC activity. This is on master. On 1.10.2 it's up to 50 seconds. With more vectors, the GC may run for several minutes.

I see no intuitive reason for this behaviour which only appears when a fetch(@spawn work(v)) is done, not with work(v). I have a 64GB 12 core AMD, with 24 virtual cores. If I run with -t 12 this strange behaviour does not seem happen. It does however happen with -t 24 --gcthreads=4, with GC time up to 40 seconds.

using Base.Threads: @spawn
using InteractiveUtils: versioninfo
using StatsBase: sample

struct Astruct
    v1::Vector{Float64}
    v2::Vector{Float64}
    Astruct(v) = new(v,v)
end
Base.getindex(as::Astruct, i) = as.v1[i]

function work(vvec)
    x = 0.0
    for v in vvec
        x += v[1]
    end
    return x
end


fun(v, par) = par ? fetch(@spawn work(v))::Float64 : work(v)

test(v, N) = @time "serial" fun(sample(v, N), false)

partest(v, N) = @time "parallel" fun(sample(v, N), true)


versioninfo()

const N = 500_000
const v = [rand(10) for _ in 1:N]
const as = Astruct.(v)


println("***** Vector *****")
GC.gc()
for _ in 1:10
    test(v, 100N)
end
GC.gc()
for _ in 1:10
    partest(v, 100N)
end

println("\n***** Astruct *****")
GC.gc()
for _ in 1:10
    test(as, 100N)
end
GC.gc()
for _ in 1:10
    partest(as, 100N)
end

Output:

Julia Version 1.12.0-DEV.383
Commit 903ffff167* (2024-04-22 04:40 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen Threadripper PRO 5945WX 12-Cores
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 24 default, 0 interactive, 24 GC (on 24 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
***** Vector *****
serial: 0.790318 seconds (3 allocations: 381.470 MiB, 0.55% gc time)
serial: 0.820720 seconds (3 allocations: 381.470 MiB, 4.01% gc time)
serial: 0.823082 seconds (3 allocations: 381.470 MiB, 3.90% gc time)
serial: 0.812152 seconds (3 allocations: 381.470 MiB, 3.73% gc time)
serial: 0.811470 seconds (3 allocations: 381.470 MiB, 3.76% gc time)
serial: 0.803246 seconds (3 allocations: 381.470 MiB, 3.78% gc time)
serial: 0.815217 seconds (3 allocations: 381.470 MiB, 3.75% gc time)
serial: 0.815175 seconds (3 allocations: 381.470 MiB, 3.79% gc time)
serial: 0.807833 seconds (3 allocations: 381.470 MiB, 3.76% gc time)
serial: 0.821069 seconds (3 allocations: 381.470 MiB, 3.67% gc time)
parallel: 0.847207 seconds (3.33 k allocations: 381.640 MiB, 0.28% gc time, 0.93% compilation time)
parallel: 0.850981 seconds (9 allocations: 381.470 MiB, 1.21% gc time)
parallel: 0.886037 seconds (9 allocations: 381.470 MiB, 12.49% gc time)
parallel: 0.776483 seconds (9 allocations: 381.470 MiB)
parallel: 0.962529 seconds (9 allocations: 381.470 MiB, 14.78% gc time)
parallel: 0.985055 seconds (9 allocations: 381.470 MiB, 19.63% gc time)
parallel: 0.831015 seconds (9 allocations: 381.470 MiB, 4.25% gc time)
parallel: 0.840257 seconds (9 allocations: 381.470 MiB, 4.67% gc time)
parallel: 0.838799 seconds (9 allocations: 381.470 MiB, 4.48% gc time)
parallel: 0.859712 seconds (9 allocations: 381.470 MiB, 4.76% gc time)

***** Astruct *****
serial: 0.947084 seconds (3 allocations: 762.940 MiB, 0.39% gc time)
serial: 0.990493 seconds (3 allocations: 762.940 MiB, 5.19% gc time)
serial: 0.978335 seconds (3 allocations: 762.940 MiB, 4.95% gc time)
serial: 0.976766 seconds (3 allocations: 762.940 MiB, 4.94% gc time)
serial: 0.975440 seconds (3 allocations: 762.940 MiB, 5.05% gc time)
serial: 0.977785 seconds (3 allocations: 762.940 MiB, 4.94% gc time)
serial: 0.980397 seconds (3 allocations: 762.940 MiB, 4.95% gc time)
serial: 0.976771 seconds (3 allocations: 762.940 MiB, 4.94% gc time)
serial: 0.982649 seconds (3 allocations: 762.940 MiB, 4.94% gc time)
serial: 0.981036 seconds (3 allocations: 762.940 MiB, 4.92% gc time)
parallel: 0.952661 seconds (3.42 k allocations: 763.113 MiB, 0.23% gc time, 0.77% compilation time)
parallel: 8.064466 seconds (9 allocations: 762.940 MiB, 88.19% gc time)
parallel: 0.940224 seconds (9 allocations: 762.940 MiB)
parallel: 14.162925 seconds (9 allocations: 762.940 MiB, 93.19% gc time)
parallel: 0.930567 seconds (9 allocations: 762.940 MiB)
parallel: 0.926054 seconds (9 allocations: 762.940 MiB)
parallel: 14.008024 seconds (9 allocations: 762.940 MiB, 93.24% gc time)
parallel: 0.942324 seconds (9 allocations: 762.940 MiB)
parallel: 0.938320 seconds (9 allocations: 762.940 MiB)
parallel: 20.819898 seconds (9 allocations: 762.940 MiB, 95.42% gc time)

Metadata

Metadata

Assignees

No one assigned

    Labels

    GCGarbage collectorperformanceMust go faster

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions