Parallelize sampling for gen_samples and q22 with multiple repetitions #36

mawilson1234 wants to merge 4 commits into quentingronau:master
Conversation
Hi Michael,

Increasing the speed of bridgesampling by getting faster multivariate normal samples sounds like a good idea. Maybe an alternative approach that could work might be replacing the call to `rmvnorm` with a call to a faster sampler. It might make sense to consider our options before committing to this specific addition.

Cheers,
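For context, one standard way to get faster multivariate normal draws (a sketch only; the specific replacement function suggested above is not named in this thread, and `fast_rmvnorm` is a hypothetical helper, not part of bridgesampling) is to factor the covariance matrix once and reuse the Cholesky factor across draws, rather than refactorizing inside every sampling call:

```r
## Sketch: draw n samples from N(mu, Sigma) by reusing a precomputed
## Cholesky factor R (Sigma = t(R) %*% R). If many batches are drawn
## with the same Sigma, R can be computed once and passed in,
## avoiding a repeated O(d^3) factorization.
fast_rmvnorm <- function(n, mu, Sigma, R = chol(Sigma)) {
  d <- length(mu)
  Z <- matrix(rnorm(n * d), nrow = n)  # iid standard normal draws
  sweep(Z %*% R, 2, mu, "+")           # transform to N(mu, Sigma)
}
```

Since `cov(Z %*% R) = t(R) %*% R = Sigma`, the transformed draws have the target covariance; whether this beats the current `rmvnorm` call in practice would need benchmarking on realistic dimensions.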
Hi Henrik,

I had tried that as well. I also see that the merge checks are failing with an error, and I'm not sure what is causing that.

At any rate, it's not a big deal if you decide to take a different approach (or leave things as is, since they're working right now, and I certainly wouldn't want to break anything accidentally). I had just been messing around with the code while trying to run several repetitions and noticed that there might be some possible time savings in that area, so I figured I'd go ahead and put in the pull request.

Thanks for your response!
Some further testing on a smaller model showed that (for 4 repetitions on 4 cores) the current implementation took 1.15 minutes. However, when replacing the calls, I was running into issues with errors about exporting the variables needed by the worker processes.
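Export errors like that usually come from PSOCK workers starting with empty environments. A minimal sketch of the fix, using only base R's `parallel` package (the variable names here are placeholders for illustration, not the ones used in the package):

```r
library(parallel)

## Sketch: PSOCK workers (the only cluster type on Windows) do not
## inherit the parent environment, so any objects used inside
## parLapply must be shipped over explicitly with clusterExport;
## otherwise you get "object not found" style errors.
## 'mu' and 'R_chol' are placeholder names.
mu     <- c(0, 0)
R_chol <- chol(diag(2))

cl <- makeCluster(2)
clusterExport(cl, c("mu", "R_chol"))   # ship the needed objects
draws <- parLapply(cl, 1:4, function(i) {
  Z <- matrix(rnorm(10 * length(mu)), nrow = 10)
  sweep(Z %*% R_chol, 2, mu, "+")      # one repetition's samples
})
stopCluster(cl)
```

Packages used on the workers (e.g. `mvtnorm`) would similarly need to be loaded per worker, for instance via `clusterEvalQ(cl, library(mvtnorm))`, or called with the `mvtnorm::` prefix.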
Sampling from a multivariate normal distribution can take a long time. When running bridge sampling with multiple repetitions, sampling to construct `gen_samples` and `q22` currently takes place serially, even if `cores > 1`. This pull request modifies `bridge_sampler_normal.R` and `bridge_sample_warp3.R` to sample for these in parallel when `repetitions > 1` (if `cores` is also > 1).

Some preliminary testing indicated that this could potentially save a lot of time when running several repetitions. I used a brm fit of a fairly complex model, fit using 20k samples/chain and 4 chains. Here's the model formula, in case it matters:
I measured how long it took to run 4 repetitions of the bridge sampler (`method = "normal"`) using `Sys.time()`. The current serial implementation took 33.16 minutes, while the parallelized version took 14.36 minutes, a difference of 18.8 minutes. I don't have local resources available to run many more repetitions than that with a model of this size, but I imagine it might scale nicely, since there's a noticeable time savings even with only 4 repetitions.

I will note that I ran this on Windows using `parallel::makeCluster` rather than `parallel::mclapply`, after bypassing the check against running multicore on Windows, so I'm not sure whether that code works as is on Unix-based systems, nor do I have an estimate of the time savings on those systems; someone with a Unix-based system should do some testing if this change turns out to be of interest.

In addition, I don't know whether there is some potential concern about RNG seeding across the cores that might compromise the results; if so, then just eating the time cost of running multiple repetitions may be the way to go. If not, though, parallelizing these draws might result in significant time savings for people running multiple repetitions.
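On the RNG concern: base R's `parallel` package already provides statistically independent, reproducible per-worker streams via the L'Ecuyer-CMRG generator. A sketch of how that could be wired in (toy draws, not the package's actual sampling code):

```r
library(parallel)

cl <- makeCluster(2)
## Give each worker its own L'Ecuyer-CMRG stream derived from a single
## seed, so draws across workers are independent and the whole run is
## reproducible from that one seed.
clusterSetRNGStream(cl, iseed = 12345)
reps1 <- parLapply(cl, 1:4, function(i) rnorm(5))

clusterSetRNGStream(cl, iseed = 12345)  # reset the streams
reps2 <- parLapply(cl, 1:4, function(i) rnorm(5))
stopCluster(cl)

identical(reps1, reps2)  # TRUE: same seed reproduces the same draws
```

If something like this were adopted, overlapping streams across cores shouldn't be an issue, though the repetitions would no longer match what a serial run with a single seed produces.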