Error in unserialize(node$con) : MultisessionFuture (future_lapply-4) failed to receive results from cluster RichSOCKnode #4 (PID 436932 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive. #684
Description
(Please use https://github.com/HenrikBengtsson/future/discussions for Q&A)
Hi, thanks for your great R package future, which makes it really convenient to run long-running tasks.
Describe the bug
```r
options(future.globals.onReference = "error")
[R]> future::plan("multisession", workers = 10L)
[R]> rrho_res <- biomisc::run_rrho(
  bca_diff_res,
  gender_diff_res,
  stepsize = 100L
)
[R]> rrho_perm_res <- biomisc::rrho_correct_pval(
  rrho_res,
  method = "permutation", perm = 200L
)
Error in unserialize(node$con) :
MultisessionFuture (future_lapply-4) failed to receive results from cluster RichSOCKnode #4 (PID 436932 on localhost ‘localhost’). The reason reported was ‘error reading from connection’. Post-mortem diagnostic: No process exists with this PID, i.e. the localhost worker is no longer alive. The total size of the 14 globals exported is 15.76 MiB. The three largest globals are ‘rrho_obj’ (15.47 MiB of class ‘list’), ‘rrho_hyper_overlap’ (143.94 KiB of class ‘function’) and ‘progression’ (91.02 KiB of class ‘function’)
```
I first ran this function with future::plan("multicore", workers = 10L), which gave a similar error (shown below), so I then tried multisession as suggested in #474:
```r
options(future.globals.onReference = "error")
[R]> future::plan("multicore", workers = 10L)
[R]> rrho_res <- biomisc::run_rrho(
  bca_diff_res,
  gender_diff_res,
  stepsize = 100L
)
[R]> rrho_perm_res <- biomisc::rrho_correct_pval(
  rrho_res,
  method = "permutation", perm = 200L
)
Error: Failed to retrieve the result of MulticoreFuture (future_lapply-1) from the forked worker (on localhost; PID 452943). Post-mortem diagnostic: No process exists with this PID, i.e. the forked localhost worker is no longer alive. The total size of the 14 globals exported is 15.76 MiB. The three largest globals are ‘rrho_obj’ (15.47 MiB of class ‘list’), ‘rrho_hyper_overlap’ (143.94 KiB of class ‘function’) and ‘progression’ (91.02 KiB of class ‘function’)
In addition: Warning message:
In mccollect(jobs = jobs, wait = TRUE) :
  1 parallel job did not deliver a result
```
rrho_correct_pval is a long function; its source is at https://github.com/Yunuuuu/biomisc/blob/81948d2e5e2bab5a4cf76fd76e8ab4a096192efd/R/run_rrho.R#L787
The main future call is:
```r
p <- progressr::progressor(steps = perm)
perm_hyper_metric <- future.apply::future_lapply(
  seq_len(perm), function(i) {
    hyper_res <- rrho_hyper_overlap(
      names(rrho_obj$rrho_data$list1)[
        sample.int(length(rrho_obj$rrho_data$list1), replace = FALSE)
      ],
      names(rrho_obj$rrho_data$list2)[
        sample.int(length(rrho_obj$rrho_data$list2), replace = FALSE)
      ],
      stepsize = rrho_obj$stepsize,
      .parallel = FALSE
    )
    p(message = sprintf("Permutating %d times", i))
    rrho_metrics(hyper_res, log_base = rrho_obj$log_base)
  },
  future.globals = TRUE,
  future.seed = TRUE
)
```
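Since the error reports that ‘rrho_obj’ alone accounts for 15.47 MiB of the exported globals, here is a hedged sketch of how the export could perhaps be trimmed to just the pieces the workers use. This assumes rrho_hyper_overlap() and rrho_metrics() only need the two name vectors, stepsize, and log_base, as in the call above; the intermediate names (list1_names, list2_names) are my own:

```r
# Sketch only: pre-extract the small components so the whole ~15 MiB
# rrho_obj list is not captured as a global by the worker function.
list1_names <- names(rrho_obj$rrho_data$list1)
list2_names <- names(rrho_obj$rrho_data$list2)
stepsize <- rrho_obj$stepsize
log_base <- rrho_obj$log_base

perm_hyper_metric <- future.apply::future_lapply(
  seq_len(perm), function(i) {
    hyper_res <- rrho_hyper_overlap(
      list1_names[sample.int(length(list1_names), replace = FALSE)],
      list2_names[sample.int(length(list2_names), replace = FALSE)],
      stepsize = stepsize,
      .parallel = FALSE
    )
    p(message = sprintf("Permutating %d times", i))
    rrho_metrics(hyper_res, log_base = log_base)
  },
  future.seed = TRUE
)
```

I am not sure whether the export size is actually related to the crash, but this keeps the globals small either way.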
Reproducible example
Actually, biomisc::run_rrho also uses future_lapply, but it only fails occasionally, so I cannot reliably reproduce its error message. However, running biomisc::rrho_correct_pval after biomisc::run_rrho reproduces the error above fairly often (using my own data: gene expression array data).
With artificial data I cannot make the error occur, so I cannot provide robust example code to reproduce it.
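For reference, this is roughly the shape of the self-contained reproduction I attempted with random data. Everything here (big_global, the sizes, the inner computation) is made up to mimic the real workload, and on artificial data like this the crash does not trigger for me:

```r
# Hypothetical reproduction attempt: a ~15 MiB list global plus 200
# permutation-style tasks, mirroring the failing call's structure.
library(future)
plan(multisession, workers = 10L)

big_global <- replicate(100, rnorm(2e4), simplify = FALSE)

res <- future.apply::future_lapply(
  seq_len(200), function(i) {
    # each task touches the large global, like rrho_obj above
    sum(vapply(big_global, function(x) mean(sample(x)), numeric(1)))
  },
  future.seed = TRUE
)
```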
Expected behavior
Run without error
Session information
```r
> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so; LAPACK version 3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C

time zone: Asia/Shanghai
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] compiler_4.3.0        parallelly_1.35.0     cli_3.6.1
 [4] tools_4.3.0           parallel_4.3.0        future.apply_1.10.0
 [7] listenv_0.9.0         Rcpp_1.0.10           codetools_0.2-19
[10] progressr_0.13.0-9002 data.table_1.14.9     biomisc_0.0.0.9000
[13] jsonlite_1.8.4        digest_0.6.31         globals_0.16.2
[16] rlang_1.1.1           future_1.32.0
…
```
```r
> future::futureSessionInfo()
*** Package versions
future 1.32.0, parallelly 1.35.0, parallel 4.3.0, globals 0.16.2, listenv 0.9.0

*** Allocations
availableCores():
system  nproc
    24     24
availableWorkers():
$nproc
 [1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
 [7] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
[13] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
[19] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
$system
 [1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
 [7] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
[13] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
[19] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"

*** Settings
- future.plan=<not set>
- future.fork.multithreading.enable=<not set>
- future.globals.maxSize=<not set>
- future.globals.onReference=‘error’
- future.resolve.recursive=<not set>
- future.rng.onMisuse=<not set>
- future.wait.timeout=<not set>
- future.wait.interval=<not set>
- future.wait.alpha=<not set>
- future.startup.script=<not set>

*** Backends
Number of workers: 10
List of future strategies:
1. multicore:
   - args: function (..., workers = 10L, envir = parent.frame())
   - tweaked: TRUE
   - call: future::plan("multicore", workers = 10L)

*** Basic tests
Main R session details:
     pid     r sysname           release
1 455700 4.3.0   Linux 5.19.0-41-generic
                                                       version nodename
1 #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
  machine   login    user effective_user
1  x86_64 user001 user001        user001
Worker R session details:
   worker    pid     r sysname           release
1       1 472740 4.3.0   Linux 5.19.0-41-generic
2       2 472741 4.3.0   Linux 5.19.0-41-generic
3       3 472742 4.3.0   Linux 5.19.0-41-generic
4       4 472743 4.3.0   Linux 5.19.0-41-generic
5       5 472744 4.3.0   Linux 5.19.0-41-generic
6       6 472745 4.3.0   Linux 5.19.0-41-generic
7       7 472746 4.3.0   Linux 5.19.0-41-generic
8       8 472747 4.3.0   Linux 5.19.0-41-generic
9       9 472748 4.3.0   Linux 5.19.0-41-generic
10     10 472749 4.3.0   Linux 5.19.0-41-generic
                                                        version nodename
1  #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
2  #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
3  #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
4  #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
5  #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
6  #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
7  #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
8  #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
9  #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
10 #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 host001
   machine   login    user effective_user
1   x86_64 user001 user001        user001
2   x86_64 user001 user001        user001
3   x86_64 user001 user001        user001
4   x86_64 user001 user001        user001
5   x86_64 user001 user001        user001
6   x86_64 user001 user001        user001
7   x86_64 user001 user001        user001
8   x86_64 user001 user001        user001
9   x86_64 user001 user001        user001
10  x86_64 user001 user001        user001
Number of unique worker PIDs: 10 (as expected)
```