Using argument nr_cpus for parallelization might lead to long run time #27

@JohannesGuss

Description

Running the following code leads to very long run times, or the call does not terminate at all:

###############################
library(data.table)
devtools::install_github("statistikat/simPop")
library(simPop)
# examples for xgboost

## load the demo data set
data(eusilcS)

## create the structure
inp <- specifyInput(data = eusilcS,
          hhid = "db030", # variable with cluster information
          strata = "db040",
          weight = "db090" # variable with sampling weights
)
simPop <- simStructure(data=inp,
            method="direct",
            basicHHvars=c("age", "rb090"))
simPop


model_params <- list(max.depth = 10, eta = 0.5, nrounds = 5, objective = "multi:softprob")
simPop <- simCategorical(simPop,
             additional = c("pl030", "pb220a"),
             method = "xgboost",
             model_params = model_params)
simPop

Changing the last function call to disable parallelisation (nr_cpus = 1) returns results immediately:

simPop <- simCategorical(simPop,
             additional = c("pl030", "pb220a"),
             method = "xgboost",
             nr_cpus = 1,                                   # <- disable parallelisation
             model_params = model_params)
simPop

I propose trying other packages for parallelisation, such as parallelly (https://cran.r-project.org/web/packages/parallelly/) or future.apply (https://cran.r-project.org/web/packages/future.apply).
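
To make the proposal a bit more concrete, here is a minimal sketch of what a per-stratum loop could look like with parallelly and future.apply. It is not simPop's internal code, and I am only assuming that the work can be split by stratum. The toy data frame and the helper summarise_stratum() are hypothetical stand-ins for the real data and model fit, and the cap of two workers is just a conservative choice for the demo.

```r
# Sketch only: a generic per-stratum workload, not simPop's internal code.
library(future)        # plan(), multisession
library(future.apply)  # future_lapply()
library(parallelly)    # availableCores()

# Toy data standing in for a stratified population (hypothetical).
set.seed(1)
toy <- data.frame(
  stratum = rep(c("A", "B", "C"), each = 100),
  value   = rnorm(300)
)

# Hypothetical per-stratum task; a real model fit would go here.
summarise_stratum <- function(d) {
  data.frame(stratum = d$stratum[1], n = nrow(d), mean = mean(d$value))
}

# Leave one core free, never drop below one worker, cap at two for this demo.
n_workers <- min(2L, max(1L, parallelly::availableCores(omit = 1L)))

# Switch to background R sessions and remember the previous plan.
old_plan <- plan(multisession, workers = n_workers)

# One task per stratum; future.seed = TRUE gives parallel-safe random numbers.
res <- future_lapply(split(toy, toy$stratum), summarise_stratum,
                     future.seed = TRUE)

plan(old_plan)  # restore the previous plan

result <- do.call(rbind, res)
print(result)
```

availableCores() takes container and scheduler limits into account, so the worker count is derived from what the session may actually use rather than from the raw hardware count. Whether this would fix the hang reported above is untested.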
