
R + future, the Ram eater #635

@latot

Description


Hi hi, I really like this package; packages like furrr use it heavily, and now that I'm digging into it myself I can see it's very powerful. Recently I noticed my scripts were using a lot of RAM, so I wanted to inspect why. R actually has some serious problems handling its own memory: because R doesn't compact its own heap, memory can't be released back to the system even when it "is free". Here is some info:

https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-is-R-apparently-not-releasing-memory_003f

R would need to move objects from higher RAM addresses to lower ones to be able to release pages back to the OS, but it seems it just doesn't do that. As a result, R's RAM usage mainly grows over time: we can remove or "free" objects, but the RAM will not be returned to the system, and the garbage collector doesn't help here.
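To make the FAQ's point concrete, here is a minimal sketch (my own illustration, not part of the tests below): after removing a large object, gc() reports the memory as freed inside R, but the process footprint shown by a system monitor such as htop often stays just as high.

x <- numeric(1e8)   # ~800 MB allocation
gc()                # reports the allocation inside R
rm(x)
gc()                # R's heap shrinks, but the OS may never get the pages back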

After checking this, I found we can use multiprocessing to do the work and then release the memory that can be cleaned, and in my tests this is a lot of memory. Loading a map file, I noticed that of the 6 GB of RAM in use, 5 GB was overhead from loading the file and only 1 GB was the loaded data itself; those 5 GB should be cleanable but can't be reclaimed by the garbage collector. OK, now back to the issue.

One way to keep the RAM clean is multiprocessing: open a new R instance, do the work there, and return the result; the child process is closed at the end and its garbage disappears with it. Great. I have been playing with future and furrr; here are the results.
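As a point of comparison outside future, the same pattern can be sketched with the callr package (my choice for illustration; it was not part of these tests): the heavy load happens in a throwaway child R process, and the OS reclaims all of its memory when the child exits.

library(callr)

# Read the file in a disposable child process; only the small result
# (here the row count) crosses back into the parent session
n_features <- callr::r(function(path) {
  s <- sf::st_read(path)
  nrow(s)
}, args = list("tmp.gpkg"))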

OS: Linux, Gentoo 64-bit
future: 1.25.0

The behavior seems to depend on the plan:

  • multicore: every worker is loaded, and the RAM is cleaned when the child ends. Great.
  • multisession and cluster: all the children run, but none of them release the RAM, and with furrr on top it becomes a super RAM eater. Setting the plan again cleans the RAM, probably because it closes the processes and opens new ones (see the sketch near the end of this report).
  • Using a plan for nested futures causes multicore to stop releasing the RAM too, and in that case the RAM is not unloaded even if we set the plan again.

The code I used for these tests:

library(future)
f <- function() {
    value(future({
        # Read a very large file, so a RAM monitor shows when the memory is freed (or not)
        s <- sf::st_read("tmp.gpkg")
        # Return just 1; the memory used to load the file should then be reclaimable,
        # so we can see when that actually happens and when it doesn't
        1
    }, seed = TRUE), gc = TRUE)
}

# At the end, the RAM is freed
plan(multicore, workers = 4)
f()
gc()

# At the end, the RAM is NOT freed
plan(multisession, workers = 4)
f()
gc()
# Setting the plan again cleans the RAM
plan(multisession, workers = 4)

# At the end, the RAM is NOT freed
plan(list(
    tweak(multicore, workers = 1),
    tweak(multicore, workers = 3)
))
f()
gc()

# At the end, the RAM is NOT freed
plan(cluster, workers = c("localhost"))
f()
gc()

This matters a lot when we work with large amounts of data... In particular, I was building maps with furrr + multisession, and the process was using... more than 60 GB of RAM; I even had to add more swap to be able to continue.

As far as I know, multisession doesn't close the worker processes, so it can recycle them. An option to not recycle them, i.e. close them and open new ones every time, would be great: more time spent, but far less RAM usage.
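In the meantime, that option can be approximated by hand using the observation above that re-setting the plan cleans the RAM. A sketch (not an official future API, just a wrapper around that workaround):

# Run the work, then replace the multisession workers with fresh ones,
# trading startup time for the memory the old workers had accumulated
run_and_restart <- function(expr_fun, workers = 4) {
  res <- expr_fun()
  plan(multisession, workers = workers)  # closes old workers, starts new ones
  res
}

plan(multisession, workers = 4)
run_and_restart(f)  # f() as defined in the test code above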

In the case of the nested futures, I have no idea why the RAM is not freed after the process ends.

future is great, and I think this could help balance the workload. Fast scripts are ideal, but sometimes we need to sacrifice time to save other resources and keep things running.

Thx!
