[R] foreach/dopar's processes accumulate RAM

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Thu Oct 30 16:27:39 CET 2014


I don't know the answer, partly because you have not shown enough of your code. Small, reproducible examples are asked for in the list footer and the Posting Guide precisely to avoid this problem.
AFAIK all of the parallel processing libraries in R re-use their child processes between calls, so garbage collection inside those long-lived workers could be the issue. The nested for loops are probably a red herring, though, since nesting merely multiplies the number of iterations; the total iteration count is what is likely relevant. The setup of your parallel backend probably matters as well, as does your sessionInfo().
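If you are on a doParallel/PSOCK backend (a guess -- you have not shown your registration code), you can confirm the reuse yourself, and you can work around the growth by recycling the worker pool between outer iterations and calling gc() inside the %dopar% body. A minimal sketch; B, C, and some_work() are hypothetical stand-ins for your index sets and the gam()/predict() step:

library(foreach)
library(doParallel)

## Reuse check: both foreach() calls report the same worker PIDs,
## i.e. the children are NOT terminated between calls.
cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i = 1:4, .combine = c) %dopar% Sys.getpid()
foreach(i = 1:4, .combine = c) %dopar% Sys.getpid()
stopCluster(cl)

## Workaround: fresh workers per outer iteration, gc() per task.
B <- 1:3                                        # stand-in loop sets
C <- 1:8
some_work <- function(c) mean(rnorm(1e5)) + c   # stand-in for gam()/predict()

for (b in B) {
  cl <- makeCluster(4)                  # new child processes each pass
  registerDoParallel(cl)
  res <- foreach(c = C, .combine = rbind) %dopar% {
    out <- some_work(c)
    gc()                                # collect garbage in this worker
    out
  }
  stopCluster(cl)                       # dead workers hand RAM back to the OS
}

Recreating the cluster costs some startup time on every pass, but terminating the children is a reliable way to return their memory to the operating system; gc() alone shrinks R's heap usage, yet the process may keep the pages.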

On October 29, 2014 1:48:05 AM PDT, Alexander Engelhardt <alex at chaotic-neutral.de> wrote:
>Hello all,
>
>I have a triple nested loop in R like this:
>
>library(foreach)  # provides foreach() and %dopar%
>library(mgcv)     # provides gam() and the nb() family
>
>all <- list()
>for(a in A){
>     all[[a]] <- list()
>     for(b in B){
>         all[[a]][[b]] <- foreach(c=C, .combine=rbind) %dopar% {
>             ## I'm leaving out some preprocessing here
>             this_GAM <- gam(formula, data=data,
>                             family=nb(link="log", theta=THETA))
>             predict(this_GAM, newdata=newdata)
>         }
>     }
>}
>
>The problem I have is that, with time, the individual R processes
>which the %dopar% spawns use up more and more RAM. When I start the
>triple loop, each process requires about 2GB of RAM, but after around
>eight hours, they use >4GB each. Here are the first two lines of a
>'top' output:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>20880 engelhar  20   0 7042m 4.0g 2436 R 59.2  6.4  14:30.15 R
>20878 engelhar  20   0 7042m 4.3g 2436 D 53.5  6.8  14:07.18 R
>
>I don't understand how this can happen. To my understanding, as soon
>as the foreach loop is done, i.e. as soon as a new 'b' is chosen from
>'B' in the second loop, the individual parallel R processes should
>terminate and release the memory. There should not be an increase of
>memory consumption over time.
>
>Does anyone know what is going on and how I can avoid this behavior?
>
>Thanks in advance,
>  Alex Engelhardt
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


