[R-sig-hpc] Seeing some memory leak with foreach...

Aaron King kingaa at umich.edu
Wed Feb 27 19:05:22 CET 2013


Hi Jonathan,

I've run into a similar problem before.  It took more than 2 weeks to
track down, but when I did, it turned out to be associated with
resolving dynamically linked symbols ('getNativeSymbolInfo').  My code
was doing this very frequently, which led to a memory leak.
Once I found the source of the problem, I reworked my code to avoid
this (a good idea anyway since the symbol-resolution was largely
redundant) and never got to the very bottom of the problem, which may
very well have been in the Linux kernel rather than in R itself.  This
may have nothing to do with the problem you're experiencing; if it
does, I hope this note will save you some time.  It would be
interesting to hear about the source of your memory leak, whatever it
turns out to be.
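
For illustration only, here is a minimal sketch of the general idea of
resolving each symbol once and reusing the result.  It is not the code
referred to above: 'make_symbol_cache', 'my_routine' and 'mypkg' are
hypothetical names, and a counting stand-in replaces the real resolver
so that the example runs without a compiled library.

```r
# Resolve each native symbol at most once and reuse the result afterwards.
# 'resolver' defaults to base R's getNativeSymbolInfo(); it is a parameter
# only so the caching can be exercised without a compiled library.
make_symbol_cache <- function(resolver = getNativeSymbolInfo) {
  cache <- new.env(parent = emptyenv())
  function(name, package) {
    key <- paste(package, name, sep = "::")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, resolver(name, PACKAGE = package), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Demonstration with a stand-in resolver that counts its calls.
n_lookups <- 0
counting_resolver <- function(name, PACKAGE) {
  n_lookups <<- n_lookups + 1
  list(name = name, package = PACKAGE)
}

lookup <- make_symbol_cache(counting_resolver)
for (i in 1:1000) sym <- lookup("my_routine", "mypkg")  # hypothetical names
n_lookups  # the resolver was called once, not 1000 times
```

With the default resolver, 'lookup' would call getNativeSymbolInfo() only
the first time a given name/package pair is requested.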

Have you tried another parallel backend or a parallelization approach
other than 'foreach'?
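
As one possible way to try this, the sketch below uses the 'parallel'
package directly and starts a fresh set of workers for each batch of
tasks, so whatever memory a worker has accumulated is released when it
exits.  The function 'process_task', the batch size and the worker count
are assumptions made for the example, not details of your setup.

```r
library(parallel)

# Hypothetical stand-in for the real per-task work; it returns NULL because
# the real function is assumed to write its own output.
process_task <- function(i) {
  invisible(NULL)
}

tasks <- 1:20
batches <- split(tasks, ceiling(seq_along(tasks) / 5))  # 4 batches of 5 tasks

n_done <- 0
for (batch in batches) {
  cl <- makeCluster(2)                       # fresh worker processes
  res <- parLapply(cl, batch, process_task)  # run this batch on the workers
  stopCluster(cl)                            # workers exit, freeing their memory
  n_done <- n_done + length(res)
}
n_done  # number of tasks completed across all batches
```

Restarting workers costs some startup time per batch, so the batch size is
a trade-off between that overhead and how far memory is allowed to creep.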

Aaron

On Tue, Feb 26, 2013 at 9:49 AM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
> r-sig-geo'ers:
>
> I always hate doing this, but the test function/dataset is going to be
> hard to pass along to the list.  Basically: I have a foreach call that
> has no superassignments or strange environmental manipulations, but
> resulted in the nodes showing a slow but steady memory creep over
> time.  I was using a parallel backend for foreach via doParallel.  Has
> anyone else seen this behavior (unexplained memory creep)?  Is there a
> good way to "flush" a node?  I'm trying to embed gc() at the top of my
> foreach function, but the process took about 24 hours to reach the
> memory-overuse stage (by which point multiple iterations had passed,
> i.e. the function had been called more than once on a single node),
> so I'm not sure whether this will work and figured I'd ask the group.
> I've seen other people post about this on various boards with no
> clear response or solution (gc() apparently didn't work).
>
> Some other notes: there should be no resultant output of data, because
> the output is being written from within the foreach function (i.e. the
> output of the function that foreach executes is NULL).
>
> I'll see if I can work up a faster executing example later, but wanted
> to see if there are some general pointers for dealing with memory
> leaks using a parallel system.
>
> --j
>
> --
> Jonathan A. Greenberg, PhD
> Assistant Professor
> Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
> Department of Geography and Geographic Information Science
> University of Illinois at Urbana-Champaign
> 607 South Mathews Avenue, MC 150
> Urbana, IL 61801
> Phone: 217-300-1924
> http://www.geog.illinois.edu/~jgrn/
> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc



-- 
Aaron A. King, Ph.D.
Ecology & Evolutionary Biology
Mathematics
Center for the Study of Complex Systems
University of Michigan
GPG Public Key: 0x15780975


