[R] Garbage Collecting

Peter Dalgaard BSA p.dalgaard at biostat.ku.dk
Sat Jun 24 19:28:27 CEST 2000

Kjetil Kjernsmo <kjetil.kjernsmo at astro.uio.no> writes:

> I do a call 
> > l4 <- t(apply(cbind(rep(1000000, 4)), 1, lineprofile, 1, 80, 
>                 smap01n$histogram))
> and this will create a 101 * 4 element matrix. My first line in
> 'lineprofile' is gc(v=T), so it reports while running
> Garbage collection [nr. 362602]...
> 629255 cons cells free (78%)
> 155163 Kbytes of heap free (76%)
> Garbage collection [nr. 362603]...
> 627739 cons cells free (78%)
> 149224 Kbytes of heap free (73%)
> Garbage collection [nr. 362604]...
> 626223 cons cells free (78%)
> 143286 Kbytes of heap free (70%)
> Garbage collection [nr. 362605]...
> 624707 cons cells free (78%)
> 137347 Kbytes of heap free (67%)
> So it seems that close to 6 MB is used each time lineprofile is run. I
> thought that the garbage collector would come and remove everything that
> wasn't needed, so that the difference in free heap size would
> correspond to the size of the array produced by lineprofile, that is,
> 202 bytes.

Hm. I think I would do that as

do.call("rbind", lapply(rep(1000000, 4), lineprofile, ....))

(provided I grasped the intention correctly). The memory usage still
looks strange, though, and as usual I wouldn't rule out a problem with
32/64-bit assumptions somewhere, since we have so few people on 64-bit
platforms.
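
To illustrate that the two formulations should build the same matrix, here is a minimal sketch. The real lineprofile is not shown in this thread, so lineprofile_demo and hist_demo below are hypothetical stand-ins, assumed only to take the same four arguments and return a vector of length 101.

```r
# Hypothetical stand-in for lineprofile(); the real function is not shown
# in the thread. It returns a numeric vector of length 101.
lineprofile_demo <- function(n, from, to, hist) {
  seq(from, to, length.out = 101) * mean(hist) / n
}

x <- rep(1000000, 4)
hist_demo <- c(1, 2, 3, 4)   # placeholder for smap01n$histogram

# Original formulation: build a 4 x 1 matrix and apply over its rows,
# then transpose so that each call contributes one row.
a <- t(apply(cbind(x), 1, lineprofile_demo, 1, 80, hist_demo))

# Suggested formulation: one call per element, results stacked as rows.
b <- do.call("rbind", lapply(x, lineprofile_demo, 1, 80, hist_demo))

dim(a)
dim(b)
all.equal(a, b, check.attributes = FALSE)
```

The lapply version skips the detour through a one-column matrix, which is the main reason I would prefer it; it should not by itself change what the garbage collector has to clean up.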

In general, the garbage collector should indeed remove anything that
is not used by an active R object or is otherwise protected. So if
lineprofile doesn't have any side effects, then only the return value
should be left after the call.
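
A rough way to check this expectation in a current R session is to compare the cells in use before and after a call. This is only a sketch: no_side_effects is a hypothetical function invented for the illustration, and the exact numbers will vary between R versions and platforms.

```r
# Hypothetical function with no side effects: it allocates a large
# temporary vector but returns only a single number.
no_side_effects <- function(n) {
  tmp <- numeric(n)   # roughly 8 * n bytes of temporary storage
  length(tmp)
}

invisible(gc())            # let things settle before measuring
before <- gc()[, 1]        # first column: Ncells and Vcells in use
res <- no_side_effects(1e6)
after <- gc()[, 1]         # tmp is unreachable by now, so it is collected

after - before             # expected to be small compared with 1e6 Vcells
```

If the difference in Vcells stayed close to the size of the temporary vector, that would suggest something is still holding a reference to it.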

> Now, this isn't too much to go on, I realize, but I want to hear if my
> understanding of what the garbage collector is supposed to be doing is
> totally wrong before I get deeper into this. If I'm not totally
> wrong, and anybody cares to have a look, I have saved the objects that are
> sufficient to reproduce the problem on my system in
> <URL:http://www.astro.uio.no/~kjetikj/tmp/kjetil.RData>

Perhaps you could cut the size of the problem down a bit? Some of us
are sitting at machines with 64MB RAM and are not too thrilled about R
processes with 200+ MB of memory footprint...

If you're feeling adventurous, Luke Tierney has just opened up an
experimental branch with a substantially modified garbage collector. I
believe there's a way to get it by anonymous CVS; the branch label is
R-GenGC. However, that one should be *less* strict about cleaning out
every last bit of unused data than the current one (in the hope of
getting better speed in return).

   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907