[Rd] hash table clean-up

luke-tierney at uiowa.edu
Sun Mar 4 22:19:56 CET 2012


On Sun, 4 Mar 2012, Florent D. wrote:

> Hello,
>
> I have noticed that the memory usage inside an R session increases as
> more and more objects with unique names are created, even after they
> are removed. Here is a small reproducible example:
>
>> gc()
>         used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 531720 14.2     899071 24.1   818163 21.9
> Vcells 247949  1.9     786432  6.0   641735  4.9
>>
>> for (i in 1:100000) {
> + name <- paste("x", runif(1), sep="")
> + assign(name, NULL)
> + rm(list=name)
> + rm(name)
> + }
>>
>> gc()
>         used (Mb) gc trigger (Mb) max used (Mb)
> Ncells 831714 22.3    1368491 36.6  1265230 33.8
> Vcells 680551  5.2    1300721 10.0   969572  7.4
>
> It appears the increase in memory usage is due to the way R's
> environment hash table operates
> (http://cran.r-project.org/doc/manuals/R-ints.html#Hash-table): as
> objects with new names are created, new entries are made in the hash
> table; but when the objects are removed from the environment, the
> corresponding entries are not deleted.

Your analysis is incorrect. What you are seeing is the fact that the
symbol or name objects used as keys are being added to the global
symbol table, and that table is not garbage collected. I believe that too
many internals rely on this for it to be changed any time soon.  It
may be possible to have some symbols GC protected and others not, but
again that would require very careful thought and implementation and
isn't likely to be a priority any time soon as far as I can see.
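A toy model may make the distinction concrete. The sketch below is hypothetical: it is written in Python rather than in R's C sources, and the names `Symbol`, `intern`, `Environment`, and `symbol_table` are illustrative, not R internals. It only assumes what is described above: every name used as a key is interned in a global table that is never cleaned up, while removing a binding deletes just the environment entry.

```python
# Hypothetical toy model, NOT R's actual implementation (R keeps its
# symbol table and environments in C). It only mimics the behaviour
# described in the reply: interned symbols are never reclaimed.

class Symbol:
    """A unique object standing for one name, like an R symbol."""
    __slots__ = ("name",)

    def __init__(self, name: str) -> None:
        self.name = name


# Global symbol table: maps each name to its single Symbol.
# Entries are added but never removed, so it only grows.
symbol_table: dict[str, Symbol] = {}


def intern(name: str) -> Symbol:
    """Return the unique Symbol for `name`, creating it on first use."""
    sym = symbol_table.get(name)
    if sym is None:
        sym = Symbol(name)
        symbol_table[name] = sym
    return sym


class Environment:
    """Bindings keyed by interned Symbol objects (compared by identity)."""

    def __init__(self) -> None:
        self._bindings: dict[Symbol, object] = {}

    def assign(self, name: str, value: object) -> None:
        self._bindings[intern(name)] = value

    def remove(self, name: str) -> None:
        # Deletes the binding only; the Symbol stays in symbol_table.
        del self._bindings[intern(name)]

    def __len__(self) -> int:
        return len(self._bindings)


# Mimic the loop from the report, using deterministic unique names.
env = Environment()
for i in range(100_000):
    name = f"x{i}"
    env.assign(name, None)
    env.remove(name)

print(len(env), len(symbol_table))  # prints "0 100000"
```

In this model the environment ends up empty, yet the symbol table holds one entry per distinct name ever used, which is the kind of growth the `gc()` output in the report shows. Cleaning up the environment's hash table on resize would not change that, because the retained objects live in the symbol table.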

There may be some value in having hash tables that use some form of
uninterned symbols as keys at some point but that is a larger project
that might be better provided by a contributed package, at least
initially.

Best,

luke

>
> I hope you will agree the growth in memory size is an undesirable
> feature and can address the issue in a future release. If not, please
> let me know why you think it should remain this way.
>
> I believe a fix could be made around the time the hash table is
> resized, where only non-removed items would be kept. I can try to make
> those changes to src/main/envir.c myself, but C is not my area of
> expertise. So if you beat me to it, please let me know.
>
> Thank you,
> Florent.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


