[Rd] hash table clean-up

Simon Urbanek simon.urbanek at r-project.org
Sun Mar 4 23:29:19 CET 2012


On Mar 4, 2012, at 4:40 PM, Gabor Grothendieck wrote:

> On Sun, Mar 4, 2012 at 4:19 PM,  <luke-tierney at uiowa.edu> wrote:
>> On Sun, 4 Mar 2012, Florent D. wrote:
>> 
>>> Hello,
>>> 
>>> I have noticed that the memory usage inside an R session increases as
>>> more and more objects with unique names are created, even after they
>>> are removed. Here is a small reproducible example:
>>> 
>>>> gc()
>>> 
>>>          used (Mb) gc trigger (Mb) max used (Mb)
>>> Ncells 531720 14.2     899071 24.1   818163 21.9
>>> Vcells 247949  1.9     786432  6.0   641735  4.9
>>>> 
>>>> 
>>>> for (i in 1:100000) {
>>> + name <- paste("x", runif(1), sep="")
>>> + assign(name, NULL)
>>> + rm(list=name)
>>> + rm(name)
>>> + }
>>>> 
>>>> 
>>>> gc()
>>> 
>>>          used (Mb) gc trigger (Mb) max used (Mb)
>>> Ncells 831714 22.3    1368491 36.6  1265230 33.8
>>> Vcells 680551  5.2    1300721 10.0   969572  7.4
>>> 
>>> It appears the increase in memory usage is due to the way R's
>>> environment hash table operates
>>> (http://cran.r-project.org/doc/manuals/R-ints.html#Hash-table): as
>>> objects with new names are created, new entries are made in the hash
>>> table; but when the objects are removed from the environment, the
>>> corresponding entries are not deleted.
>> 
>> 
>> Your analysis is incorrect. What you are seeing is the fact that the
>> symbol or name objects used as keys are being added to the global
>> symbol table, which is not garbage collected. I believe that too
>> many internals rely on this for it to be changed any time soon.  It
>> may be possible to have some symbols GC protected and others not, but
>> again that would require very careful thought and implementation and
>> isn't likely to be a priority any time soon as far as I can see.
>> 
>> There may be some value in having hash tables that use some form of
>> uninterned symbols as keys at some point but that is a larger project
>> that might be better provided by a contributed package, at least
>> initially.
>> 
> 
> Does this apply to lists too or just environments?
> 

Just environments and pairlists (the latter don't use hashing, though). Lists (i.e. generic vectors) are not keyed by symbols (and are not hashed, either).
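
A rough way to see the difference is to compare how Ncells grows when unique names are used as environment keys versus as list names. The sketch below is only illustrative: the helper ncells_used(), the key prefixes and the loop size n are made up for this example, and the exact numbers will vary by R version and session.

```r
# Sketch: compare Ncells growth for unique names used as environment keys
# (interned as symbols) versus as list element names (plain strings).
# ncells_used(), n and the key prefixes are hypothetical, for illustration only.
ncells_used <- function() {
  g <- gc()                 # gc() returns a matrix with rows Ncells/Vcells
  g["Ncells", "used"]
}

n <- 20000

# Unique names as variables in an environment: each name becomes a symbol
# in the global symbol table, which is not garbage collected.
e <- new.env()
before_env <- ncells_used()
for (i in seq_len(n)) {
  nm <- paste0("envkey_", i)
  assign(nm, NULL, envir = e)
  rm(list = nm, envir = e)
}
after_env <- ncells_used()

# Unique names as list element names: these are strings, not symbols,
# so they can be collected once the elements are gone.
l <- list()
before_list <- ncells_used()
for (i in seq_len(n)) {
  nm <- paste0("listkey_", i)
  l[[nm]] <- 1
  l[[nm]] <- NULL
}
after_list <- ncells_used()

env_growth  <- after_env - before_env
list_growth <- after_list - before_list
cat("Ncells growth, environment keys:", env_growth, "\n")
cat("Ncells growth, list names:      ", list_growth, "\n")
```

If the explanation above holds, the first figure should grow roughly in proportion to n, while the second should stay close to zero.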

Cheers,
S


