[Rd] hash table clean-up
Simon Urbanek
simon.urbanek at r-project.org
Sun Mar 4 23:29:19 CET 2012
On Mar 4, 2012, at 4:40 PM, Gabor Grothendieck wrote:
> On Sun, Mar 4, 2012 at 4:19 PM, <luke-tierney at uiowa.edu> wrote:
>> On Sun, 4 Mar 2012, Florent D. wrote:
>>
>>> Hello,
>>>
>>> I have noticed that the memory usage inside an R session increases as
>>> more and more objects with unique names are created, even after they
>>> are removed. Here is a small reproducible example:
>>>
>>> > gc()
>>>          used (Mb) gc trigger (Mb) max used (Mb)
>>> Ncells 531720 14.2     899071 24.1   818163 21.9
>>> Vcells 247949  1.9     786432  6.0   641735  4.9
>>>
>>> > for (i in 1:100000) {
>>> +   name <- paste("x", runif(1), sep="")
>>> +   assign(name, NULL)
>>> +   rm(list=name)
>>> +   rm(name)
>>> + }
>>>
>>> > gc()
>>>          used (Mb) gc trigger (Mb) max used (Mb)
>>> Ncells 831714 22.3    1368491 36.6  1265230 33.8
>>> Vcells 680551  5.2    1300721 10.0   969572  7.4
>>>
>>> It appears the increase in memory usage is due to the way R's
>>> environment hash table operates
>>> (http://cran.r-project.org/doc/manuals/R-ints.html#Hash-table): as
>>> objects with new names are created, new entries are made in the hash
>>> table; but when the objects are removed from the environment, the
>>> corresponding entries are not deleted.
>>
>>
>> Your analysis is incorrect. What you are seeing is the fact that the
>> symbol or name objects used as keys are being added to the global
>> symbol table and that is not garbage collected. I believe that too
>> many internals rely on this for it to be changed any time soon. It
>> may be possible to have some symbols GC protected and others not, but
>> again that would require very careful thought and implementation and
>> isn't likely to be a priority any time soon as far as I can see.
>>
>> There may be some value in having hash tables that use some form of
>> uninterned symbols as keys at some point but that is a larger project
>> that might be better provided by a contributed package, at least
>> initially.
>>
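A rough way to see the effect described above is to compare how many cells stay in use after discarding unique strings versus unique symbols. This is only a sketch: the helper name ncells_used and the "tmp_" prefixes are made up for illustration, and the exact counts will vary with the R version and session state.

```r
# Cells in use ("Ncells") as reported after a full garbage collection.
ncells_used <- function() gc()["Ncells", "used"]

n <- 20000L

# Control: create n unique character strings and throw them away.
# Strings that are no longer referenced can be garbage collected.
base <- ncells_used()
for (i in seq_len(n)) paste0("tmp_str_", i)
growth_strings <- ncells_used() - base

# Create n unique symbols and throw them away. Each as.name() call on a
# new name adds a symbol to the global symbol table, which is not collected.
base <- ncells_used()
for (i in seq_len(n)) as.name(paste0("tmp_sym_", i))
growth_symbols <- ncells_used() - base

cat("cells retained after discarded strings:", growth_strings, "\n")
cat("cells retained after discarded symbols:", growth_symbols, "\n")
```

If the explanation holds, the second number should be at least n, while the first should stay comparatively small.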
>
> Does this apply to lists too or just environments?
>
Just environments and pairlists (the latter don't use hashing, though). Lists (i.e. generic vectors) are not keyed by symbols: their names are stored as character strings, and lookup by name is not hashed, either.
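As a hedged illustration of the difference, the sketch below stores the same number of unique keys once as names of a list and once as variables in an environment, removes both, and checks what remains in use. The helper ncells_used and the key prefixes are invented for this example; the numbers depend on the R version and session.

```r
# Cells in use ("Ncells") as reported after a full garbage collection.
ncells_used <- function() gc()["Ncells", "used"]

n <- 20000L

# List: keys live in the names attribute as character strings.
base <- ncells_used()
keys <- paste0("list_key_", seq_len(n))
lst <- vector("list", n)
names(lst) <- keys
first_value <- lst[["list_key_1"]]   # lookup by string, no symbol involved
rm(lst, keys, first_value)
growth_list <- ncells_used() - base

# Environment: every key passed to assign() becomes a symbol.
base <- ncells_used()
env <- new.env(hash = TRUE)
for (k in paste0("env_key_", seq_len(n))) assign(k, 0, envir = env)
rm(env, k)
growth_env <- ncells_used() - base

cat("cells retained after removing the list:       ", growth_list, "\n")
cat("cells retained after removing the environment:", growth_env, "\n")
```

The environment itself is collected once it is removed; what stays behind, if the above is right, are the symbols that were created for its keys.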
Cheers,
S