[Rd] modifying large R objects in place

Fri Sep 28 17:36:40 CEST 2007

On Fri, 28 Sep 2007, Luke Tierney wrote:

> On Fri, 28 Sep 2007, Petr Savicky wrote:

[...]

>> This leads me to a question. Some of the tests I ran suggest
>> that gc() may not free all the memory, even if I remove all data
>> objects by rm() before calling gc(). Is this possible, or have I
>> missed something?

> Not impossible but very unlikely given the use gc gets. There are a
> few internal tables that are grown but not shrunk at the moment but
> that should not usually cause much total growth.  If you are looking at
> system memory use then that is a malloc issue -- there was a thread
> about this a month or so ago.

A likely explanation is lazy-loading.  Almost all the package code is 
stored externally until used: 2.6.0 is better at not bringing in unused 
code.  E.g. (2.6.0, 64-bit system)

> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 141320  7.6     350000 18.7   350000 18.7
Vcells 130043  1.0     786432  6.0   561893  4.3
> for(s in search()) lsf.str(s)
> gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells 424383 22.7     531268 28.4   437511 23.4
Vcells 228005  1.8     786432  6.0   700955  5.4
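
The same comparison can be captured programmatically rather than read off the printed tables. This is only a sketch: the name `growth` is introduced here for illustration, and the actual numbers depend on the R version, platform, and attached packages.

```r
# Snapshot of memory in use before touching any lazily loaded code.
before <- gc()

# Inspecting the functions on each search-path entry forces the
# lazy-load promises, which brings the package code into memory.
for (s in search()) invisible(utils::lsf.str(s))

after <- gc()

# Growth in cells in use (one entry each for Ncells and Vcells).
growth <- after[, "used"] - before[, "used"]
print(growth)
```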

'if I remove all data objects by rm()' presumably means clearing the 
user workspace: there are lots of other environments containing objects 
('data' or otherwise), many of which are needed to run R.
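
A rough way to see this is to count the objects on each entry of the search path before and after removing something from the workspace. The helper `count_objects` and the object `big_tmp` are hypothetical names used only for this sketch.

```r
# Hypothetical helper: number of objects on each search-path entry.
count_objects <- function() {
  vapply(search(), function(s) length(ls(s, all.names = TRUE)), integer(1))
}

# Create one object in the user workspace, count, then remove it again.
assign("big_tmp", numeric(1e5), envir = globalenv())
with_tmp <- count_objects()
rm("big_tmp", envir = globalenv())
without_tmp <- count_objects()

# Only the .GlobalEnv count changes; the package environments are untouched.
print(with_tmp - without_tmp)
```

Removing workspace objects leaves everything held in the attached packages, and in namespaces that are loaded but not attached, exactly where it was.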

Otherwise the footer to every R-help message applies ....

>> A possible solution to the unwanted increase of NAMED due to temporary
>> calculations could be to let the user store the NAMED attribute of an
>> object before a call to a function and restore it after the call. To
>> use this, the user should be confident that no new reference to the
>> object persists after the function is completed.
>
> This would be too dangerous for general use. Some more structured
> approach may be possible. A related issue is that user-defined
> assignment functions always see a NAMED of 2 and hence cannot modify
> in place. We've been trying to come up with a reasonable solution to
> this, so far without success but I'm moderately hopeful.

I am not persuaded that the difference between NAMED=1/2 makes much 
difference in general use of R, and I recall Ross saying that he no longer 
believed that this was a worthwhile optimization.  It's not just 
'user-defined' replacement functions, but also all the system-defined 
closures (including all methods for the generic replacement functions 
which are primitive) that are unable to benefit from it.
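
For readers unfamiliar with the term, a replacement function written as a closure looks like the sketch below. The function `first<-` is a hypothetical example, not something in base R. The sketch shows only the calling convention; whether a copy is actually made depends on the R version, and on builds with memory profiling enabled tracemem() can be used to check.

```r
# Hypothetical user-defined replacement function: sets the first element.
"first<-" <- function(x, value) {
  x[1] <- value   # operates on the function's own view of x
  x               # the modified object is returned to the caller
}

x <- c(10, 20, 30)
first(x) <- 99    # evaluated roughly as: x <- `first<-`(x, value = 99)

# The explicit form of the same call, for comparison.
y <- c(10, 20, 30)
y <- `first<-`(y, value = 99)

print(identical(x, y))
```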

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595