[Rd] modifying large R objects in place
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Sep 28 17:36:40 CEST 2007
On Fri, 28 Sep 2007, Luke Tierney wrote:
> On Fri, 28 Sep 2007, Petr Savicky wrote:
[...]
>> This leads me to a question. Some of the tests I ran suggest
>> that gc() may not free all the memory, even if I remove all data
>> objects by rm() before calling gc(). Is this possible, or have I
>> missed something?
> Not impossible but very unlikely given the use gc gets. There are a
> few internal tables that are grown but not shrunk at the moment but
> that should not usually cause much total growth. If you are looking at
> system memory use then that is a malloc issue -- there was a thread
> about this a month or so ago.
A likely explanation is lazy-loading. Almost all the package code is
stored externally until used: 2.6.0 is better at not bringing in unused
code. E.g. (2.6.0, 64-bit system)
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 141320  7.6     350000 18.7   350000 18.7
Vcells 130043  1.0     786432  6.0   561893  4.3
> for(s in search()) lsf.str(s)
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 424383 22.7     531268 28.4   437511 23.4
Vcells 228005  1.8     786432  6.0   700955  5.4
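The same comparison can be scripted rather than read off two printouts. This is only a sketch: the helper name used_cells is an assumption for illustration, and the size (even the sign) of the difference will depend on the R version and on what has already been loaded in the session.

```r
# Hypothetical helper: cells currently in use, taken from the "used"
# column of the matrix that gc() returns.
used_cells <- function() gc()[, "used"]

before <- used_cells()

# Touch every function on the search path, as in the transcript above.
for (s in search()) lsf.str(s)

after <- used_cells()

# Difference in Ncells and Vcells between the two collections.
growth <- after - before
print(growth)
```

No user data objects are created in between, so any growth reported here comes from code brought into memory, not from the workspace.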
'if I remove all data objects by rm()' presumably means clearing the
user workspace: there are lots of other environments containing objects
('data' or otherwise), many of which are needed to run R.
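To see how much lives outside the user workspace, one can count the objects in each entry of the search path. This is a rough sketch, not a full inventory: the variable name counts is an assumption, and namespaces that are loaded but not attached are only listed, not counted.

```r
# Number of objects in each environment on the search path.
# rm(list = ls()) at top level empties only ".GlobalEnv".
counts <- vapply(
  search(),
  function(s) length(ls(as.environment(s), all.names = TRUE)),
  integer(1)
)
print(counts)

# Namespaces that are loaded (attached or not) also hold objects.
print(loadedNamespaces())
```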
Otherwise the footer to every R-help message applies ....
>> A possible solution to the unwanted increase of NAMED due to temporary
>> calculations could be to give the user the possibility
>> to store NAMED attribute of an object before a call to a function
>> and restore it after the call. To use this, the user should be
>> confident that no new reference to the object persists after the
>> function is completed.
>
> This would be too dangerous for general use. Some more structured
> approach may be possible. A related issue is that user-defined
> assignment functions always see a NAMED of 2 and hence cannot modify
> in place. We've been trying to come up with a reasonable solution to
> this, so far without success but I'm moderately hopeful.
I am not persuaded that the difference between NAMED=1/2 makes much
difference in general use of R, and I recall Ross saying that he no longer
believed that this was a worthwhile optimization. It's not just
'user-defined' replacement functions, but also all the system-defined
closures (including all methods for the generic replacement functions
which are primitive) that are unable to benefit from it.
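For readers who want to look at this themselves, here is a small sketch. The replacement function `first<-` is a hypothetical example, not anything in R itself, and tracemem() is only available when R was built with memory profiling, hence the capabilities() guard. Whether a duplication is reported for the closure depends on the R version, so no expected output is given.

```r
# Value semantics that the NAMED bookkeeping has to preserve:
# modifying y must never change x.
x <- numeric(5)
y <- x        # x and y may share one vector internally at this point
y[1] <- 1     # y is modified; x must stay as it was
stopifnot(x[1] == 0, y[1] == 1)

# Hypothetical replacement function written as an ordinary closure.
`first<-` <- function(x, value) {
  x[1] <- value
  x
}

z <- numeric(5)
if (capabilities("profmem")) invisible(tracemem(z))  # report duplications of z
first(z) <- 9
if (capabilities("profmem")) untracemem(z)
stopifnot(z[1] == 9)
```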
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595