[Rd] modifying large R objects in place
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Sep 27 16:45:03 CEST 2007
1) You implicitly coerced 'a' to be numeric and thereby (almost) doubled
its size: did you intend to? Does that explain your confusion?
2) I expected NAMED on 'a' to be incremented by nrow(a): here is my
understanding.
When you called nrow(a) you created another reference to 'a' in the
evaluation frame of nrow. (At a finer level you first created a promise
to 'a' and then dim(x) evaluated that promise, which did
SET_NAMED(<SEXP>, 2).) So NAMED(a) was correctly bumped to 2, and it is
never reduced.
More generally, any argument to a closure that actually gets used will
get NAMED set to 2.
Having too high a value of NAMED could never be a 'bug'. See the
explanation in the R Internals manual:

    When an object is about to be altered, the named field is consulted. A
    value of 2 means that the object must be duplicated before being
    changed. (Note that this does not say that it is necessary to
    duplicate, only that it should be duplicated whether necessary or not.)

3) Memory profiling can be helpful in telling you exactly what copies get
made.
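
Points 1) and 3) can be checked in a short session. What follows is only a sketch: it uses a 1000 x 1000 matrix rather than the 14100 x 14100 one from the thread, the variable names are illustrative, and tracemem() is assumed to be usable only when R was built with memory profiling (hence the capabilities() guard). What tracemem() reports may vary between R versions, so no output is shown.

```r
# Small integer matrix; the thread used 14100 x 14100.
a <- matrix(1L, nrow = 1000, ncol = 1000)
type_before <- typeof(a)                      # integer storage
size_before <- as.numeric(object.size(a))

# tracemem() needs an R build with memory profiling enabled.
can_trace <- isTRUE(capabilities("profmem"))
if (can_trace) invisible(tracemem(a))

a[1, 1] <- 0                                  # 0 is a double constant, so 'a' is coerced
type_after <- typeof(a)
size_after <- as.numeric(object.size(a))

if (can_trace) untracemem(a)

# Assigning an integer constant keeps the integer storage mode.
b <- matrix(1L, nrow = 1000, ncol = 1000)
b[1, 1] <- 0L
type_b <- typeof(b)

cat("a:", type_before, "->", type_after, "\n")
cat("size ratio after/before:", size_after / size_before, "\n")
cat("b:", type_b, "\n")
```

Writing 0L (or as.integer(0)) on the right-hand side avoids the coercion, and so avoids the larger double copy.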
On Thu, 27 Sep 2007, Petr Savicky wrote:
> On Wed, Sep 26, 2007 at 10:52:28AM -0700, Byron Ellis wrote:
> >> For the most part, doing anything to an R object results in its
>> duplication. You generally have to do a lot of work to NOT copy an R
>> object.
>
> Thank you for your response. Unfortunately, you are right. For example,
> the allocated memory determined by top command on Linux may change during
> a session as follows:
> a <- matrix(as.integer(1),nrow=14100,ncol=14100) # 774m
> a[1,1] <- 0 # 3.0g
> gc() # 1.5g
>
> In the current application, I modify the matrix only using my own C code
> and only read it on R level. So, the above is not a big problem for me
> (at least not now).
>
> However, there is a related thing, which could be a bug. The following
> code determines the value of NAMED field in SEXP header of an object:
>
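> #include <R.h>
> #include <Rinternals.h>
>
> /* Return the NAMED field of 'a' as a length-one integer vector. */
> SEXP getnamed(SEXP a)
> {
>     SEXP out;
>     PROTECT(out = allocVector(INTSXP, 1));
>     INTEGER(out)[0] = NAMED(a);
>     UNPROTECT(1);
>     return out;
> }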
>
> Now, consider the following session
>
> u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
> .Call("getnamed",u) # 1 (OK)
>
> length(u)
> .Call("getnamed",u) # 1 (OK)
>
> dim(u)
> .Call("getnamed",u) # 1 (OK)
>
> nrow(u)
> .Call("getnamed",u) # 2 (why?)
>
> u <- matrix(as.integer(1),nrow=5,ncol=3) + as.integer(0)
> .Call("getnamed",u) # 1 (OK)
> ncol(u)
> .Call("getnamed",u) # 2 (so, ncol does the same)
>
> Is this a bug?
>
> Petr Savicky.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
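
The quoted session can also be approximated without compiling any C code. This is a sketch under assumptions: .Internal(inspect()) is an undocumented debugging aid rather than a stable interface, it may not exist in older R versions, and the header it prints is expected to show a NAM(n) or REF(n) field depending on the version. No output is shown, since it differs between versions.

```r
# Same object as in the quoted session: a 5 x 3 integer matrix.
u <- matrix(as.integer(1), nrow = 5, ncol = 3) + as.integer(0)

# Print the object header before any closure has been called on 'u'.
.Internal(inspect(u))

# nrow() is a closure, so 'u' is passed as a promise and then evaluated.
n <- nrow(u)

# Print the header again and compare the NAM/REF field by eye.
.Internal(inspect(u))
```

Comparing the two printed headers plays the role of the two .Call("getnamed", u) calls around nrow(u) in the quoted session.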
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595