[Rd] Suggestion for memory optimization and as.double() with friends

Simon Urbanek simon.urbanek at r-project.org
Thu Mar 29 23:05:50 CEST 2007


Seth, good point. I think we should be able to do better...

On Mar 29, 2007, at 10:57 AM, Seth Falcon wrote:

> Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>> The usual 'trick' to avoid this copy is
>>
>> storage.mode(x) <- "double"
>
> Hmm, this does not appear to avoid the copy for me.  Using R 2.5.0  
> alpha r40916 I get:
>
>> x <- 1:10 * 2.3
>> names(x)=LETTERS[1:10]
>> storage.mode(x)
>     [1] "double"
>> tracemem(x)
>     [1] "<0x2a7f008>"
>> storage.mode(x) <- "double"
>     tracemem[0x2a7f008 -> 0x1fa6df8]:
>
> Note that actually changing the storage results in a surprising amount
> of copying:
>
>> storage.mode(x) <- "integer"
>     tracemem[0x1fa6df8 -> 0x1fa6d60]:

I believe this copy comes from duplicating the argument because
NAMED=2; it doesn't appear when you work on a pristine x (NAMED=1). (*)


>     tracemem[0x1fa6d60 -> 0x1fa6808]: as.integer.default as.integer  
> eval eval storage.mode<-

This one comes from as.vector: an additional copy due to NAMED>0. In
fact, as.vector does something like this (in ascommon in coerce.c):
    if (NAMED(u))
        v = duplicate(u);
    else v = u;
    [...]
    v = coerceVector(v, type);

I suppose the duplication could be avoided, because coerceVector will
produce a copy anyway whenever the type actually changes. Would it be
safe to change it to something like this?

    v = coerceVector(u, type);
    if (v == u && NAMED(u)) v = duplicate(u);

I suspect that the duplication is necessary only because attributes
may get scrubbed later. If that is true, can we defer the copying
until just before the CLEAR_ATTRIB branch? Are my assumptions
correct?
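
To make the proposed reordering concrete, here is a small standalone C
sketch. It is only a toy model, not R's internals: toy_vec, toy_alloc,
toy_duplicate, toy_coerce and the two toy_ascommon_* functions are
hypothetical names invented for this illustration, and the model has no
attributes at all, so it says nothing about the CLEAR_ATTRIB question.
It just counts allocations under the two orderings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an R vector. This is NOT the real SEXP layout. */
typedef enum { TOY_INT, TOY_REAL } toy_type;

typedef struct {
    toy_type type;
    int named;     /* plays the role of NAMED(): 0 = no other reference */
    size_t len;
    double *data;  /* values kept as double in both modes, for brevity */
} toy_vec;

static int toy_alloc_count = 0;  /* one increment per vector allocated */

static toy_vec *toy_alloc(toy_type type, size_t len) {
    toy_vec *v = malloc(sizeof *v);
    assert(v != NULL);
    v->data = calloc(len ? len : 1, sizeof *v->data);
    assert(v->data != NULL);
    v->type = type;
    v->named = 0;
    v->len = len;
    toy_alloc_count++;
    return v;
}

static void toy_free(toy_vec *v) {
    free(v->data);
    free(v);
}

/* Models duplicate(): always returns a fresh copy. */
static toy_vec *toy_duplicate(const toy_vec *u) {
    toy_vec *v = toy_alloc(u->type, u->len);
    if (u->len) memcpy(v->data, u->data, u->len * sizeof *u->data);
    return v;
}

/* Models coerceVector(): returns its argument unchanged when the type
   already matches, otherwise a newly allocated converted vector. */
static toy_vec *toy_coerce(toy_vec *u, toy_type type) {
    if (u->type == type) return u;
    toy_vec *v = toy_alloc(type, u->len);
    for (size_t i = 0; i < u->len; i++)
        v->data[i] = (type == TOY_INT) ? (double)(long)u->data[i]
                                       : u->data[i];
    return v;
}

/* Current order: duplicate first when NAMED, then coerce. */
static toy_vec *toy_ascommon_current(toy_vec *u, toy_type type) {
    toy_vec *v = u->named ? toy_duplicate(u) : u;
    toy_vec *r = toy_coerce(v, type);
    if (r != v && v != u) toy_free(v);  /* intermediate copy was wasted */
    return r;
}

/* Proposed order: coerce first, duplicate only if nothing was copied. */
static toy_vec *toy_ascommon_deferred(toy_vec *u, toy_type type) {
    toy_vec *v = toy_coerce(u, type);
    if (v == u && u->named) v = toy_duplicate(u);
    return v;
}
```

In this toy, coercing a NAMED real vector to integer costs two
allocations with the current order and one with the deferred order,
while a same-type request on a NAMED vector still gets its one
protective copy either way.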


>     tracemem[0x1fa6808 -> 0x2c26b18]: as.integer.default as.integer  
> eval eval storage.mode<-

This is the conversion itself (coerceVector); that's fine.


>     tracemem[0x2c26b18 -> 0x2c26a88]: storage.mode<-
>

This one is caused by:
  attr(x, "Csingle") <- if (value == "single") TRUE

in "storage.mode<-". It calls attr<- unconditionally, so either
attr<- should be smart enough not to copy x on a no-op, or the line
could be replaced by something like:
  if (value == "single") attr(x, "Csingle") <- TRUE
  else if (!is.null(attr(x, "Csingle"))) attr(x, "Csingle") <- NULL

I guess that in most cases Csingle will be untouched anyway.

---

Ok, so I can see how we can eliminate two of the four copies. I'm
still not sure what causes the first one (*).

 > x=rnorm(100)
 > tracemem(x)
[1] "<0x1b3da00>"
 > storage.mode(x)<-"double"
 > storage.mode(x)<-"double"
tracemem[0x1b3da00 -> 0x1820800]:
 > storage.mode(x)<-"double"
tracemem[0x1820800 -> 0x1f29600]:

The only difference is that the resulting x has NAMED=2:

 > x=rnorm(100)
 > tracemem(x)
[1] "<0x29b1a00>"
 > insp(x)
@029b1a00 14 REALSXP [NAM(1)] (len=100, tl=34)
 > storage.mode(x)<-"double"
 > insp(x)
@029b1a00 14 REALSXP [NAM(2)] (len=100, tl=34)
 > storage.mode(x)<-"double"
tracemem[0x29b1a00 -> 0x1a47e00]:

I'm not sure why, because storage.mode is a no-op if the mode is
already correct. It probably has to do with the subassignment function
evaluation (which I didn't look at), but I'm not sure.

Cheers,
Simon


