[Rd] Suggestion for memory optimization and as.double() with friends
Simon Urbanek
simon.urbanek at r-project.org
Thu Mar 29 23:05:50 CEST 2007
Seth, good point. I think we should be able to do better...
On Mar 29, 2007, at 10:57 AM, Seth Falcon wrote:
> Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
>> The usual 'trick' to avoid this copy is
>>
>> storage.mode(x) <- "double"
>
> Hmm, this does not appear to avoid the copy for me. Using R 2.5.0
> alpha r40916 I get:
>
>> x <- 1:10 * 2.3
>> names(x)=LETTERS[1:10]
>> storage.mode(x)
> [1] "double"
>> tracemem(x)
> [1] "<0x2a7f008>"
>> storage.mode(x) <- "double"
> tracemem[0x2a7f008 -> 0x1fa6df8]:
>
> Note that actually changing the storage results in a surprising amount
> of copying:
>
>> storage.mode(x) <- "integer"
> tracemem[0x1fa6df8 -> 0x1fa6d60]:
I believe this is due to the argument being copied because of NAMED=2. It
doesn't appear when you work on a pristine x (NAMED=1). (*)
> tracemem[0x1fa6d60 -> 0x1fa6808]: as.integer.default as.integer
> eval eval storage.mode<-
This one comes from as.vector: an additional copy due to NAMED>0. In fact,
as.vector does something like this (in ascommon in coerce.c):
    if (NAMED(u))
        v = duplicate(u);
    else v = u;
    [...]
    v = coerceVector(v, type);
I suppose the duplication could be avoided, because coerceVector will
produce a copy anyway when the type actually changes. Would it be safe
to change it to something like this?
    v = coerceVector(u, type);
    if (v == u && NAMED(u)) v = duplicate(u);
I suspect that the duplication is necessary only because attributes may
get scrubbed later. If that is true, can we defer the copying until just
before the CLEAR_ATTRIB branch? Are my assumptions correct?
> tracemem[0x1fa6808 -> 0x2c26b18]: as.integer.default as.integer
> eval eval storage.mode<-
This is the conversion itself (coerceVector); that's fine.
> tracemem[0x2c26b18 -> 0x2c26a88]: storage.mode<-
>
This one is caused by:
attr(x, "Csingle") <- if (value == "single") TRUE
in "storage.mode<-". It calls attr<- unconditionally, so either
attr<- should be smart enough to not copy x on a no-op or it could
be replaced by something like:
    if (value == "single") {
        attr(x, "Csingle") <- TRUE
    } else if (!is.null(attr(x, "Csingle"))) {
        attr(x, "Csingle") <- NULL
    }
I guess that in most cases Csingle will be untouched anyway.
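To make the guarded version concrete, here is a minimal sketch as a standalone function. The name update_csingle is hypothetical and this is not the code in base R; it only isolates the attribute handling so the no-op path is easy to see.

```r
# Hypothetical helper (not part of base R): touch the "Csingle" attribute
# only when there is actually something to set or remove.
update_csingle <- function(x, value) {
    if (value == "single") {
        attr(x, "Csingle") <- TRUE
    } else if (!is.null(attr(x, "Csingle"))) {
        attr(x, "Csingle") <- NULL
    }
    x
}

y <- update_csingle(c(a = 1, b = 2), "double")  # no attr<- call at all
z <- update_csingle(y, "single")                # sets Csingle
w <- update_csingle(z, "double")                # removes Csingle again
```

In the common case (value is not "single" and no Csingle attribute is present) neither branch runs, so attr<- is never called on x.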
---
OK, so I can see how we can eliminate two of the four copies. I'm still
not sure what causes the first one (*).
> x=rnorm(100)
> tracemem(x)
[1] "<0x1b3da00>"
> storage.mode(x)<-"double"
> storage.mode(x)<-"double"
tracemem[0x1b3da00 -> 0x1820800]:
> storage.mode(x)<-"double"
tracemem[0x1820800 -> 0x1f29600]:
The only difference is that the resulting x has NAMED=2:
> x=rnorm(100)
> tracemem(x)
[1] "<0x29b1a00>"
> insp(x)
@029b1a00 14 REALSXP [NAM(1)] (len=100, tl=34)
> storage.mode(x)<-"double"
> insp(x)
@029b1a00 14 REALSXP [NAM(2)] (len=100, tl=34)
> storage.mode(x)<-"double"
tracemem[0x29b1a00 -> 0x1a47e00]:
I'm not sure why, because storage.mode<- is a no-op if the mode is
already correct. It probably has to do with how the replacement function
call is evaluated, I suppose (which I didn't look at), but I'm not sure...
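For reference, the R Language Definition describes a replacement call such as storage.mode(x) <- value as being evaluated through a temporary `*tmp*` binding. The sketch below only shows that documented expansion; whether the extra binding is what raises NAMED to 2 here is an assumption that would need checking in the evaluator.

```r
x <- rnorm(100)
y <- x

# The usual replacement-call syntax ...
storage.mode(x) <- "integer"

# ... and roughly what it expands to, per the R Language Definition.
`*tmp*` <- y
y <- `storage.mode<-`(`*tmp*`, value = "integer")
rm(`*tmp*`)
```

Both forms give the same result; the difference is only in how many bindings refer to the vector while the replacement function runs.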
Cheers,
Simon