[Rd] Suggestion for memory optimization and as.double() with friends

Duncan Murdoch murdoch at stats.uwo.ca
Thu Mar 29 00:04:07 CEST 2007


On 3/28/2007 5:25 PM, Henrik Bengtsson wrote:
> Hi,
> 
> when doing as.double() on an object that is already a double, the
> object seems to be copied internally, doubling the memory requirement.
>  See example below.  Same for as.character() etc.  Is this intended?
> 
> Example:
> 
> % R --vanilla
>> x <- double(1e7)
>> gc()
>            used (Mb) gc trigger (Mb) max used (Mb)
> Ncells   234019  6.3     467875 12.5   350000  9.4
> Vcells 10103774 77.1   11476770 87.6 10104223 77.1
>> x <- as.double(x)
>> gc()
>            used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells   234113  6.3     467875  12.5   350000   9.4
> Vcells 10103790 77.1   21354156 163.0 20103818 153.4
> 
> However, couldn't this easily be avoided by letting as.double() return
> the object as is if already a double?

as.double() calls the internal as.vector(), which also strips off 
attributes.  But in the case where the input is already a double with 
no attributes, so that the output is identical to the input, this does 
seem like an easy optimization.  I don't know if it would help most 
people, but it might help in the kinds of cases you mention.
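
To make the attribute point concrete, here is a minimal sketch in R. The 
helper name as_double_if_needed is made up for illustration and is not 
part of R; it only shows the kind of check a short-circuit would need, 
assuming "identical to the input" means "already double and carrying no 
attributes".

```r
# Hypothetical helper, not part of R: return `x` untouched when it is
# already a plain double vector, otherwise fall back to as.double().
as_double_if_needed <- function(x) {
  if (is.double(x) && is.null(attributes(x))) x else as.double(x)
}

# as.double() strips attributes, so a named double cannot simply be
# returned as is.
x <- c(a = 1, b = 2)
y <- as.double(x)
is.null(names(y))

# Non-double input still goes through the usual conversion.
z <- as_double_if_needed(1:3)
is.double(z)

# A plain double vector comes back unchanged.
p <- double(5)
identical(as_double_if_needed(p), p)
```

The attribute check is the important part: returning x as is whenever 
is.double(x) holds would change behaviour for objects that carry names, 
dim, or a class.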

Duncan Murdoch

> 
> Example:
> 
> % R --vanilla
>> as.double.double <- function(x, ...) x
>> x <- double(1e7)
>> gc()
>            used (Mb) gc trigger (Mb) max used (Mb)
> Ncells   234019  6.3     467875 12.5   350000  9.4
> Vcells 10103774 77.1   11476770 87.6 10104223 77.1
>> x <- as.double(x)
>> gc()
>            used (Mb) gc trigger (Mb) max used (Mb)
> Ncells   234028  6.3     467875 12.5   350000  9.4
> Vcells 10103779 77.1   12130608 92.6 10104223 77.1
> 
> What's the catch?
> 
> 
> The reason why I bring it up, is because many (most?) methods are
> using as.double() etc "just in case" when passing arguments to
> .Call(), .Fortran() etc, e.g. stats::smooth.spline():
> 
>     fit <- .Fortran(R_qsbart, as.double(penalty), as.double(dofoff),
>         x = as.double(xbar), y = as.double(ybar), w = as.double(wbar), <etc>)
> 
> Your memory usage is peaking in the actual call and the garbage
> collector cannot clean it up until after the call. This seems to be
> a waste of memory, especially when the objects are large (100-1000 MB).
> 
> Cheers
> 
> Henrik
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


