[Rd] Suggestion for memory optimization and as.double() with friends

Henrik Bengtsson hb at stat.berkeley.edu
Wed Mar 28 23:59:08 CEST 2007


On 3/28/07, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
> Hi,
>
> when doing as.double() on an object that is already a double, the
> object seems to be copied internally, doubling the memory requirement.
>  See example below.  Same for as.character() etc.  Is this intended?
>
> Example:
>
> % R --vanilla
> > x <- double(1e7)
> > gc()
>            used (Mb) gc trigger (Mb) max used (Mb)
> Ncells   234019  6.3     467875 12.5   350000  9.4
> Vcells 10103774 77.1   11476770 87.6 10104223 77.1
> > x <- as.double(x)
> > gc()
>            used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells   234113  6.3     467875  12.5   350000   9.4
> Vcells 10103790 77.1   21354156 163.0 20103818 153.4
>
> However, couldn't this easily be avoided by letting as.double() return
> the object as is if already a double?
>
> Example:
>
> % R --vanilla
> > as.double.double <- function(x, ...) x
> > x <- double(1e7)
> > gc()
>            used (Mb) gc trigger (Mb) max used (Mb)
> Ncells   234019  6.3     467875 12.5   350000  9.4
> Vcells 10103774 77.1   11476770 87.6 10104223 77.1
> > x <- as.double(x)
> > gc()
>            used (Mb) gc trigger (Mb) max used (Mb)
> Ncells   234028  6.3     467875 12.5   350000  9.4
> Vcells 10103779 77.1   12130608 92.6 10104223 77.1
>
> What's the catch?

Ok, one catch that my example didn't illustrate is: "'as.double'
attempts to coerce its argument to be of double type: like 'as.vector'
it strips attributes including names." (from ?as.double).

So, answering my own question, I can see how stripping the attributes
"requires" an internal copy.  Anyhow, when there are no attributes to
strip, the same idea still applies, with a more clever as.double()
function.
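
A minimal sketch of such a "more clever" function, assuming a
hypothetical helper named asDoubleFast() (not part of base R).  It
returns its argument untouched when it is already a double without
attributes, and otherwise falls back to the regular as.double().
Whether the fast path really avoids a copy depends on R's internals;
something like tracemem() could be used to check.

```r
# Hypothetical helper, not part of base R: skip the coercion when
# nothing needs to change.
asDoubleFast <- function(x, ...) {
  # A double vector without attributes is already what as.double()
  # would return, so hand it back as is.
  if (is.double(x) && is.null(attributes(x))) {
    return(x)
  }
  # Otherwise defer to the regular coercion, which strips attributes.
  as.double(x, ...)
}

x <- double(10)
y <- asDoubleFast(x)                  # fast path: x is returned as is
z <- asDoubleFast(c(a = 1L, b = 2L))  # coerced to double, names dropped
```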

In the case when one wants to coerce to a double and keep existing
attributes, one could extend as.double() with:

 as.double(x, stripAttributes=FALSE)

and that code could be clever enough not to create an internal copy.
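
A rough sketch of that behaviour, written as a separate function so
that base as.double() is left alone.  The name asDoubleKeep() and the
'stripAttributes' argument are hypothetical; the sketch assumes that
'storage.mode<-' is an acceptable way to change the type while keeping
attributes such as 'dim' and 'names'.

```r
# Hypothetical wrapper illustrating a 'stripAttributes' argument;
# neither the function nor the argument exists in base R.
asDoubleKeep <- function(x, stripAttributes = TRUE) {
  if (stripAttributes) {
    # Current documented behaviour: coerce and drop attributes.
    return(as.double(x))
  }
  if (is.double(x)) {
    # Already a double and attributes are to be kept: nothing to do.
    return(x)
  }
  # Change the storage mode in place of a full coercion; this keeps
  # the attributes of x.
  storage.mode(x) <- "double"
  x
}

m <- matrix(1:6, nrow = 2)                     # integer matrix
d <- asDoubleKeep(m, stripAttributes = FALSE)  # double, still 2 x 3
v <- asDoubleKeep(m)                           # double, 'dim' dropped
```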

/Henrik

>
>
> The reason why I bring it up, is because many (most?) methods are
> using as.double() etc "just in case" when passing arguments to
> .Call(), .Fortran() etc, e.g. stats::smooth.spline():
>
>     fit <- .Fortran(R_qsbart, as.double(penalty), as.double(dofoff),
>         x = as.double(xbar), y = as.double(ybar), w = as.double(wbar), <etc>)
>
> Your memory usage is peaking in the actual call and the garbage
> collector cannot clean it up until after the call. This seems to be
> a waste of memory, especially when the objects are large (100-1000 MB).
>
> Cheers
>
> Henrik
>


