[Rd] Suggestion for memory optimization and as.double() with friends
Henrik Bengtsson
hb at stat.berkeley.edu
Thu Mar 29 02:17:39 CEST 2007
On 3/28/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 3/28/2007 5:25 PM, Henrik Bengtsson wrote:
> > Hi,
> >
> > when doing as.double() on an object that is already a double, the
> > object seems to be copied internally, doubling the memory requirement.
> > See example below. Same for as.character() etc. Is this intended?
> >
> > Example:
> >
> > % R --vanilla
> >> x <- double(1e7)
> >> gc()
> >            used (Mb) gc trigger  (Mb) max used  (Mb)
> > Ncells   234019  6.3     467875  12.5   350000   9.4
> > Vcells 10103774 77.1   11476770  87.6 10104223  77.1
> >> x <- as.double(x)
> >> gc()
> >            used (Mb) gc trigger  (Mb) max used  (Mb)
> > Ncells   234113  6.3     467875  12.5   350000   9.4
> > Vcells 10103790 77.1   21354156 163.0 20103818 153.4
> >
> > However, couldn't this easily be avoided by letting as.double() return
> > the object as is if already a double?
>
> as.double calls the internal as.vector, which also strips off
> attributes. But in the case where the output is identical to the input,
> this does seem like an easy optimization. I don't know if it would help
> most people, but it might help in the kinds of cases you mention.
What about,
  as.double.double <- function(x, ...) {
    if (is.null(attributes(x))) x else NextMethod("as.double", x, ...)
  }
and same for as.integer(), as.logical(), as.complex(), as.raw(), and
as.character()?
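
To make the idea concrete without relying on S3 dispatch, here is a minimal sketch as a plain helper function. The name as_double_nocopy() is made up for illustration and is not part of R; the sketch assumes it is acceptable to hand back the very same object whenever it is already a double without attributes, and to defer to the regular as.double() in every other case:

```r
# Hypothetical helper, not part of R: return 'x' untouched when coercion
# would be a no-op, otherwise fall back to the regular as.double().
as_double_nocopy <- function(x) {
  if (is.double(x) && is.null(attributes(x))) x else as.double(x)
}

x <- double(10)
y <- as_double_nocopy(x)         # already a plain double: returned as is
z <- as_double_nocopy(1:3)       # integer: coerced by as.double()
w <- as_double_nocopy(c(a = 1))  # has names: as.double() strips them
```

On builds of R with memory profiling enabled, tracemem(x) could be used to check whether a duplication is reported in the first case.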
/Henrik
>
> Duncan Murdoch
>
> >
> > Example:
> >
> > % R --vanilla
> >> as.double.double <- function(x, ...) x
> >> x <- double(1e7)
> >> gc()
> >            used (Mb) gc trigger  (Mb) max used  (Mb)
> > Ncells   234019  6.3     467875  12.5   350000   9.4
> > Vcells 10103774 77.1   11476770  87.6 10104223  77.1
> >> x <- as.double(x)
> >> gc()
> >            used (Mb) gc trigger  (Mb) max used  (Mb)
> > Ncells   234028  6.3     467875  12.5   350000   9.4
> > Vcells 10103779 77.1   12130608  92.6 10104223  77.1
> >
> > What's the catch?
> >
> >
> > The reason why I bring it up, is because many (most?) methods are
> > using as.double() etc "just in case" when passing arguments to
> > .Call(), .Fortran() etc, e.g. stats::smooth.spline():
> >
> > fit <- .Fortran(R_qsbart, as.double(penalty), as.double(dofoff),
> > x = as.double(xbar), y = as.double(ybar), w = as.double(wbar), <etc>)
> >
> > The memory usage peaks during the actual call, and the garbage
> > collector cannot reclaim the copies until after the call. This seems to be
> > a waste of memory, especially when the objects are large (100-1000 MB).
> >
> > Cheers
> >
> > Henrik
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>