[Rd] Suggestion for memory optimization and as.double() with friends

Duncan Murdoch murdoch at stats.uwo.ca
Thu Mar 29 03:48:09 CEST 2007


On 3/28/2007 8:17 PM, Henrik Bengtsson wrote:
> On 3/28/07, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> On 3/28/2007 5:25 PM, Henrik Bengtsson wrote:
>>> Hi,
>>>
>>> when calling as.double() on an object that is already a double, the
>>> object seems to be copied internally, doubling the memory requirement.
>>> See the example below.  The same holds for as.character() etc.  Is this
>>> intended?
>>>
>>> Example:
>>>
>>> % R --vanilla
>>>> x <- double(1e7)
>>>> gc()
>>>            used (Mb) gc trigger (Mb) max used (Mb)
>>> Ncells   234019  6.3     467875 12.5   350000  9.4
>>> Vcells 10103774 77.1   11476770 87.6 10104223 77.1
>>>> x <- as.double(x)
>>>> gc()
>>>            used (Mb) gc trigger  (Mb) max used  (Mb)
>>> Ncells   234113  6.3     467875  12.5   350000   9.4
>>> Vcells 10103790 77.1   21354156 163.0 20103818 153.4
>>>
>>> However, couldn't this easily be avoided by letting as.double() return
>>> the object as is if already a double?
>> as.double calls the internal as.vector, which also strips off
>> attributes.  But in the case where the output is identical to the input,
>> this does seem like an easy optimization.  I don't know if it would help
>> most people, but it might help in the kinds of cases you mention.
> 
> What about,
> 
> as.double.double <- function(x, ...) {
>  if (is.null(attributes(x))) x else NextMethod("as.double", x, ...)
> }
> 
> and same for as.integer(), as.logical(), as.complex(), as.raw(), and
> as.character()?

Yes, something like that, except it should be within the internal 
as.vector code.  Writing it in R code would impact all users, and might 
even negate any advantage you got from the lack of duplication.  For 
example, you'll be duplicating the attributes of x with the code above, 
but internal code could do the test without the duplication.
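
To make the condition concrete, here is a rough R-level sketch of the test
being discussed. It only illustrates the logic; the helper name
as_double_if_needed is hypothetical and not part of R, and an R-level
wrapper like this is not the internal change suggested above. It relies on
the fact that as.double() strips attributes, so the shortcut is only valid
when x is already a double with no attributes.

```r
# Hypothetical helper (not part of R): return x untouched when it is already
# a plain double vector, otherwise fall back to the regular conversion.
as_double_if_needed <- function(x) {
  if (is.double(x) && is.null(attributes(x))) x else as.double(x)
}

plain <- c(1.5, 2.5)          # double, no attributes
named <- c(a = 1.5, b = 2.5)  # double with a names attribute
ints  <- 1:3                  # integer vector

r_plain <- as_double_if_needed(plain)  # shortcut branch: returned as is
r_named <- as_double_if_needed(named)  # as.double() drops the names
r_ints  <- as_double_if_needed(ints)   # converted from integer to double
```

Whether the shortcut branch actually saves a copy depends on the R
internals, which is why the test belongs in the internal as.vector code
rather than in a wrapper like this one.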

Duncan Murdoch

> 
> /Henrik
> 
>> Duncan Murdoch
>>
>>> Example:
>>>
>>> % R --vanilla
>>>> as.double.double <- function(x, ...) x
>>>> x <- double(1e7)
>>>> gc()
>>>            used (Mb) gc trigger (Mb) max used (Mb)
>>> Ncells   234019  6.3     467875 12.5   350000  9.4
>>> Vcells 10103774 77.1   11476770 87.6 10104223 77.1
>>>> x <- as.double(x)
>>>> gc()
>>>            used (Mb) gc trigger (Mb) max used (Mb)
>>> Ncells   234028  6.3     467875 12.5   350000  9.4
>>> Vcells 10103779 77.1   12130608 92.6 10104223 77.1
>>>
>>> What's the catch?
>>>
>>>
>>> I bring this up because many (most?) functions call as.double() etc.
>>> "just in case" when passing arguments to .Call(), .Fortran() etc.,
>>> e.g. stats::smooth.spline():
>>>
>>>     fit <- .Fortran(R_qsbart, as.double(penalty), as.double(dofoff),
>>>         x = as.double(xbar), y = as.double(ybar), w = as.double(wbar), <etc>)
>>>
>>> Memory usage peaks during the actual call, and the garbage collector
>>> cannot reclaim the copies until after the call returns. This seems to
>>> be a waste of memory, especially when the objects are large
>>> (100-1000 MB).
>>>
>>> Cheers
>>>
>>> Henrik
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>


