dealing with large objects -- memory wasting ?

Martyn Plummer plummer@iarc.fr
Fri, 04 Jun 1999 19:14:46 +0200 (CEST)


[Long example of matrix() wasting memory
 by Martin Maechler (MM) snipped]

A minor point perhaps, but I think there is an error
in your calculations.

If I understand correctly, the problem is that matrix()
assigns a local copy of its answer before returning it.
So a version that does not do this ...

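function (data = NA, nrow = 1, ncol = 1, byrow = FALSE)
{
    # Infer whichever dimension was not supplied from length(data)
    if (missing(nrow))
        nrow <- ceiling(length(data)/ncol)
    else if (missing(ncol))
        ncol <- ceiling(length(data)/nrow)
    # Return the internal result directly, without a local copy
    .Internal(matrix(data, nrow, ncol, byrow))
}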

should do better.

Using commands like

rm(X); n <- ... ; p <- 20; X <- matrix(rnorm(n*p), n,p); gc()

the largest value of n I could use successfully was about
18000, which is still less than what you suggest,

MM> Since we have 747 thousands of them , 
MM> constructing X the double size (400'000) shouldn't be a problem ...

and only 50% greater than what you can do with the standard matrix()
function (n ~ 12000).

I think the answer is that your calculations did not take
into account the argument to matrix(), rnorm(n*p), which
also temporarily takes up as much memory as the final matrix.
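
As a rough sketch of the arithmetic behind this point: the rnorm(n*p)
argument holds n*p doubles, exactly as many as the matrix built from it,
so both must fit in memory at the same time. The figure of 8 bytes per
double and the decision to ignore per-object overhead are assumptions
made for illustration, not measurements.

```r
p <- 20
n <- 18000                    # largest n reported above for the modified matrix()
bytes_per_double <- 8         # assumption: 8-byte doubles, object headers ignored

data_bytes   <- n * p * bytes_per_double   # the temporary rnorm(n*p) vector
matrix_bytes <- n * p * bytes_per_double   # the n x p result

# Both objects are alive while matrix() runs, so the peak is at least their sum.
peak_bytes <- data_bytes + matrix_bytes
peak_mb    <- peak_bytes / 2^20
```

Under these assumptions the temporary argument doubles the memory needed
compared with counting the final matrix alone.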

With trivial data you can do better:

rm(X); n <- ... ; p <- 20; X <- matrix(0, n,p); gc()

You can assign up to n ~ 37000 with the standard matrix()
function and n ~ 74000 with the modified version, which
is the expected 100% improvement.
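
The figures quoted in this message can be put side by side. The sketch
below only restates them; the variable names are made up for the
example, and reading the ratios as "number of matrix-sized objects alive
at the peak" assumes the same heap settings in all four runs, which is
an interpretation rather than something measured.

```r
# Largest workable n in each case, as reported above (p = 20 throughout).
n_max <- c(modified_zero  = 74000,
           standard_zero  = 37000,
           modified_rnorm = 18000,
           standard_rnorm = 12000)

# Gain from dropping the local copy inside matrix().
gain_zero  <- n_max[["modified_zero"]]  / n_max[["standard_zero"]]   # trivial data
gain_rnorm <- n_max[["modified_rnorm"]] / n_max[["standard_rnorm"]]  # rnorm(n*p) data

# Assumption: with a fixed heap, the best case (74000) corresponds to one
# matrix-sized object, so this ratio estimates how many such objects the
# other cases needed at their peak.
implied_copies <- n_max[["modified_zero"]] / n_max
```

The gains come out as 2 for trivial data and 1.5 for rnorm(n*p) data,
matching the 100% and 50% improvements described above.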

MM>There seem to be worse problems when use 
MM>
MM>     var(x)
MM>
MM>and x is one of those huge  n x p  matrices...

I couldn't assign a matrix that was big enough to crash var().
Is there a problem here? The fact that the default value of 
y is x is not a problem because of lazy evaluation.
If you assigned y in the body of the function ...

function (x, y, na.rm = FALSE, use) 
{
    if (missing(y)) 
        y <- x
    if (missing(use)) 
        use <- if (na.rm) 
            "complete.obs"
        else "all.obs"
    cov(x, y, use = use)
}

then you would have problems, but this isn't the case.
What am I missing?
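
A small sketch of the lazy evaluation referred to above: a default
argument is only evaluated when the function body actually uses it. The
names lazy_demo and forced_demo are hypothetical and are not part of
var(); the stop() call is there only to make evaluation visible.

```r
# The default for y would raise an error if it were ever evaluated.
lazy_demo <- function(x, y = stop("default for y was evaluated")) {
    # y is never used, so its default expression is never evaluated
    length(x)
}

forced_demo <- function(x, y = stop("default for y was evaluated")) {
    y            # using y forces the default expression
    length(x)
}

ok  <- lazy_demo(1:3)                          # completes without error
err <- try(forced_demo(1:3), silent = TRUE)    # the default is evaluated and fails
```

Here lazy_demo() returns normally, while forced_demo() hits the stop()
call as soon as y is used.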

Martyn
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._