[R] matrix of size 30^5

David Winsemius dwinsemius at comcast.net
Sun Apr 21 00:32:00 CEST 2013


On Apr 20, 2013, at 2:19 PM, Benjamin Caldwell wrote:

> Dear R helpers
> 
> Reproducible example:
> 
> #warning - this causes a hard freeze on the machines I've tried it on
> matrix.holder<- matrix(rnorm(150), nrow=30, ncol=5)
> 
> Out=
> expand.grid(matrix.holder[,1],matrix.holder[,2],matrix.holder[,3],matrix.holder[,4],
> matrix.holder[,5])
> 

On my machine:

object.size(Out)
972014344 bytes

So with the proper setup you might be able to work with this (just under 1 GB) on a 4 GB machine, but probably not on the machine you describe below, which has only about 2.5 GB free.


> Problem:
> 
> I'm running an analysis that I would like to do using a matrix containing
> all the possible combinations of the elements in a [30,5] matrix. Briefly,
> each possible combination is used to index and subset another matrix. I
> then run some models on the data in the subsetted matrix and then sometimes
> export the model results based on a couple criteria. 24,300,000
> combinations seems to be too big for R on my computer (Intel i5, about 2.5
> GB RAM free, 4 GB total, Rx64 2.15 ) to handle.
> 
> Requests:
> 
> 1. Can you tell me how I can estimate the amount of memory a matrix will
> require before I create it?

Roughly:  8 * prod(dim(mat))   # 8 bytes per double; for this grid, 5 columns * 8 bytes * 30^5 rows

> 5*8*(30^5)/972014344
[1] 0.9999852    # so my estimate was within about 0.0015% of the measured size.
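
As a minimal sketch of that estimate (the helper name estimate_grid_bytes is made up for illustration and is not from any package), the size of an expand.grid() result over numeric columns can be approximated as 8 bytes times rows times columns, and then checked against object.size() on a case small enough to build:

```r
# Hypothetical helper: approximate bytes for a grid of doubles with
# n_levels values in each of n_cols columns. It ignores the small
# overhead of attributes and vector headers.
estimate_grid_bytes <- function(n_levels, n_cols) {
  8 * n_cols * n_levels^n_cols   # 8 bytes per double
}

estimate_grid_bytes(30, 5)       # the 30^5-row, 5-column case from this thread

# Sanity check on a small case: 20^3 rows, 3 columns
small <- expand.grid(rnorm(20), rnorm(20), rnorm(20))
est   <- estimate_grid_bytes(20, 3)
act   <- as.numeric(object.size(small))
act / est                        # a little above 1 because of the overhead
```

The estimate is a lower bound; the measured size is slightly larger because of the data frame's names and attributes.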

> 
> 2. Do you have recommendations for packages that allow the user to send an
> object directly to the hard drive? I guess it would have to be partially
> created in RAM and then dumped to the HD, but the point is that there isn't
> room for the whole thing to be created and then written in pieces to the HD
> (which even I think I could do). And then of course if it was written as
> one big piece to the HD, I would need to be able to read it in piece by
> piece.


> 
> 3. I also see packages out there to connect R to C. Anyone have ideas for
> one designed or containing functions designed for this type of problem?

Are you saying you have facility with C programming? (And you really have not described the problem. Perhaps a redesign of the solution could accommodate your limited computing resources.)
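
One possible redesign, sketched here under the assumption that each combination can be processed independently of the others, is to never build the full grid: compute any block of rows of the would-be expand.grid() result directly from the row numbers, and work through the rows in chunks. The names grid_rows and chunk_size are illustrative and not from any package.

```r
# Return rows `idx` (1-based) of expand.grid(cols[[1]], cols[[2]], ...)
# without building the whole grid. As in expand.grid(), the first
# column varies fastest.
grid_rows <- function(cols, idx) {
  n    <- vapply(cols, length, integer(1))   # number of values per column
  reps <- cumprod(c(1, n[-length(n)]))       # run length of each value, per column
  out  <- lapply(seq_along(cols), function(j) {
    cols[[j]][((idx - 1) %/% reps[j]) %% n[j] + 1]
  })
  names(out) <- paste0("Var", seq_along(cols))
  as.data.frame(out)
}

set.seed(1)
matrix.holder <- matrix(rnorm(150), nrow = 30, ncol = 5)
cols  <- lapply(seq_len(ncol(matrix.holder)), function(j) matrix.holder[, j])
total <- prod(vapply(cols, length, integer(1)))   # 30^5 rows in the full grid

chunk_size <- 1e5                          # about 4 MB of doubles per chunk
first <- grid_rows(cols, 1:chunk_size)     # only rows 1..100000 are built
# A full run would loop over start rows: seq(1, total, by = chunk_size)
```

Because each chunk depends only on its row numbers, chunks could in principle be handed to separate workers, or appended to a file with write.table(..., append = TRUE) if the results need to live on disk; whether either is worthwhile depends on the modelling step, which has not been described.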

> 
> Background:
> 
> When I tried to throw expand.grid() at a matrix of size [30,5] (24,300,000
> combinations), my computer choked (I assume due to RAM memory limits, but
> it might be that doing that just takes a long time and I wasn't ready to
> stare at a frozen computer for very long).

It took 7 seconds on my 6-year-old Mac Pro.

> I'm currently working around the
> problem with five nested loops, with all the drawbacks of and limits
> imposed by that approach (the biggest for me is that I'd like to attempt to
> multithread with some of the packages that exist for that).
> 
> I don't have any formal training in computer science, and the only
> programming language I use enough to do something of this complexity is R,
> so programming the whole thing in C (which all the remote sensing folks
> across the hall said would make creation of this matrix trivial) isn't an
> easy alternative for me.

There are many threads on R-help, and advice in various manuals, about how to avoid memory limitations.
> 
> Thanks!
> 
> Ben Caldwell
> 
> Graduate Fellow
> University of California, Berkeley

-- 

David Winsemius
Alameda, CA, USA


