[R] Can R handle a matrix with 8 billion entries?

David Winsemius dwinsemius at comcast.net
Wed Aug 10 06:25:29 CEST 2011


On Aug 9, 2011, at 11:38 PM, Chris Howden wrote:

> Hi,
>
> I'm trying to do a hierarchical cluster analysis in R with a Big Data
> set, and I'm running into problems using the dist() function.
>
> I've been looking at a few threads about R's memory and have read the
> memory limits section in R help. However, I'm no computer expert, so
> I'm hoping I've misunderstood something and R can handle my Big Data
> set somehow. At the moment I think my data set is simply too big and
> there is no way around it, but I'd like to be proved wrong!
>
> My data set has 90523 rows of data and 24 columns.
>
> My understanding is that this means the full distance matrix has
> 90523^2 = 8194413529 elements, which roughly translates to 8GB of
> memory being required (if I assume each entry requires 1 byte). I only
> have 4GB on a 32-bit build of Windows and R, so there is no way that's
> going to work.
>
> So then I thought of getting access to a more powerful computer, and
> maybe using cloud computing.
>
> However, the R memory-limits help mentions "On all builds of R, the
> maximum length (number of elements) of a vector is 2^31 - 1 ~ 2*10^9".
> Since the distance matrix I require has more elements than this, does
> that mean it's too big for R no matter what I do?

Yes. Vector indexing in R is done with 4-byte signed integers, so no single vector can have more than 2^31 - 1 elements, and even the lower triangle that dist() actually stores is longer than that.
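
To put numbers on that, here is a quick check of the arithmetic from your post (just a sketch; the only thing assumed beyond what you wrote is that dist() keeps the lower triangle of the distance matrix as an ordinary numeric vector):

n     <- 90523
full  <- n^2               # full n x n distance matrix: 8,194,413,529 entries
lower <- n * (n - 1) / 2   # what dist() actually stores: 4,097,161,503 entries

.Machine$integer.max            # 2147483647, the maximum length of any R vector
full  > .Machine$integer.max    # TRUE
lower > .Machine$integer.max    # TRUE: even the lower triangle alone is too long

So the answer does not depend on how much RAM you can find; the object has more elements than any vector R can index.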

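The 8GB figure is also optimistic: dist() returns a numeric (double) vector, so each entry takes 8 bytes, not 1. Back-of-the-envelope, in R:

n <- 90523
n * (n - 1) / 2 * 8 / 2^30   # ~30.5 GiB for the dist object itself
n^2 * 8 / 2^30               # ~61 GiB if you materialised the full matrix

So even on a large 64-bit machine the memory alone would be a challenge, quite apart from the vector-length limit.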
-- 

David Winsemius, MD
West Hartford, CT


