[R] Can R handle a matrix with 8 billion entries?
David Winsemius
dwinsemius at comcast.net
Wed Aug 10 06:25:29 CEST 2011
On Aug 9, 2011, at 11:38 PM, Chris Howden wrote:
> Hi,
>
> I’m trying to do a hierarchical cluster analysis in R with a Big
> Data set.
> I’m running into problems using the dist() function.
>
> I’ve been looking at a few threads about R’s memory and have read the
> memory limits section in R help. However, I’m no computer expert, so I’m
> hoping I’ve misunderstood something and R can handle my Big Data set
> somehow. At the moment I think my data set is simply too big and there
> is no way around it, but I’d like to be proved wrong!
>
> My data set has 90523 rows of data and 24 columns.
>
> My understanding is that this means the distance matrix has at least
> 90523^2 elements, which is 8194413529. That roughly translates to 8 GB
> of memory being required (if I assume each entry needs only 1 byte). I
> only have 4 GB on a 32-bit build of Windows and R, so there is no way
> that’s going to work.
>
> So then I thought of getting access to a more powerful computer, and
> maybe
> using cloud computing.
>
> However, the R memory limits help page says “On all builds of R, the
> maximum length (number of elements) of a vector is 2^31 - 1 ~ 2*10^9”.
> As the distance matrix I require has more elements than this, does that
> mean it’s too big for R no matter what I do?
Yes. Vector indexing is done with 4-byte (signed 32-bit) integers, so no
single vector, and hence no dist object, can have more than 2^31 - 1
elements.
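
To put rough numbers on it (just a back-of-the-envelope sketch, using the
90523-row figure from the post above):

    n <- 90523                      # rows in the data set (from the post above)
    n * n                           # full distance matrix: 8194413529 entries
    n.lower <- n * (n - 1) / 2      # lower triangle that dist() actually stores: 4097161503
    n.lower > .Machine$integer.max  # TRUE: longer than the 2^31 - 1 element limit
    n.lower * 8 / 2^30              # ~30.5 GiB as 8-byte doubles, even ignoring that limit

So even the lower triangle that dist() returns is too long to index, quite
apart from the roughly 30 GB of memory it would need as doubles.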
--
David Winsemius, MD
West Hartford, CT