[R] [dist] how to analyse a large matrix?

Charles C. Berry cberry at tajo.ucsd.edu
Fri Aug 22 02:31:20 CEST 2008


On Thu, 21 Aug 2008, mcnda839 at mncn.csic.es wrote:

> Hi all,
>
> I have a matrix of about 100,000 x 4 that I need to classify using
> the Euclidean metric. For that I am using the dist or daisy functions, but I
> am afraid that the message: Error in vector("double", length) : vector
> size specified is too large, means there are too many rows.
>

Yes. dist() keeps all n(n-1)/2 pairwise distances as doubles; for
n = 100,000 that is about 5e9 values, or roughly 40 GB to store.
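
A rough check of that estimate, assuming the lower triangle is stored as
8-byte doubles and taking n = 100,000 from the question (the variable names
are only for illustration):

```r
# Back-of-the-envelope memory estimate for a full pairwise distance object.
# Assumes one 8-byte double per pair of rows (lower triangle only).
n       <- 1e5                  # number of rows in the data matrix
n_pairs <- n * (n - 1) / 2      # number of pairwise distances
bytes   <- n_pairs * 8          # storage in bytes
gb      <- bytes / 1e9          # decimal gigabytes
gb                              # about 40
```

Any method that needs the whole distance matrix in memory runs into the same
limit, whichever function computes it.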


> Can anyone suggest how I should analyse this matrix?

Try something other than 'hierarchical clustering', which needs the full
distance matrix.

See
 	http://cran.r-project.org/web/views/Cluster.html

for some suggestions.

kmeans(), perhaps?
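
A minimal sketch of what that might look like. The data here are simulated
as a stand-in for your 100,000 x 4 matrix, and the number of clusters (5) is
an arbitrary assumption you would need to choose for your own problem:

```r
# Simulated stand-in for the real 100,000 x 4 matrix (not shown in the post).
set.seed(1)
x <- matrix(rnorm(1e5 * 4), ncol = 4)

# kmeans() works on the data matrix directly and never builds the
# n x n distance matrix. centers = 5 is an arbitrary illustrative choice;
# nstart = 5 tries several random starts and keeps the best one.
fit <- kmeans(x, centers = 5, nstart = 5, iter.max = 50)

table(fit$cluster)   # cluster sizes
fit$centers          # 5 x 4 matrix of cluster centres
```

kmeans() minimises within-cluster sums of squares, so it is implicitly tied
to Euclidean distance, which matches what you asked for. If you would rather
stay in the cluster package, clara() is another option aimed at large data
sets; it clusters subsamples instead of the full distance matrix.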

HTH,

Chuck

>
> Thanks in advance,
>
> Diogo André Alagador
> MNCN,CSIC, Madrid, Spain
> ISA, Lisbon, Portugal
>    
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


