[R] Memory exhausted with "dist" in "mva" library

Martin Maechler maechler at stat.math.ethz.ch
Thu May 23 14:49:44 CEST 2002


>>>>> "Kenneth" == Kenneth Cabrera <krcabrer at epm.net.co> writes:

    Kenneth> I have a database with 25000 rows and 30 columns
    Kenneth> and I want to make cluster analysis to cluster the
    Kenneth> 25000 records,

    Kenneth> but the memory exhausted using the "dist" function
    Kenneth> in "mva" library.  I use the "--max-mem-size" up to
    Kenneth> 1780Mb (If I use more the R returns me a error message)

    Kenneth> What can I do?

Do not use any distance (dissimilarity) based clustering method, if
possible: avoiding the full set of pairwise distances saves a lot of memory.
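
As a rough back-of-the-envelope check of why dist() runs out of memory
here, the following sketch estimates the size of its result for
n = 25000, assuming one 8-byte double per stored distance. It comes out
at roughly 2.3 GB, which is more than the 1780Mb limit mentioned above.

```r
# Rough memory estimate for the full dist() result on n observations.
n       <- 25000
n_pairs <- n * (n - 1) / 2   # number of pairwise distances, n(n-1)/2
bytes   <- n_pairs * 8       # assumption: one 8-byte double per distance
mib     <- bytes / 2^20      # size in MiB

cat(n_pairs, "distances, about", round(mib), "MiB\n")
```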

library(cluster) {and other non-base CRAN packages} is
recommended for more flexibility.
In particular, I'd quite strongly recommend using daisy() instead of dist().
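
A minimal sketch of the extra flexibility, on a small made-up data frame
(the `toy` data and its column names are invented for illustration):
daisy() accepts mixed numeric and factor columns, which dist() is not
designed for. Note that daisy() still computes all n(n-1)/2
dissimilarities, so by itself it does not address the memory problem.

```r
library(cluster)

# Small made-up data frame with one numeric and one factor column.
set.seed(1)
toy <- data.frame(
  size  = rnorm(6),
  group = factor(c("a", "b", "a", "c", "b", "c"))
)

# Gower's coefficient handles the mixed column types.
d <- daisy(toy, metric = "gower")

class(d)    # a "dissimilarity" object that also inherits from "dist"
length(d)   # number of stored pairwise dissimilarities, 6 * 5 / 2
```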

clara()  {in the cluster package} was written for
"Clustering LARge Applications" (C-LAR-A).
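
A hedged sketch of what a clara() call could look like for data of this
size. The matrix `x` is simulated as a stand-in for the real 25000 x 30
data, and k = 3, samples = 5 and sampsize = 200 are arbitrary choices
for illustration, not recommendations.

```r
library(cluster)

set.seed(42)
# Simulated stand-in for the real data: 25000 rows, 30 numeric columns.
x <- matrix(rnorm(25000 * 30), nrow = 25000, ncol = 30)

# clara() draws `samples` sub-datasets of `sampsize` rows, clusters each
# one around medoids, and then assigns all rows to the nearest medoid,
# so the full set of n(n-1)/2 dissimilarities is never stored.
fit <- clara(x, k = 3, metric = "euclidean", samples = 5, sampsize = 200)

table(fit$clustering)   # cluster sizes over all 25000 rows
dim(fit$medoids)        # one medoid (an actual row of x) per cluster
```

In practice k would have to be chosen from the data, and a larger
sampsize usually gives more stable medoids at the cost of more time.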

But there are many more clustering methods that work with the
Euclidean (or Manhattan) metric directly, instead of first
computing the n(n-1)/2 distances.
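
One example of such a method is k-means, which works on the data matrix
itself with Euclidean distances to the cluster centers. The sketch below
uses simulated data as a stand-in for the real 25000 x 30 table, and
centers = 3 is an arbitrary choice. In current R versions kmeans() lives
in the stats package, which is attached by default.

```r
set.seed(42)
# Simulated stand-in for the real data: 25000 rows, 30 numeric columns.
x <- matrix(rnorm(25000 * 30), nrow = 25000, ncol = 30)

# k-means only keeps the data and the k cluster centers in memory;
# no pairwise distance matrix between observations is formed.
km <- kmeans(x, centers = 3, iter.max = 50)

km$size           # number of observations in each cluster
dim(km$centers)   # k centers, each a point in 30 dimensions
```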

I hope this gets you started.

Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><