weighted clustering (was: Re: [R] problems with a large data set)
mlennert@ulb.ac.be
Fri Apr 27 11:18:40 CEST 2001
kmeans and clara work great. Thank you for the tip.
I have another question:
Is it possible to weight the observations in a cluster analysis? I haven't
found any mention of this in the kmeans or clara help texts.
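
To make concrete what I mean by weighting, here is a minimal sketch of a
weighted Lloyd-style k-means written by hand. It is not an option of kmeans()
or clara(); the function name weighted_kmeans, its arguments and the simulated
data are all assumptions made up for illustration. Each cluster center is
updated as the weighted mean of its members instead of the plain mean.

```r
# Hand-rolled weighted k-means (Lloyd-style); a sketch, not a library function.
# x: numeric matrix, w: non-negative weights (one per row), k: number of clusters.
weighted_kmeans <- function(x, w, k, iter.max = 50) {
  x <- as.matrix(x)
  stopifnot(length(w) == nrow(x), all(w >= 0), k <= nrow(x))
  centers <- x[sample(nrow(x), k), , drop = FALSE]   # random starting centers
  cluster <- integer(nrow(x))
  for (iter in seq_len(iter.max)) {
    # squared Euclidean distance from every row to every center (n x k matrix)
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster <- max.col(-d2, ties.method = "first")
    if (all(new_cluster == cluster)) break            # assignments are stable
    cluster <- new_cluster
    for (j in seq_len(k)) {
      idx <- cluster == j
      # weighted mean of the members; a cluster with no weight keeps its old center
      if (sum(w[idx]) > 0)
        centers[j, ] <- colSums(x[idx, , drop = FALSE] * w[idx]) / sum(w[idx])
    }
  }
  list(cluster = cluster, centers = centers)
}

# Simulated example: two groups of 50 points in 2 dimensions, random weights.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
w <- runif(nrow(x))
fit <- weighted_kmeans(x, w, k = 2)
table(fit$cluster)
```

If the weights are whole-number counts, repeating each row that many times
before calling kmeans() should amount to the same thing, at the cost of a
larger data set.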
Moritz Lennert
Chargé de recherche
IGEAT - ULB
tél: 32-2-650.65.16
fax: 32-2-650.50.92
email: mlennert at ulb.ac.be
> On Wed, 25 Apr 2001, Moritz Lennert wrote:
>
> > Hello,
> >
> > I have trouble with a data set that comprises 2136 lines of 20 columns.
> > I would like to do a hierarchical clustering and I tried the following:
> >
> > ages.hclust <- hclust(dist(ages, method="euclidean"), "ward")
> >
> > but I get the following error message:
> >
> > Error: cannot allocate vector of size 17797 Kb
> >
> > When I try to do the dist() alone first without the hclust(), I get the
> > same type of message.
> >
> > Then I tried with the RPgSQL package by typing
> >
> > > db.connect(dbname="space")
> > Connected to database "space" on "localhost"
> > > bind.db.proxy("ages")
> > > ages.hclust <- hclust(dist(ages, method="euclidean"), "ward")
>
> That does not help. You need to retrieve the data to use it!
>
> > This time I get:
> >
> > Error in dist(ages, method = "euclidean") :
> > NA/NaN/Inf in foreign function call (arg 1)
> > In addition: Warning message:
> > NAs introduced by coercion
> >
> >
> > I've checked, and I can't find any missing values or anything similar.
> > Could someone tell me if I'm doing something wrong, or whether this is
> > just too much data for R?
>
> This may be too much data for your computer, but not for R: I've
> just done this in a few seconds. I suggest that you need more memory
> (real or virtual): on my simulation it used about 80Mb.
>
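
The size in the error message can be checked by hand. This is a rough sketch,
assuming the failed allocation is the distance vector itself: dist() stores
n * (n - 1) / 2 pairwise distances as 8-byte doubles. The helper name dist_kb
is made up for illustration.

```r
# Size in Kb of the vector dist() has to allocate for n observations:
# n * (n - 1) / 2 pairwise distances, 8 bytes each.
dist_kb <- function(n) n * (n - 1) / 2 * 8 / 1024

dist_kb(2136)   # the 2136 lines mentioned in the thread
dist_kb(2135)   # one observation fewer
```

The second value comes out at about 17797 Kb, the figure in the error message,
which would fit 2135 observations (for instance if one of the 2136 lines is a
header; that is a guess, not something checked against the file). This covers
only the distance vector, not whatever hclust() allocates on top of it.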
> I should say that doing agglomerative hierarchical clustering on thousands of
> points makes little sense: it is not a good way to find large clusters:
> try a partitioning method like kmeans or clara (in package cluster).
>
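
For the archive, a minimal sketch of the two partitioning calls on data of the
same shape. The matrix ages_sim is simulated noise standing in for the real
table, and the choice of 4 clusters is arbitrary; both are assumptions, not
values from this thread. In current versions of R, kmeans() is available
without loading a package, and clara() comes from package cluster.

```r
library(cluster)   # provides clara()

set.seed(42)
# Simulated stand-in for the real data: 2136 rows, 20 numeric columns.
ages_sim <- matrix(rnorm(2136 * 20), nrow = 2136, ncol = 20)

# k-means with 4 clusters (arbitrary), restarted from 5 random configurations.
km <- kmeans(ages_sim, centers = 4, nstart = 5)

# clara() works on sub-samples, so it never builds the full distance matrix.
cl <- clara(ages_sim, k = 4, samples = 10)

table(km$cluster)      # cluster sizes from kmeans
table(cl$clustering)   # cluster sizes from clara
```

Neither call needs the full n * (n - 1) / 2 distance vector, which is why they
avoid the memory problem that dist() ran into.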
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272860 (secr)
> Oxford OX1 3TG, UK Fax: +44 1865 272595