[R] Cluster on both categorical and numerical data

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Jun 18 14:02:45 CEST 2008


On Wed, 2008-06-18 at 12:43 +0100, Gavin Simpson wrote:
> On Wed, 2008-06-18 at 03:45 -0700, Birgitle wrote:
> > You could have a look at library(analogue), function ?distance
> 
> Thanks for the plug, Birgit, but (and I say this as the author of
> distance()) if you just want to compute a dissimilarity matrix using
> Gower's coefficient for mixed data, use daisy() from the recommended
> package cluster, because i) as cluster is a recommended package you
> don't need to install anything further, and ii) I haven't done timings,
> but daisy() should be much faster, and potentially use less memory,
> than distance(): daisy() is written in compiled Fortran and does half
> the computations that distance() does, the latter being pure R.
> 
> distance() was written with a very specific use case in mind: computing
> dissimilarities between rows of matrix A and rows of matrix B. That it
> computes a full dissimilarity matrix when given a single matrix is a
> side effect (one that I intend to keep, however).
> 
> Eventually, distance() will move to compiled C code, but that is
> immediately below "Learn C" on the ever-lengthening TODO list ;-)
> 
> > 
> > and library(cluster), function ?agnes
> 
> I think you mean daisy() here. agnes() is for /clustering/.

I meant to say here:

You pass the output from daisy() to agnes() (or to another clustering
function that takes a dissimilarity matrix). agnes() itself can't
compute the mixed-mode dissimilarity.

hclust() (see ?hclust), in the stats package that ships with R, is
another option for doing the clustering once you have the dissimilarity
matrix.
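
A minimal sketch of that workflow, in case it helps. The data frame and
its column names are made up purely for illustration, and the choices of
average linkage and three groups are arbitrary assumptions, not
recommendations:

```r
library(cluster)  # recommended package, installed with R

# Hypothetical mixed-type data: two numeric variables and one factor
set.seed(1)
dat <- data.frame(
  height = rnorm(10, mean = 170, sd = 10),
  weight = rnorm(10, mean = 70, sd = 8),
  colour = factor(sample(c("red", "green", "blue"), 10, replace = TRUE))
)

# Gower's coefficient copes with numeric and factor columns together
d <- daisy(dat, metric = "gower")

# Cluster the dissimilarity matrix with agnes() ...
ag <- agnes(d, method = "average")

# ... or with hclust() from the stats package
hc <- hclust(as.dist(d), method = "average")

# Cut the hclust tree into three groups (k = 3 is an arbitrary choice)
groups <- cutree(hc, k = 3)
```

plot(ag) or plot(hc) will then draw the dendrogram, and the agnes() tree
can be cut in the same way via cutree(as.hclust(ag), k = 3).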

G

> 
> G
> 
> > 
> > B.
> > 
> > 
> > Chua Siang Li wrote:
> > > 
> > > 
> > >    Hello there.  Is there any function in R that can do cluster on a
> > >    set of data that has both categorical and numerical variables?
> > >    thanks.
> > >    siangli
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > > 
> > > 
> > 
> > 
> > -----
> > The art of living is more like wrestling than dancing.
> > (Marcus Aurelius)
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


