[R] Cluster on both categorical and numerical data
Gavin Simpson
gavin.simpson at ucl.ac.uk
Wed Jun 18 14:02:45 CEST 2008
On Wed, 2008-06-18 at 12:43 +0100, Gavin Simpson wrote:
> On Wed, 2008-06-18 at 03:45 -0700, Birgitle wrote:
> > You could have a look at library(analogue), function ?distance
>
> Thanks for the plug, Birgit, but (and I say this as the author of
> distance()), if you just want to compute a dissimilarity matrix using
> Gower's coefficient for mixed data, use daisy() from the recommended
> package cluster because i) as cluster is a recommended package you
> don't need to install anything further, and ii) I haven't done timings,
> but daisy() will be much faster, and potentially use less memory, than
> distance(). daisy() is written in compiled Fortran and does half the
> computations that distance() does; distance() uses a pure R approach.
>
> distance() was written with a very specific use case in mind:
> computing dissimilarities between rows of matrix A and rows of matrix
> B. That it computes a full dissimilarity matrix when given a single
> matrix is a side effect (one that I intend to keep, however).
>
> Eventually, distance will move to compiled C code, but that is
> immediately below "Learn C" on the ever lengthening TODO list ;-)
>
> >
> > and library(cluster), function ?agnes
>
> I think you mean daisy() here. agnes() is for /clustering/.
I meant to say here:
You pass the output from daisy() to agnes() (or another clustering
function that takes a dissimilarity matrix). agnes() itself can't compute
the mixed-mode dissimilarity.
hclust() is another option, in base R, for doing the clustering once you
have the dissimilarity matrix.
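
A minimal sketch of that workflow, in case it helps. The data frame `dat`
and its columns are made up purely for illustration, and average linkage
and k = 2 are arbitrary choices on my part, not recommendations:

```r
library(cluster)  # recommended package, so it ships with a standard R install

# Hypothetical mixed-mode data: two numeric variables and one factor
set.seed(42)
dat <- data.frame(
  height = c(rnorm(10, mean = 150, sd = 5), rnorm(10, mean = 180, sd = 5)),
  weight = c(rnorm(10, mean = 50, sd = 4), rnorm(10, mean = 85, sd = 4)),
  group  = factor(rep(c("a", "b"), each = 10))
)

# Gower's coefficient copes with numeric and factor columns together
d <- daisy(dat, metric = "gower")

# Hand the dissimilarity object to a clustering function
ag <- agnes(d, method = "average")   # cluster::agnes
hc <- hclust(d, method = "average")  # stats::hclust, base R

# Cut each tree into two groups and cross-tabulate the memberships
grp_ag <- cutree(as.hclust(ag), k = 2)
grp_hc <- cutree(hc, k = 2)
table(grp_ag, grp_hc)
```

plot(ag) or plot(hc) will draw the dendrograms if you want to look at the
trees before deciding where to cut them.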
G
>
> G
>
> >
> > B.
> >
> >
> > Chua Siang Li wrote:
> > >
> > >
> > > Hello there. Is there any function in R that can do clustering on a
> > > set of data that has both categorical and numerical variables? Thanks.
> > > siangli
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> >
> > -----
> > The art of living is more like wrestling than dancing.
> > (Marcus Aurelius)
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%