[R] cluster in R

Weiwei Shi helprhelp at gmail.com
Thu Oct 19 01:07:48 CEST 2006


Dear Chris:

I have a data set with these dimensions:
> dim(dd.df)
[1] 142  28

I want to cluster the rows.
First, I followed the examples for cluster.stats():
d.dd <- dist(dd.df) # Euclidean distance (the default)
d.4 <- cutree(hclust(d.dd), 4) # cut the tree into 4 clusters
cluster.stats(d.dd, d.4) # gives results like this:

$cluster.size
[1] 133   5   2   2

$avg.silwidth
[1] 0.9857916
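
For reference, the average silhouette width can be cross-checked by hand. The following is only a minimal base-R sketch: dd.sim is simulated data standing in for dd.df (which is not shown in this message), and avg_sil_width is a hypothetical helper, not a function from fpc. It assumes the usual convention that a singleton cluster gets a silhouette value of 0.

```r
# Simulated stand-in for dd.df (assumption: the real data is not available here).
set.seed(42)
dd.sim <- matrix(rnorm(142 * 28), nrow = 142, ncol = 28)

d.sim  <- dist(dd.sim)                  # Euclidean distances between rows
cl.sim <- cutree(hclust(d.sim), k = 4)  # same workflow as above, 4 clusters

# Hypothetical helper: average silhouette width from a dist object and labels.
avg_sil_width <- function(d, cl) {
  m <- as.matrix(d)
  s <- numeric(nrow(m))
  for (i in seq_len(nrow(m))) {
    own <- cl == cl[i]
    if (sum(own) == 1) next                       # singleton cluster: s(i) stays 0
    a <- sum(m[i, own]) / (sum(own) - 1)          # mean distance within own cluster
    b <- min(tapply(m[i, !own], cl[!own], mean))  # mean distance to nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

table(cl.sim)                  # cluster sizes
avg_sil_width(d.sim, cl.sim)   # a value between -1 and 1
```

Because the average runs over all rows, sizes as unbalanced as 133/5/2/2 mean the value is dominated by the largest cluster, so a high average may mostly reflect a few outlying rows being split off.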

But when I tried a Pearson-based distance, where visualization suggests
that 4 or 5 clusters are reasonable, cluster.stats() gave me a very bad
avg.silwidth:

d.dd <- as.dist(cor(t(x), method = "pearson")) # is this correct?
$cluster.size
[1] 86 31  6 19

$avg.silwidth
[1] -0.09543089


Is there something wrong here, or should I not use a Pearson distance?
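
One point worth checking: cor() returns a similarity, so passing it straight to as.dist() treats highly correlated rows as far apart. A common conversion is 1 - r, which is 0 for perfectly correlated rows and 2 for perfectly anti-correlated ones. The sketch below uses simulated data (x.sim stands in for x, an assumption), and whether 1 - r or something like 1 - abs(r) is appropriate depends on the application.

```r
# Simulated stand-in for x (assumption: same shape as the data described above).
set.seed(1)
x.sim <- matrix(rnorm(142 * 28), nrow = 142, ncol = 28)

r     <- cor(t(x.sim), method = "pearson")  # 142 x 142 correlations between rows
d.cor <- as.dist(1 - r)                     # turn similarity into dissimilarity

cl.cor <- cutree(hclust(d.cor), k = 4)      # 4 clusters, as tried above
table(cl.cor)                               # cluster sizes

# If fpc is installed, the validation statistics can be computed as before.
if (requireNamespace("fpc", quietly = TRUE)) {
  print(fpc::cluster.stats(d.cor, cl.cor)$avg.silwidth)
}
```

The same dissimilarity object should be used both for the clustering and for cluster.stats(), otherwise the silhouette is evaluated on distances the clustering never saw.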

By the way, what is $separation? Where can I find a detailed explanation
of the output of cluster.stats?

Also, ?cluster.stats does not work on my Mac:
> version
               _
platform       i386-apple-darwin8.6.1
arch           i386
os             darwin8.6.1
system         i386, darwin8.6.1
status
major          2
minor          3.1
year           2006
month          06
day            01
svn rev        38247
language       R
version.string Version 2.3.1 (2006-06-01)

thanks,

weiwei

On 10/18/06, Weiwei Shi <helprhelp at gmail.com> wrote:
> Dear Christian:
> This is a really good summary. Most of my previous experience was with
> classification rather than clustering, so this is a really good start
> for me. Thank you!
>
> I also hope someone can provide more information and answers to the other questions.
>
> cheers,
>
> weiwei
>
> On 10/18/06, Christian Hennig <chrish at stats.ucl.ac.uk> wrote:
> > Dear Weiwei,
> >
> > > 1. Is there a way to evaluate the effectiveness (or separation) of a
> > > clustering (rather than by visualization)?
> >
> > The function cluster.stats in package fpc computes several cluster
> > validation statistics (among them the average silhouette width).
> > Function clusterboot in the same package (recent version) assesses cluster
> > stability. There are several interfaces to clustering methods implemented
> > in R which are documented on the help page of kmeansCBI (which gives you
> > kind of an overview of available "general purpose" clustering methods in R
> > though I may have missed some).
> > There are also several methods for the visualization of separation (I
> > know that you didn't ask for that) for which the function plotcluster is
> > an interface.
> >
> > Best,
> > Christian
> >
> >
> > *** --- ***
> > Christian Hennig
> > University College London, Department of Statistical Science
> > Gower St., London WC1E 6BT, phone +44 207 679 1698
> > chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
> >


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


