[R] cluster in R
Weiwei Shi
helprhelp at gmail.com
Thu Oct 19 01:39:22 CEST 2006
Dear Chris:
Thanks for the prompt reply!
You are right: the Pearson-based dist contains negative values, so I should
use cor+1 in my case (since negatively correlated genes should be
considered farthest). Thanks.
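
A minimal sketch of the point above, using simulated data as a stand-in for dd.df (the matrix x, its size, and the planted rows are assumptions for illustration only). It shows that as.dist() passes negative correlations through unchanged. It also shows one common transformation, 1 - cor, which is offered here as an assumption rather than the thread's recommendation: 1 + cor is near 0 for strongly anti-correlated rows, so used as a distance it would place them closest, whereas 1 - cor places them farthest.

```r
# Simulated stand-in for dd.df: 20 rows (genes) by 28 columns (samples).
set.seed(1)
x <- matrix(rnorm(20 * 28), nrow = 20)
x[2, ] <- -x[1, ] + rnorm(28, sd = 0.1)  # row 2: strongly anti-correlated with row 1
x[3, ] <-  x[1, ] + rnorm(28, sd = 0.1)  # row 3: strongly correlated with row 1

r <- cor(t(x), method = "pearson")       # row-by-row correlations, in [-1, 1]

# as.dist() only repackages the lower triangle of the matrix; it does not
# rescale or reject negative entries.
d.raw <- as.dist(r)
min(d.raw) < 0        # TRUE here, because rows 1 and 2 are anti-correlated

# Assumed alternative: 1 - r lies in [0, 2]; correlated rows end up close,
# anti-correlated rows end up far apart.
d.cor <- as.dist(1 - r)
m <- as.matrix(d.cor)
m[1, 3] < m[1, 2]     # row 3 is nearer to row 1 than row 2 is

cl <- cutree(hclust(d.cor), k = 4)       # 4 clusters, as in the original attempt
table(cl)
```

If the sign of the correlation should not matter, 1 - abs(r) is another option; which one is appropriate depends on the application.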
As for ?cluster.stats, I double-checked and found that I needed to
restart JGR; only then did the help system pick up the newly loaded
package (fpc in this case).
Sorry for the confusion,
weiwei
On 10/18/06, Christian Hennig <chrish at stats.ucl.ac.uk> wrote:
> Dear Weiwei,
>
> > btw, ?cluster.stats does not work on my Mac machine.
> >> version
> >                _
> > platform       i386-apple-darwin8.6.1
> > arch           i386
> > os             darwin8.6.1
> > system         i386, darwin8.6.1
> > status
> > major          2
> > minor          3.1
> > year           2006
> > month          06
> > day            01
> > svn rev        38247
> > language       R
> > version.string Version 2.3.1 (2006-06-01)
>
> Because I don't have access to a Mac, I can't tell you anything about
> this, unfortunately.
> I always thought that my package should work on all platforms if it passes
> all the standard tests for packages?
> (Is there anyone else who could comment on this please?)
>
> > I have a sample like this
> >> dim(dd.df)
> > [1] 142 28
> >
> > and I want to cluster rows;
> > first of all, I followed the examples for cluster.stats() by
> > d.dd <- dist(dd.df) # use Euclidean
> > d.4 <- cutree(hclust(d.dd), 4) # 4 clusters I tried
> > cluster.stats(d.dd, d.4) # gives me some results like this:
> >
> > $cluster.size
> > [1] 133 5 2 2
> >
> > $avg.silwidth
> > [1] 0.9857916
> >
> > but when I tried to use Pearson dist here (by visualization, I think 4
> > or 5 clusters are good for Pearson dist), it gave me a very bad
> > avg.silwidth
> >
> > d.dd <- as.dist(cor(t(x), method="pearson")) # is this correct?
> > $cluster.size
> > [1] 86 31 6 19
> >
> > $avg.silwidth
> > [1] -0.09543089
>
> cor can give negative values, which doesn't fit the usual definition
> of a distance. I don't know what as.dist does in this case, but I think
> that, depending on your application, you should rather use the absolute
> value of the correlation, or 1+cor.
>
> > btw, what's $separation? Where can I find the detailed explanation of
> > the output from cluster.stats?
>
> This is documented on the cluster.stats help page:
>
> separation: vector of clusterwise minimum distances of a point in the
> cluster to a point of another cluster.
>
> Best regards,
> Christian
>
>
> *** --- ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
>
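
To make the quoted definition of separation concrete, here is a small base-R sketch that computes the clusterwise minimum distance to any point of another cluster. The helper separation_by_hand and the toy data are hypothetical and for illustration only; this is not fpc's implementation.

```r
# Hypothetical helper, not part of fpc: for each cluster, the minimum
# distance from a point in the cluster to a point in any other cluster.
separation_by_hand <- function(d, clustering) {
  m <- as.matrix(d)
  sapply(sort(unique(clustering)), function(k) {
    inside <- clustering == k
    min(m[inside, !inside])
  })
}

# Toy data: five points on a line, in three groups.
pts <- c(0, 1, 10, 11, 30)
cl  <- c(1, 1, 2, 2, 3)
separation_by_hand(dist(pts), cl)   # 9 9 19
```

Clusters 1 and 2 are both 9 units from their nearest neighbouring cluster (between the points at 1 and 10), and cluster 3 is 19 units from the point at 11.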
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III