[R] cluster in R
Weiwei Shi
helprhelp at gmail.com
Thu Oct 19 02:04:59 CEST 2006
Dear Chris:
I tried cor+1, but the average silhouette width is still < 0:
> set.seed(1000)
> t9 <- cor(t(x), method="pearson")+1 # here I add 1
> t8 <- as.dist(t9)
> t7 <- cutree(hclust(t8), 4)
> cluster.stats(t8, t7)$avg.silwidth
[1] -0.008750826
> set.seed(1000)
> t9 <- cor(t(x), method="pearson") # here I did not add 1
> t8 <- as.dist(t9)
> t7 <- cutree(hclust(t8), 4)
> cluster.stats(t8, t7)$avg.silwidth
[1] -0.09543089
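Both runs above hand hclust() a similarity rather than a dissimilarity: a larger correlation (or cor+1) means a larger "distance", so the most similar genes are treated as the farthest apart, which can drive the average silhouette width below zero. A minimal sketch, assuming simulated data as a stand-in for the thread's x (which is not shown) and using silhouette() from the recommended cluster package instead of fpc, that converts the correlation with 1 - cor:

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

set.seed(1000)
# Hypothetical stand-in for x: 142 rows (genes) by 28 columns, built from two
# groups of rows that follow opposite-sign versions of one base profile.
base <- rnorm(28)
x <- rbind(t(replicate(71,  base + rnorm(28, sd = 0.5))),
           t(replicate(71, -base + rnorm(28, sd = 0.5))))

r <- cor(t(x), method = "pearson")  # row-by-row correlations in [-1, 1]

# Dissimilarity in [0, 2]: cor = 1 gives 0, cor = -1 gives 2 (farthest).
d <- as.dist(1 - r)
cl <- cutree(hclust(d), 4)          # complete linkage (hclust default), 4 clusters
sil <- silhouette(cl, d)            # per-row silhouette widths
avg_width <- mean(sil[, "sil_width"])

# For comparison, the cor+1 "distance" used above.
d_plus <- as.dist(r + 1)
sil_plus <- silhouette(cutree(hclust(d_plus), 4), d_plus)
avg_plus <- mean(sil_plus[, "sil_width"])

print(c(one_minus_cor = avg_width, cor_plus_one = avg_plus))
```

fpc's avg.silwidth is also a mean silhouette width, so the two numbers should be broadly comparable, but the exact values depend on the data.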
On 10/18/06, Weiwei Shi <helprhelp at gmail.com> wrote:
> Dear Chris:
>
> Thanks for the prompt reply!
>
> You are right: the Pearson-based dist has negative values, so I should
> use cor+1 in my case (since negatively correlated genes should be
> considered farthest). Thanks.
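Adding 1 removes the negative values but keeps the direction: cor+1 is still largest for the most similar rows, so perfectly anti-correlated rows get 0, i.e. they are treated as nearest. The transform that matches "negatively correlated genes should be farthest" is 1 - cor. A small check on three toy vectors (hypothetical, for illustration only):

```r
a <- c(1, 2, 3, 4, 5)
m <- cbind(a = a, same = 2 * a, opposite = -a)  # columns to compare

r <- cor(m)                     # column-wise Pearson correlations
plus_one  <- as.dist(r + 1)     # the cor+1 transform
one_minus <- as.dist(1 - r)     # similarity turned into a dissimilarity

print(round(plus_one, 3))       # a vs opposite is 0: treated as nearest
print(round(one_minus, 3))      # a vs opposite is 2: treated as farthest
```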
>
> As for ?cluster.stats, I double-checked and found that I need to
> restart JGR before the help system picks up newly loaded packages,
> such as fpc in this case.
>
> Sorry for the confusion,
>
> weiwei
>
> On 10/18/06, Christian Hennig <chrish at stats.ucl.ac.uk> wrote:
> > Dear Weiwei,
> >
> > > btw, ?cluster.stats does not work on my Mac machine.
> > >> version
> > >                _
> > > platform       i386-apple-darwin8.6.1
> > > arch           i386
> > > os             darwin8.6.1
> > > system         i386, darwin8.6.1
> > > status
> > > major          2
> > > minor          3.1
> > > year           2006
> > > month          06
> > > day            01
> > > svn rev        38247
> > > language       R
> > > version.string Version 2.3.1 (2006-06-01)
> >
> > Because I don't have access to a Mac, I can't tell you anything about
> > this, unfortunately.
> > I always thought that my package should work on all platforms if it passes
> > all the standard tests for packages?
> > (Is there anyone else who could comment on this please?)
> >
> > > I have a sample like this:
> > >> dim(dd.df)
> > > [1] 142 28
> > >
> > > and I want to cluster the rows.
> > > First, I followed the examples for cluster.stats():
> > > d.dd <- dist(dd.df) # use Euclidean
> > > d.4 <- cutree(hclust(d.dd), 4) # 4 clusters I tried
> > > cluster.stats(d.dd, d.4) # gives me some results like this:
> > >
> > > $cluster.size
> > > [1] 133 5 2 2
> > >
> > > $avg.silwidth
> > > [1] 0.9857916
> > >
> > > But when I tried a Pearson-based distance, it gave me a very bad
> > > avg.silwidth, although from visual inspection I think 4 or 5
> > > clusters are good for the Pearson distance:
> > >
> > > d.dd <- as.dist(cor(t(x),method="pearson")) # is this correct?
> > > $cluster.size
> > > [1] 86 31 6 19
> > >
> > > $avg.silwidth
> > > [1] -0.09543089
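For reading avg.silwidth: the silhouette width of an observation i compares its average dissimilarity a(i) to the other members of its own cluster with the smallest average dissimilarity b(i) to the members of any other single cluster (Rousseeuw's definition, as used by silhouette() in the cluster package):

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

avg.silwidth averages s(i) over the observations, so a negative value means that, on balance, points are closer to a neighbouring cluster than to their own.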
> >
> > cor can give negative values, which doesn't fit the usual definition
> > of a distance. I don't know what as.dist does in this case, but I think
> > that, depending on your application, you should rather use the absolute
> > value of the correlation, or 1+cor.
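As for what as.dist() does here: for a matrix it essentially copies the lower triangle and does not check that the entries are non-negative, so negative correlations go straight into hclust(). A minimal sketch of the two suggestions written as dissimilarities (1 - |cor| treats strong negative correlation as close, 1 - cor treats it as far), assuming a small random matrix as a hypothetical stand-in for the data:

```r
set.seed(1)
x <- matrix(rnorm(10 * 6), nrow = 10)  # hypothetical: 10 rows (genes), 6 samples
r <- cor(t(x))                         # row-by-row Pearson correlations

raw <- as.dist(r)                      # lower triangle copied as-is
print(range(raw))                      # can include negative values

d_abs <- as.dist(1 - abs(r))           # in [0, 1]; sign of the correlation ignored
d_neg <- as.dist(1 - r)                # in [0, 2]; anti-correlated rows farthest
print(range(d_abs))
print(range(d_neg))
```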
> >
> > > btw, what's $separation? Where can I find a detailed explanation of
> > > the output from cluster.stats?
> >
> > This is documented on the cluster.stats help page:
> >
> > separation: vector of clusterwise minimum distances of a point in the
> > cluster to a point of another cluster.
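The separation definition above can be computed directly from a distance object and a cluster vector. A minimal sketch with a hypothetical helper, separation_by_cluster() (not part of fpc), on toy one-dimensional data:

```r
# Hypothetical helper: for each cluster, the smallest distance between a point
# in that cluster and a point in any other cluster.
separation_by_cluster <- function(d, cl) {
  dm <- as.matrix(d)
  sapply(sort(unique(cl)), function(k) {
    inside <- cl == k
    min(dm[inside, !inside, drop = FALSE])
  })
}

pts <- c(0, 1, 5, 6, 20)              # toy data on a line
d <- dist(pts)                        # Euclidean distances
cl <- c(1, 1, 2, 2, 3)                # three clusters
print(separation_by_cluster(d, cl))   # 4 4 14
```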
> >
> > Best regards,
> > Christian
> >
> >
> > *** --- ***
> > Christian Hennig
> > University College London, Department of Statistical Science
> > Gower St., London WC1E 6BT, phone +44 207 679 1698
> > chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
> >
>
>
> --
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>