[R] agglomerative coefficient in agnes (cluster)
Liaw, Andy
andy_liaw at merck.com
Thu Jan 27 04:43:52 CET 2005
> -----Original Message-----
> From: Weiguang Shi
>
> Thanks again Andy.
>
> The definition of AC is understood, yet I have trouble
> picturing the amount of "clear clustering structure"
> it measures. To put things into perspective, for two
> series
> 1,2,1000,1001
> and
> 1,2,3,1000
> agnes(x, method="single") generates ac values of
> 0.998998 and 0.7492477 respectively, yet it seems to
> me that both have fairly clear clustering structures.
It has to do with sample sizes. Consider the following:
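testAC <- function(prop1=0.5, x=rnorm(50), center=c(0, 100), ...) {
    stopifnot(require(cluster))
    n <- length(x)
    ## The first n1 points go to center[1], the remaining n2 to center[2].
    n1 <- ceiling(n * prop1)
    n2 <- n - n1
    ## Shift the two groups apart and return the agglomerative coefficient.
    agnes(x + rep(center, c(n1, n2)), ...)$ac
}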
Now some tests (x is a numeric vector already in the workspace,
e.g. from x <- rnorm(50)):
> sapply(c(.25, .5), testAC, x=x[1:4], method="single")
[1] 0.7427591 0.9862944
> sapply(1:5 / 10, testAC, x=x[1:10], method="single")
[1] 0.8977139 0.9974224 0.9950061 0.9946366 0.9946366
> sapply(1:5 / 10, testAC, x=x, method="single")
[1] 0.9982955 0.9969757 0.9971114 0.9971127 0.9975111
So it seems that AC does not count isolated singletons as cluster
structure: a point that joins only at the final merge contributes
nothing to the coefficient. This is only discernible in small
samples, though.
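To make the singleton effect concrete, here is a minimal sketch that
recomputes AC by hand for the two series in the question. It assumes
the definition given in the agnes documentation: for each observation,
m(i) is the height of its first merge divided by the height of the
final merge, and AC is the mean of 1 - m(i). The helper ac_single() is
a hypothetical name, not part of the cluster package, and it uses
hclust() from stats on the assumption that it yields the same
single-linkage merge heights as agnes().

```r
## ac_single() is a hypothetical helper for illustration only.
## It assumes AC = mean(1 - m(i)), where m(i) is the height at which
## observation i is first merged, divided by the final merge height.
ac_single <- function(x) {
    hc <- hclust(dist(x), method = "single")
    first_height <- numeric(length(x))
    for (step in seq_len(nrow(hc$merge))) {
        ## Negative entries in hc$merge are single observations
        ## joining a cluster for the first time at this step.
        ids <- hc$merge[step, ]
        singles <- -ids[ids < 0]
        first_height[singles] <- hc$height[step]
    }
    mean(1 - first_height / max(hc$height))
}

ac_single(c(1, 2, 1000, 1001))  # every point first merges at height 1
ac_single(c(1, 2, 3, 1000))     # 1000 first merges at the final step
```

In the second series the point 1000 has m(i) = 1, so it adds 0 to the
average and AC cannot exceed 3/4 with four points. The two calls should
reproduce the 0.998998 and 0.7492477 quoted above.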
Andy
> --- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> > BTW, I checked the book. You're not going to find much
> > more than that.
> >
> Thanks for checking.
>
> Weiguang
>