[R] function 'silhouette' in package 'cluster'
Martin Maechler
maechler at stat.math.ethz.ch
Fri Feb 7 14:19:02 CET 2003
>>>>> "Laurent" == Laurent Gautier <laurent at cbs.dtu.dk>
>>>>> on Thu, 6 Feb 2003 09:37:27 +0100 writes:
Laurent> Dear all,
Laurent> I am trying (without much success) to use the fuction 'silhouette'.
Laurent> Would anyone encountered that before (or would know where I am wrong ?)
Lauren has sent me his data in the mean time.
He hit a bug: I see that the silhouette.default()
method does not work when you have only two clusters.
{Note that this is quite new function/method anyway, so that bug
only bites those who do want the new functionality!}
For the interested ones: The reason is a subtle version of the
famous ``forgotten drop=FALSE'' :
Current silhouette.default has a line
diC <- apply(dmatrix[!iC, iC], 2, function(r) tapply(r, x[!iC], mean))
which usually returns a (k-1)x Nj matrix (k = #{clusters}),
but for the case k=2 returns a vector instead of a 1-row matrix.
The solution is to wrap the RHS into an rbind(.).
Since, from Laurent's example, I found that I could improve the
function further (by allowing a direct `dmatrix' argument
alternatively to `dist'), I post here the new version of the
silhouette.default method. This will be in the next version of
the cluster package, but that will appear only by the end of
February or so.
Regards,
Martin
silhouette.default <- function(x, dist, dmatrix, ...) {
cll <- match.call()
if(!is.null(cl <- x$clustering)) x <- cl
n <- length(x)
if(!all(x == round(x))) stop("`x' must only have integer codes")
k <- length(clid <- sort(unique(x)))
if(k <= 1 || k >= n)
return(NA)
## check dist/dmatrix
if(missing(dist)) {
if(missing(dmatrix))
stop("Need either a dissimilarity `dist' or diss.matrix `dmatrix'")
if(is.null(dm <- dim(dmatrix)) || length(dm) != 2 || !all(n == dm))
stop("`dmatrix' is not a dissimilarity matrix compatible to `x'")
} else { # `dist'
dist <- as.dist(dist) # hopefully
if(n != attr(dist, "Size"))
stop("clustering `x' and dissimilarity `dist' are incompatible")
dmatrix <- as.matrix(dist)# so we can apply(.) below
}
wds <- matrix(NA, n,3, dimnames =
list(names(x), c("cluster","neighbor","sil_width")))
for(j in 1:k) { # j-th cluster:
Nj <- sum(iC <- x == clid[j])
wds[iC, 1] <- j
a.i <- if(Nj > 1) colSums(dmatrix[iC, iC])/(Nj - 1) else 0 # length(a.i)= Nj
## minimal distances to points in all other clusters:
diC <- rbind(apply(dmatrix[!iC, iC], 2,
function(r) tapply(r, x[!iC], mean)))# (k-1) x Nj
wds[iC,"neighbor"] <- clid[-j][minC <- max.col(-t(diC))]
b.i <- diC[cbind(minC, seq(minC))]
s.i <- (b.i - a.i) / pmax(b.i, a.i)
wds[iC,"sil_width"] <- s.i
}
attr(wds, "Ordered") <- FALSE
attr(wds, "call") <- cll
class(wds) <- "silhouette"
wds
}
Laurent> Please find below the R ouput. Thanks in advance,
Laurent> L.
>> s <- silhouette(ct, as.dist(metric))
Laurent> Error in "[<-"(*tmp*, iC, "sil_width", value = s.i) :
Laurent> number of items to replace is not a multiple of replacement length
Laurent> In addition: Warning messages:
Laurent> 1: longer object length
Laurent> is not a multiple of shorter object length in: b.i - a.i
Laurent> 2: number of rows of result
Laurent> is not a multiple of vector length (arg 2) in: cbind(mmm, as.vector(each))
>>
>> traceback()
Laurent> 2: silhouette.default(ct, as.dist(metric))
Laurent> 1: silhouette(ct, as.dist(metric))
>> str(ct)
Laurent> Named int [1:2381] 1 1 1 1 1 2 2 2 1 1 ...
Laurent> - attr(*, "names")= chr [1:2381] "5153" "22" "5185" "356" ...
>> str(metric)
Laurent> num [1:2381, 1:2381] 0.000 1.438 1.172 0.751 0.432 ...
Laurent> - attr(*, "dimnames")=List of 2
Laurent> ..$ : chr [1:2381] "5153" "22" "5185" "356" ...
Laurent> ..$ : chr [1:2381] "5153" "22" "5185" "356" ...
More information about the R-help
mailing list