[R] Silhouette question

Jonck van der Kogel jonck at vanderkogel.net
Sat Jun 21 00:33:31 CEST 2003


Hi all,
I am currently experimenting with silhouette from the cluster package 
but I am getting some errors. Since the silhouette can be seen as a quality 
measure for a clustering, what I want to do is run a series of different 
clusterings and store the one with the highest silhouette value. In 
that way I hope to get "the best" clustering possible for my dataset.
Here is the problem:
When running the examples that come with silhouette, everything works 
fine, the silhouette values are calculated perfectly. When I try to run 
silhouette with my own dataset I get errors at unpredictable times, 
that is, sometimes silhouette runs successfully and at other times it 
gives me the following error:
 > test <- silhouette(cutree(agn, k=5), daisy(bestSom$codes))
Error in apply(dmatrix[!iC, iC], 2, function(r) tapply(r, x[!iC], 
mean)) :
         dim(X) must have a positive length

Since I am running my experiments in batch mode (put a loop of 
experiments in a source file and then load this source file), whenever 
this error occurs the entire experiment is cut off. The experiment 
takes rather a long time (approx. 12 hours), so I would not want to 
start my experiment at night only to find in the morning that my 
experiment never ran. Is there a way to
a) prevent the error from happening, or
b) detect beforehand that the error will happen and thus skip the 
silhouette calculation for that particular clustering?
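
A minimal sketch of what (a) and (b) could look like in a batch script. It rests on an assumption that is inferred from the error message and not confirmed: the failure occurs when some cluster has only one member, so that dmatrix[!iC, iC] collapses to a plain vector and apply() rejects it. The helper names has_singleton() and safe_silhouette() are hypothetical and not part of the cluster package; tryCatch() is used as a fallback so that any other error is logged instead of stopping the loop.

```r
# Sketch only: has_singleton() and safe_silhouette() are hypothetical helpers,
# not functions from the cluster package.
library(cluster)

# (b) Detect beforehand: TRUE if any cluster has exactly one member.
# Assumption: singleton clusters are what trigger the apply() error.
has_singleton <- function(cl) any(table(cl) == 1)

# (a) Keep an error from aborting the batch run: return the silhouette
# object, or NULL if the calculation is skipped or fails.
safe_silhouette <- function(cl, d) {
  if (has_singleton(cl)) return(NULL)
  tryCatch(
    silhouette(cl, d),
    error = function(e) {
      message("silhouette failed: ", conditionMessage(e))
      NULL
    }
  )
}
```

Inside the experiment loop the call would then be, for example, test <- safe_silhouette(cutree(agn, k=5), daisy(bestSom$codes)), followed by a check with is.null(test) before comparing silhouette values.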

Any help with this is much appreciated,
thanks, Jonck



