[R] Silhouette question
Jonck van der Kogel
jonck at vanderkogel.net
Sat Jun 21 00:33:31 CEST 2003
Hi all,
I am currently experimenting with silhouette from the cluster package,
but I am getting some errors. Since the silhouette can be seen as a
quality measure for a clustering, what I want to do is run a series of
different clusterings and keep the one with the highest silhouette
value. In that way I hope to get "the best" clustering possible for my
dataset.
Here is the problem:
When running the examples that come with silhouette, everything works
fine, the silhouette values are calculated perfectly. When I try to run
silhouette with my own dataset I get errors at unpredictable times,
that is, sometimes silhouette runs successfully and at other times it
gives me the following error:
> test <- silhouette(cutree(agn, k=5), daisy(bestSom$codes))
Error in apply(dmatrix[!iC, iC], 2, function(r) tapply(r, x[!iC], mean)) :
dim(X) must have a positive length
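
One plausible way to get this message (an assumption, since the sizes of
the clusters are not shown here) is a clustering in which one cluster
contains a single observation. Selecting one column of a matrix drops it
to a plain vector, and apply() then refuses to work on it. A minimal
sketch with made-up toy values, using base R only:

```r
# Toy 4 x 4 dissimilarity matrix (made-up values, not the real data).
dmatrix <- as.matrix(dist(c(1, 2, 3, 10)))
x  <- c(1, 1, 1, 2)        # cluster labels; cluster 2 is a singleton
iC <- x == 2               # membership mask for the singleton cluster

# Only one column is selected, so R drops the result to a vector.
sub <- dmatrix[!iC, iC]
dropped <- is.null(dim(sub))   # no dim attribute is left

# apply() on an object without dimensions signals an error;
# capture its message instead of stopping.
msg <- tryCatch(apply(sub, 2, mean),
                error = function(e) conditionMessage(e))
print(dropped)
print(msg)
```

Whether this is what happens inside silhouette for a given version of
the cluster package would have to be checked against that version's
source.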
Since I am running my experiments in batch mode (a loop of experiments
in a source file, which I then load), the entire run is cut off
whenever this error occurs. A run takes rather a long time (approx. 12
hours), so I would not want to start it at night only to find in the
morning that it was aborted. Is there a way to
a) prevent the error from happening, or
b) detect beforehand that the error will happen and thus not do the
silhouette calculation for that particular clustering?
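
To make a) and b) concrete, here is a rough sketch of what such a guard
might look like. The helper name safe_sil_width is hypothetical, the
singleton check assumes that one-member clusters are what triggers the
error, and the toy data stands in for bestSom$codes. tryCatch() is base
R; silhouette() comes from the cluster package.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper: returns the average silhouette width, or NA when
# the clustering is skipped or silhouette() signals an error.
safe_sil_width <- function(cl, d) {
  # b) skip clusterings with fewer than two clusters or with a
  #    singleton cluster (assumed here to be the trigger)
  if (length(unique(cl)) < 2 || any(table(cl) < 2)) return(NA_real_)
  # a) any remaining error is caught so that the loop keeps running
  tryCatch({
    sil <- silhouette(cl, d)
    mean(sil[, "sil_width"])
  }, error = function(e) {
    message("silhouette failed: ", conditionMessage(e))
    NA_real_
  })
}

# Toy stand-in for the real experiment loop.
set.seed(1)
toy <- matrix(rnorm(40), ncol = 2)
d   <- dist(toy)
hc  <- hclust(d)
ks  <- 2:6
widths <- sapply(ks, function(k) safe_sil_width(cutree(hc, k = k), d))
best_k <- ks[which.max(widths)]   # which.max() ignores NA entries
```

Skipping every clustering that has a singleton cluster is a design
choice and may discard clusterings that are otherwise fine; relying on
tryCatch() alone would keep them whenever silhouette happens to succeed.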
Any help with this is much appreciated,
thanks, Jonck