[R] Cluster analysis, defining center seeds or number of clusters
amvds at xs4all.nl
Thu Jun 11 17:14:50 CEST 2009
I use kmeans to classify spectral events in high and low 1/3 octave bands:
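# Do cluster analysis on the low- and high-band values of each event
CyclA <- data.frame(LlowA, LhghA)
# Three starting centers, one per row: (low band, high band)
CntrA <- matrix(c(0.9, 0.8, 0.8, 0.75, 0.65, 0.65), nrow = 3, ncol = 2, byrow = TRUE)
# Note: nstart only applies when centers is a number, so it has no effect here
ClstA <- kmeans(CyclA, centers = CntrA, nstart = 50, algorithm = "MacQueen")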
This works well when the data actually show 1, 2 or 3 groups that are not
"too close" together in a cross plot. When fewer than 3 groups are present,
the MacQueen algorithm leaves one or more groups empty, which is what I want.
However, when the groups are closer together, less compact or more diffuse,
only 2 groups are visually apparent but the algorithm returns 3, splitting
one group in two.
I looked at the package 'cluster', specifically at clara (I cannot use pam
as I have 10000 observations), but clara always returns as many groups as
you ask for.
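One common workaround, sketched here under assumptions: although clara returns exactly k groups, it can be run for several candidate values of k and the k with the largest average silhouette width kept. The data below are simulated (two compact groups near two of the starting centers above), `pick_k_clara` is a hypothetical helper, and `samples = 50` and `k_max = 5` are arbitrary choices. Silhouette widths cannot score k = 1, so this only chooses among 2 or more groups.

```r
library(cluster)  # clara() is in the recommended 'cluster' package

# Hypothetical helper: run clara() for k = 2..k_max and return the k whose
# best sample has the largest average silhouette width.
pick_k_clara <- function(x, k_max = 5, samples = 50) {
  ks <- 2:k_max
  widths <- sapply(ks, function(k) clara(x, k, samples = samples)$silinfo$avg.width)
  names(widths) <- ks
  list(k = ks[which.max(widths)], widths = widths)
}

# Simulated stand-in for data.frame(LlowA, LhghA): two well-separated groups
set.seed(1)
x <- rbind(cbind(rnorm(500, 0.90, 0.03), rnorm(500, 0.80, 0.03)),
           cbind(rnorm(500, 0.65, 0.03), rnorm(500, 0.65, 0.03)))
res <- pick_k_clara(x)
res$widths  # average silhouette width per candidate k
res$k       # chosen number of groups
```

On real data the widths for neighbouring k can be close, so inspecting `res$widths` is probably more informative than trusting the maximum blindly.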
Is there a way to find good seeds for the initial cluster centers?
Equivalently, is there a way to determine the number of groups a priori?
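A sketch of one data-driven (rather than strictly a priori) way to estimate the number of groups, including the one-group case: the gap statistic of Tibshirani, Walther and Hastie, as implemented by clusGap() in 'cluster'. The data are simulated, and `K.max = 5`, `B = 50`, `nstart = 10` and the "Tibs2001SEmax" selection rule are assumptions to tune on real data.

```r
library(cluster)  # clusGap() and maxSE()

# Simulated stand-in for data.frame(LlowA, LhghA): two well-separated groups
set.seed(2)
x <- rbind(cbind(rnorm(500, 0.90, 0.03), rnorm(500, 0.80, 0.03)),
           cbind(rnorm(500, 0.65, 0.03), rnorm(500, 0.65, 0.03)))

# Gap statistic for k = 1..5 with B reference data sets;
# the extra argument nstart is passed on to kmeans()
gs <- clusGap(x, FUNcluster = kmeans, K.max = 5, B = 50,
              verbose = FALSE, nstart = 10)

# Smallest k with gap(k) >= gap(k+1) - SE(k+1)
k_hat <- maxSE(gs$Tab[, "gap"], gs$Tab[, "SE.sim"], method = "Tibs2001SEmax")
k_hat
```

The estimated k could then be passed to kmeans (or clara) as the number of centers, instead of always asking for 3.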
I know this is not an easy problem. I have looked at principal components
(princomp, prcomp) because there is a connection with cluster analysis. It
is not obvious to me how to program that connection though.
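For the two-group case, one way to program the PCA connection, as I read the KmeansPCA1 paper linked below, is to split the observations by the sign of their first principal component score and use the means of the two halves as kmeans seeds. This is a minimal sketch on simulated data; it is not a general method for k > 2, and the variable names are illustrative only.

```r
# Simulated stand-in for data.frame(LlowA, LhghA): two well-separated groups
set.seed(3)
x <- rbind(cbind(rnorm(500, 0.90, 0.03), rnorm(500, 0.80, 0.03)),
           cbind(rnorm(500, 0.65, 0.03), rnorm(500, 0.65, 0.03)))
truth <- rep(1:2, each = 500)

# First principal component scores (prcomp centres the data by default)
pc1 <- prcomp(x)$x[, 1]

# The sign of the PC1 score gives a two-group partition
grp <- ifelse(pc1 >= 0, 1L, 2L)

# Use the means of the two halves as initial centers for kmeans
seeds <- rbind(colMeans(x[grp == 1, , drop = FALSE]),
               colMeans(x[grp == 2, , drop = FALSE]))
fit <- kmeans(x, centers = seeds, algorithm = "MacQueen")

# Agreement with the simulated truth (cluster labels may be swapped)
agree <- max(mean(fit$cluster == truth), mean(fit$cluster != truth))
agree
```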
http://en.wikipedia.org/wiki/Principal_Component_Analysis
http://ranger.uta.edu/~chqding/papers/Zha-Kmeans.pdf
http://ranger.uta.edu/~chqding/papers/KmeansPCA1.pdf
Thanks in advance,
Alex van der Spek
More information about the R-help mailing list