[R] EM unsupervised clustering
Federico Calboli
f.calboli at imperial.ac.uk
Wed Jul 18 15:37:36 CEST 2007
Hi All,
I have an n x m matrix. The n rows are individuals, the m columns are variables.
The matrix itself is a collection of 1s (if a variable is observed for an
individual) and 0s (if there is no observation).
Something like:
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    1    1    0    0
[2,]    1    0    1    1    0    0
[3,]    1    0    1    1    0    0
[4,]    0    1    0    0    0    0
[5,]    1    0    1    1    0    0
[6,]    0    1    0    0    1    0
I use kmeans() to find 2 or 3 clusters in this matrix:
k2 = kmeans(data, 2, 10000000)
k3 = kmeans(data, 3, 10000000)
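For reference, a minimal self-contained sketch of the two-cluster call, using only the six rows shown above as a stand-in for the full matrix. The name `data` follows the post; the fixed seed and the `nstart` value are assumptions added here for repeatability, and the third positional argument is written out by its name, `iter.max`.

```r
# Toy stand-in for the real matrix: the six rows shown above
data <- matrix(c(1, 0, 1, 1, 0, 0,
                 1, 0, 1, 1, 0, 0,
                 1, 0, 1, 1, 0, 0,
                 0, 1, 0, 0, 0, 0,
                 1, 0, 1, 1, 0, 0,
                 0, 1, 0, 0, 1, 0),
               nrow = 6, byrow = TRUE)

set.seed(1)  # assumption: fixed seed so the random starts are repeatable

# Same call as in the post, with the arguments named;
# nstart = 5 is an assumption (several random starts, best one kept)
k2 <- kmeans(data, centers = 2, iter.max = 10000000, nstart = 5)

k2$cluster         # cluster label for each individual (row)
table(k2$cluster)  # cluster sizes
```

The three-cluster call has the same shape with `centers = 3`.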
However, I would like to use something a bit more refined, so I thought about
EM-based clustering. I am using the Mclust() function from the mclust package,
but I get the following (to me incomprehensible) error message:
plot(Mclust(as.data.frame(data)), as.data.frame(data))
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Error in 1:L : NA/NaN argument
In addition: Warning messages:
1: best model occurs at the min or max # of components considered in:
summary.mclustBIC(Bic, data, G = G, modelNames = modelNames)
2: optimal number of clusters occurs at min choice in:
Mclust(as.data.frame(anc.st.mat))
3: insufficient input for specified plot in: coordProj(data = data, parameters =
x$parameters, z = x$z, what = "classification",
That's puzzling, because the example given by ?Mclust is something like:
plot(Mclust(iris[,-5]), iris[,-5])
which is pretty simple and dumbproof and works flawlessly...
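One difference from iris that might be worth checking (an assumption, not a confirmed diagnosis of the error above): a 0/1 matrix can contain constant columns and many identical rows, which a Gaussian mixture fit may handle poorly. A base-R sketch of that check on the six rows shown above, where `toy` is a hypothetical stand-in for the full matrix:

```r
# Hypothetical stand-in for the full matrix: the six rows shown in the post
toy <- matrix(c(1, 0, 1, 1, 0, 0,
                1, 0, 1, 1, 0, 0,
                1, 0, 1, 1, 0, 0,
                0, 1, 0, 0, 0, 0,
                1, 0, 1, 1, 0, 0,
                0, 1, 0, 0, 1, 0),
              nrow = 6, byrow = TRUE)

# Sample variance of each column; a value of 0 means the column never varies
col_var <- apply(toy, 2, var)
constant_cols <- which(col_var == 0)
constant_cols

# Number of distinct rows, i.e. distinct observation patterns
n_distinct <- nrow(unique(toy))
n_distinct
```

In this excerpt the sixth column is all zeros and only three distinct row patterns occur; whether the same holds for the full matrix would need to be checked on the real data.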
best,
Federico
--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com