[R] EM unsupervised clustering

Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.be
Wed Jul 18 15:48:07 CEST 2007


You could also have a look at the function lca() from the package `e1071',
which performs a latent class analysis on binary data, e.g.,

library(e1071)

fit1 <- lca(data, 2)   # two latent classes
fit1

fit2 <- lca(data, 3)   # three latent classes
fit2
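
In case a self-contained run is useful, here is a minimal sketch on
simulated data. The toy matrix, the seed, and the names n, m, group and
prob are assumptions made up for illustration and are not taken from the
original data; the only call assumed from `e1071' is lca(x, k). As far as
I know, lca() wants a matrix of 0/1 values (or a count table), so a data
frame would first need as.matrix().

```r
# Sketch only: simulated binary data standing in for the poster's 0/1 matrix.
library(e1071)   # provides lca()

set.seed(1)
n <- 200   # individuals (rows); hypothetical size
m <- 6     # binary variables (columns)

# Assumed toy structure: two hidden groups with different probabilities
# of observing a 1 on each variable.
group <- rbinom(n, 1, 0.5)
prob  <- ifelse(group == 1, 0.8, 0.2)

# matrix() fills column by column, so the length-n vector `prob` is
# recycled once per column and stays aligned with the rows.
data <- matrix(rbinom(n * m, 1, prob), nrow = n, ncol = m)

# Fit latent class models with 2 and 3 classes, as in the reply above.
fit2 <- lca(data, 2)
fit3 <- lca(data, 3)

fit2
fit3
```

The EM algorithm inside lca() starts from random values, so results can
differ slightly between runs unless the seed is fixed.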

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm



----- Original Message ----- 
From: "Federico Calboli" <f.calboli at imperial.ac.uk>
To: "r-help" <r-help at stat.math.ethz.ch>
Sent: Wednesday, July 18, 2007 3:37 PM
Subject: [R] EM unsupervised clustering


> Hi All,
>
> I have an n x m matrix. The n rows are individuals, the m columns are
> variables.
>
> The matrix is in itself a collection of 1s (if a variable is observed
> for an individual), and 0s (if there is no observation).
>
> Something like:
>
>      [,1] [,2] [,3] [,4] [,5] [,6]
> [1,]    1    0    1    1    0    0
> [2,]    1    0    1    1    0    0
> [3,]    1    0    1    1    0    0
> [4,]    0    1    0    0    0    0
> [5,]    1    0    1    1    0    0
> [6,]    0    1    0    0    1    0
>
>
> I use kmeans to find 2 or 3 clusters in this matrix
>
> k2 = kmeans(data, 2, 10000000)
> k3 = kmeans(data, 3, 10000000)
>
> but I would like to use something a bit more refined, so I thought
> about an EM-based clustering. I am using the Mclust() function from the
> mclust package, but I get the following (to me incomprehensible) error
> message:
>
> plot(Mclust(as.data.frame(data)), as.data.frame(data))
> Hit <Return> to see next plot:
> Hit <Return> to see next plot:
> Hit <Return> to see next plot:
> Error in 1:L : NA/NaN argument
> In addition: Warning messages:
> 1: best model occurs at the min or max # of components considered in:
> summary.mclustBIC(Bic, data, G = G, modelNames = modelNames)
> 2: optimal number of clusters occurs at min choice in:
> Mclust(as.data.frame(anc.st.mat))
> 3: insufficient input for specified plot in: coordProj(data = data,
> parameters = x$parameters, z = x$z, what = "classification",
>
> That's puzzling because the example given by ?Mclust is something like
>
> plot(Mclust(iris[,-5]), iris[,-5])
>
> which is pretty simple and dumbproof and works flawlessly...
>
> best,
>
> Federico
>
> -- 
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St Mary's Campus
> Norfolk Place, London W2 1PG
>
> Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193
>
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
>




