[R-sig-eco] loglikelihood in MclustDA

Mon Oct 14 13:29:58 CEST 2019

Dear List,

Could anyone explain to me why the overall log-likelihood of an MclustDA 
model is not the sum of the log-likelihoods of the models fitted to the 
individual groups? See the numbers in this simple example:

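library(mclust)
attach(iris)   # makes Sepal.Length and Species visible below
m <- MclustDA(Sepal.Length, class = Species)

logLik(m)      # overall log-likelihood

# log-likelihoods of the models fitted to each group
m$models$setosa$loglik
m$models$versicolor$loglik
m$models$virginica$loglik

# their sum, which differs from logLik(m)
sum(sapply(m$models, function(mod) mod$loglik))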

I noticed that the overall log-likelihood is calculated in a rather 
tricky way: the likelihood under every group model is calculated for 
every observation (ignoring its a priori class), then the (weighted?) 
average of these likelihoods is taken for each observation, and the 
overall log-likelihood is the sum of the logarithms of these averages.
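Written as formulas, with f_k the density fitted to group k, y_i the a priori class of observation i, and tau_k the weight of group k (assumed here to be the class proportion n_k/n, which is 50/150 = 1/3 for each iris species), the two quantities being compared are roughly:

```latex
\ell_{\text{groups}} = \sum_{k=1}^{K} \sum_{i:\, y_i = k} \log f_k(x_i)
\qquad\text{versus}\qquad
\ell_{\text{overall}} = \sum_{i=1}^{n} \log\!\Big( \sum_{k=1}^{K} \tau_k \, f_k(x_i) \Big)
```

The first scores each observation only under its own group's model; the second scores every observation under all group models and mixes them, which is the calculation described above.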

This code illustrates the calculation:

# density of every observation under each group model, averaged with
# equal weights 1/3 (each species has 50 of the 150 observations)
likelihood <- with(m$models$setosa,
  dnorm(Sepal.Length, mean = parameters$mean, sd = sqrt(parameters$variance$sigmasq))) / 3
likelihood <- likelihood + with(m$models$versicolor,
  dnorm(Sepal.Length, mean = parameters$mean, sd = sqrt(parameters$variance$sigmasq))) / 3
likelihood <- likelihood + with(m$models$virginica,
  dnorm(Sepal.Length, mean = parameters$mean, sd = sqrt(parameters$variance$sigmasq))) / 3

sum(log(likelihood))   # matches logLik(m)
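A self-contained sketch in base R (no mclust) may make the comparison easier to reproduce. It assumes each group is a single Gaussian with maximum-likelihood mean and variance (which is what the dnorm calls above assume) and that the mixture weights are the class proportions; the variable names are hypothetical and this is not mclust's own code.

```r
# Hypothetical sketch (base R only): sum of per-group log-likelihoods
# versus the mixture log-likelihood. Assumes one Gaussian per group with
# ML estimates, and class proportions as mixture weights.
x <- iris$Sepal.Length
k <- as.integer(iris$Species)   # group index 1..3 for each observation
K <- max(k)

# ML estimates per group (variance uses denominator n_k, not n_k - 1)
mu    <- as.vector(tapply(x, k, mean))
sigsq <- as.vector(tapply(x, k, function(v) mean((v - mean(v))^2)))
tau   <- as.vector(table(k)) / length(x)   # 50/150 = 1/3 for each species

# (a) each group model scores only its own observations
ll_groups <- sum(dnorm(x, mu[k], sqrt(sigsq[k]), log = TRUE))

# (b) every observation is scored by every group model, weighted by tau
dens   <- sapply(seq_len(K), function(j) tau[j] * dnorm(x, mu[j], sqrt(sigsq[j])))
ll_mix <- sum(log(rowSums(dens)))

# (c) joint log-likelihood of the observations together with their labels
ll_joint <- ll_groups + sum(log(tau[k]))

c(groups = ll_groups, mixture = ll_mix, joint = ll_joint)
```

Because a sum of positive terms is at least as large as any single term, ll_mix can never be smaller than ll_joint, and ll_groups differs from ll_joint only by the label term sum(log(tau[k])).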

Why is this the correct way to calculate it? It would also be useful if 
you could recommend literature that answers my question.

Thanks!

Zoltan