[R] A Few MCLUST Questions
Murray Jorgensen
maj at stats.waikato.ac.nz
Mon Jun 14 04:49:15 CEST 2004
I can't answer for MCLUST specifically, but in general mixture modelling
terms it is easier to think of a reasonable initial clustering of the
data, from which the M-step will quickly produce initial parameter
estimates, than to pick a large number of initial parameter values out
of the air. (You could use a random grouping to start things off
if nothing else comes to mind.) Usually, if you try to pick parameter
values directly, you will choose some that make some data values very
improbable, leading to numerical difficulties in the M-step.
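
As a minimal sketch of starting from a grouping (written in Python for
illustration, not MCLUST's own code), the M-step below turns a hard initial
clustering into initial parameter estimates for a one-dimensional Gaussian
mixture. The function names m_step and hard_labels_to_z, the toy data and
the 1-D setting are all assumptions made for this example.

```python
def hard_labels_to_z(labels, g):
    """Turn hard cluster labels (0..g-1) into a 0/1 membership matrix."""
    return [[1.0 if lab == k else 0.0 for k in range(g)] for lab in labels]


def m_step(x, z):
    """M-step for a 1-D Gaussian mixture.

    x: list of observations.
    z: membership weights, z[i][k] = weight of observation i in component k.
    Returns (proportions, means, variances), one entry per component.
    """
    n, g = len(x), len(z[0])
    props, means, variances = [], [], []
    for k in range(g):
        nk = sum(row[k] for row in z)  # effective size of component k
        mu = sum(row[k] * xi for row, xi in zip(z, x)) / nk
        var = sum(row[k] * (xi - mu) ** 2 for row, xi in zip(z, x)) / nk
        props.append(nk / n)
        means.append(mu)
        variances.append(var)
    return props, means, variances


# Toy data with an obvious two-group structure (assumed for illustration).
x = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = [0, 0, 0, 1, 1, 1]

props, means, variances = m_step(x, hard_labels_to_z(labels, 2))
```

Every component that has at least one observation gets parameters that are
automatically on the scale of the data, which is the practical advantage of
starting this way.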
On the other hand you may have a good set of parameter values from a
previously-fitted data set and you have a new, but similar set of data,
perhaps from a different time-period or location. Then it will make
sense to start off from the parameter values that you have.
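
A rough sketch of this second situation, again in Python and not taken from
MCLUST: the E-step below starts from previously fitted parameters and
computes membership probabilities for new data. The parameter values, the
data and the function names are hypothetical. The calculation is done on the
log scale so that an observation that is very improbable under every
component (the 40.0 below) does not underflow to a 0/0 division.

```python
import math


def log_normal_pdf(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def e_step(x, props, means, variances):
    """E-step for a 1-D Gaussian mixture.

    Returns (z, loglik): z[i][k] is the conditional probability that
    observation i belongs to component k, loglik the mixture log-likelihood.
    """
    z, loglik = [], 0.0
    for xi in x:
        logs = [math.log(p) + log_normal_pdf(xi, m, v)
                for p, m, v in zip(props, means, variances)]
        top = max(logs)  # log-sum-exp trick to avoid underflow
        log_total = top + math.log(sum(math.exp(l - top) for l in logs))
        z.append([math.exp(l - log_total) for l in logs])
        loglik += log_total
    return z, loglik


# Hypothetical parameters carried over from an earlier, similar data set.
old_props, old_means, old_vars = [0.5, 0.5], [1.0, 5.0], [0.25, 0.25]
new_x = [0.9, 1.4, 4.6, 5.5, 40.0]

z, loglik = e_step(new_x, old_props, old_means, old_vars)
```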
Don't worry about the software - it should be just as easy for it to
begin at either the E- or the M-step - it is your own intentions and
convenience that matter.
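
To illustrate why the entry point is only a convenience, here is a small
self-contained Python sketch (hypothetical names, toy data, 1-D Gaussian
components, and not how MCLUST itself is implemented). The same loop is
entered either at the M-step, from a grouping, or at the E-step, from
parameter values.

```python
import math


def e_step(x, params):
    """Return (z, loglik) for a 1-D Gaussian mixture with the given params."""
    props, means, variances = params
    z, loglik = [], 0.0
    for xi in x:
        logs = [math.log(p) - 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                for p, m, v in zip(props, means, variances)]
        top = max(logs)
        log_total = top + math.log(sum(math.exp(l - top) for l in logs))
        z.append([math.exp(l - log_total) for l in logs])
        loglik += log_total
    return z, loglik


def m_step(x, z):
    """Return (proportions, means, variances) from membership weights z."""
    n, g = len(x), len(z[0])
    props, means, variances = [], [], []
    for k in range(g):
        nk = sum(row[k] for row in z)
        mu = sum(row[k] * xi for row, xi in zip(z, x)) / nk
        var = sum(row[k] * (xi - mu) ** 2 for row, xi in zip(z, x)) / nk
        props.append(nk / n)
        means.append(mu)
        variances.append(var)
    return props, means, variances


def fit(x, z=None, params=None, iterations=50):
    """Run EM, entering at the M-step if only z is given, else at the E-step."""
    if params is None:
        params = m_step(x, z)  # a grouping was supplied: begin with an M-step
    for _ in range(iterations):
        z, _ = e_step(x, params)
        params = m_step(x, z)
    _, loglik = e_step(x, params)  # log-likelihood at the final parameters
    return params, loglik


x = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.7, 5.1]

# Entry at the M-step: start from a hard grouping of the observations.
grouping = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
params_a, loglik_a = fit(x, z=grouping)

# Entry at the E-step: start from rough parameter guesses.
guess = ([0.5, 0.5], [0.0, 6.0], [1.0, 1.0])
params_b, loglik_b = fit(x, params=guess)
```

On this well-separated toy data both starts reach the same fit. On harder
data different starts can end at different local maxima, but that depends on
where you start, not on which step comes first.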
Murray Jorgensen
KKThird at Yahoo.Com wrote:
> Hello everyone. I have a few MCLUST questions and I was hoping someone could help me out. If you’re an MCLUST user, they will likely be pretty easy to answer. Thanks in advance for any help.
>
> Ken
>
>
>
> What are the pros/cons of starting a finite mixture model at the "m" step versus the "e" step (where "m" is the maximization step and "e" is the expectation step of the EM algorithm)? In particular, are there any reasons for using em(modelName=XXX) versus me(modelName=XXX)? Other than MCLUST, I’ve not seen a finite mixture model "program" give such an option. Would it make sense to fit both models and take the one with the largest log likelihood?
>
>
>
> Rather than the hc() function performing cluster analysis for all of G possible clusters, can it be set to only perform a specified number (e.g., set so G=2 only)? Although a minimum number of clusters can be specified, there doesn’t seem to be any way to limit the number of clusters. I want to do a simulation for a fixed number of components, and thus I would like to avoid the unnecessary computations.
>
>
>
> Is there any difference between hc(modelName=VVV) and hcVVV or hc(modelName=EEE) and hcEEE, etc.? Likewise, are there any differences between mstep(modelName=VVV) and mstepVVV or mstep(modelName=EEE) and mstepEEE, etc.? If not, why do the same functions have different names?
>
>
>
--
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz Fax 7 838 4155
Phone +64 7 838 4773 wk +64 7 849 6486 home Mobile 021 1395 862