[R] MLE for bimodal distribution
(Ted Harding)
Ted.Harding at manchester.ac.uk
Thu Apr 9 01:39:36 CEST 2009
On 08-Apr-09 22:10:26, Ravi Varadhan wrote:
> The EM algorithm is a better approach for maximum likelihood estimation
> of finite-mixture models than direct maximization of the mixture
> log-likelihood. Due to its ascent properties, it is guaranteed to
> converge to a local maximum. By theoretical construction, it also
> automatically ensures that all the parameter constraints are satisfied.
>
> The problem with mixture estimation, however, is that the local maximum
> may not be a good solution. Even the global maximum may not be a good
> solution. For example, it is well known that in a Gaussian mixture,
> you can get a log-likelihood of +infinity by letting mu = one of the
> data points, and sigma = 0 for one of the mixture components. You can
> detect trouble in MLE if you observe one of the following: (1) a
> degenerate solution, i.e. one of the mixing proportions is close to 0,
> or (2) one of the sigmas is too small.
>
> EM convergence is sensitive to starting values. Although there are
> several starting strategies (see McLachlan and Basford's book on
> Finite Mixtures), there is no universally sound strategy for ensuring a
> good estimate (e.g. global maximum, when it makes sense). It is always
> a good practice to run EM with multiple starting values. Be warned
> that EM convergence can be excruciatingly slow. Acceleration methods
> can be of help in large simulation studies or for bootstrapping.
>
> Best,
> Ravi.
Well put! I've done a fair bit of mixture-fitting in my time,
and one approach which can be worth trying (and for which there
is often a reasonably objective justification) is to do a series
of fits (assuming a given number of components, e.g. 2 for the
present purpose) with the sigmas in a constant ratio in each one:
    sigma2 = lambda*sigma1
Then you won't get those singularities where it tries to climb
up a single data-point. If you do this for a range of values of
lambda, and keep track of the log-likelihood, then you get in
effect a profile likelihood for lambda. Once you see what that
looks like, you can then set about penalising ranges you don't
want to see.
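To make that concrete, here is a minimal sketch in R of the sort of thing
I mean (illustrative only; the data x, the grid lam.grid, the starting
values and the function names are all just placeholders, not code from
any package). For each fixed lambda it maximises the two-component normal
mixture log-likelihood directly with optim(), with sigma2 tied to
lambda*sigma1, and records the maximised log-likelihood as the profile
value for that lambda:

neg.loglik <- function(par, x, lambda) {
  ## par = (logit(p), mu1, mu2, log(sigma1));  sigma2 = lambda*sigma1
  p      <- plogis(par[1])
  mu1    <- par[2]
  mu2    <- par[3]
  sigma1 <- exp(par[4])
  sigma2 <- lambda*sigma1
  -sum(log(p*dnorm(x, mu1, sigma1) + (1 - p)*dnorm(x, mu2, sigma2)))
}

profile.lambda <- function(x, lam.grid) {
  sapply(lam.grid, function(lam) {
    start <- c(0, quantile(x, 0.25), quantile(x, 0.75), log(sd(x)/2))
    fit <- optim(start, neg.loglik, x = x, lambda = lam, method = "BFGS")
    -fit$value   ## profile log-likelihood at this lambda
  })
}

## Toy data: two normal components with different sigmas
set.seed(1)
x <- c(rnorm(150, 0, 1), rnorm(100, 3, 1.5))
lam.grid <- exp(seq(log(1/10), log(10), length.out = 25))
pl <- profile.lambda(x, lam.grid)
plot(lam.grid, pl, log = "x", type = "b",
     xlab = "lambda = sigma2/sigma1", ylab = "profile log-likelihood")

Since sigma1 enters through exp() and the mixing proportion through
plogis(), the parameter constraints are respected, and tying the two
sigmas together removes the usual route to a +Inf log-likelihood
(one component collapsing onto a single data point while the other
mops up the rest).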
The reason I say "there is often a reasonably objective justification"
is that often you can be pretty sure, if there is a mixture present,
that lambda does not go outside, say, 1/10 to 10 (or whatever),
since you expect that the latent groups are fairly similar,
and unlikely to have markedly different sigmas.
As to acceleration: agreed, EM can be slow. Aitken acceleration
can be dramatically faster. Several outlines of the Aitken procedure
can be found by googling on "aitken acceleration".
I recently wrote a short note, describing its general principle
and outlining its application to the EM algorithm, using the Probit
model as illustration (with R code). For fitting the location
parameter alone, Raw EM took 59 iterations, Aitken-accelerated EM
took 3. For fitting the location and scale parameters, Raw EM took 108,
and Aitken took 4.
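In case it helps in the meantime, here is a rough sketch in R of the
general principle only (this is not the Probit code from the note; the
function aitken.accelerate() and the toy map M() below are mine, purely
for illustration). An EM update for a single parameter is just a
fixed-point map theta -> M(theta), and Aitken's delta-squared step
extrapolates from two successive EM steps:

aitken.accelerate <- function(M, theta0, tol = 1e-10, maxit = 100) {
  theta <- theta0
  for (k in 1:maxit) {
    t1 <- M(theta)                 ## one fixed-point (EM-type) step
    t2 <- M(t1)                    ## and a second one
    denom <- t2 - 2*t1 + theta
    ## Aitken extrapolation; fall back to the plain step if denom ~ 0
    theta.new <- if (abs(denom) > .Machine$double.eps) {
      theta - (t1 - theta)^2/denom
    } else t2
    if (abs(theta.new - theta) < tol)
      return(list(theta = theta.new, iterations = k))
    theta <- theta.new
  }
  list(theta = theta, iterations = maxit)
}

## Toy illustration: the slowly converging map  theta <- cos(theta)
M <- function(theta) cos(theta)
aitken.accelerate(M, theta0 = 1)

Plain iteration of cos() needs dozens of steps to reach that tolerance;
the accelerated version gets there in a handful, the same pattern as the
EM iteration counts above.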
If anyone would like a copy (PS or PDF) of the note, drop me a line.
Or is there some "repository" for R-help (like the "Files" area
in Google Groups) where one can place files for others to download?
[And, if not, do people think it might be a good idea?]
Best wishes to all,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 09-Apr-09 Time: 00:33:02
------------------------------ XFMail ------------------------------