[R] MLE for bimodal distribution

Thu Apr 9 03:07:39 CEST 2009

Dear Ted,

Thanks for your comments on the profilie-likelihood approach for ratio of sigmas.

I would like to look at your R code and the note on Aitken acceleration. I would appreciate if you could share this with me.

I am glad you nibbled at my "bait" on EM acceleration.  This is one of my favourite topics.  EM is indeed slow, but it pretty much guarantees convergence to a local maximum.  Acceleration methods, on the other hand, do not have this guarantee, as they forsake the ascent property.  This trade-off between rate of convergence and monotonicity is the crux of the acceleration problem.  I recently wrote a paper on this (Scand J Stats, June  2008).  

I have developed a class of acceleration methods, called SQUAREM, which strikes a good balance between speed and stability.  They are monotone, yet fast.  They will not be as fast as unbridled Aitken acceleration, which are susceptible to numerical instabilities.  SQUAREM, on the other hand,  is guarenteed to converge like the original EM algorithm, and will provide significant improvements in speed.  In other words, you can have your cake and eat it too!

I have written an R function to implement this.  I am planning to release an R package soon (as soon as I can free-up some time).  This package can be used to accelerate "any" EM algorithm.  The user is only required to supply the EM update function, i.e. the function that computes a single E and M step.  This function can be used as an "off-the-shelf" accelerator without needing any problem-specific input.  

Best,
Ravi.

____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu

----- Original Message -----
From: Ted.Harding at manchester.ac.uk (Ted Harding)
Date: Wednesday, April 8, 2009 7:43 pm
Subject: Re: [R] MLE for bimodal distribution
To: r-help at stat.math.ethz.ch

> On 08-Apr-09 22:10:26, Ravi Varadhan wrote:
>  > EM algorithm is a better approach for maximum likelihood estimation
>  > of finite-mixture models than direct maximization of the mixture
>  > log-likelihood.  Due to its ascent properties, it is guaranteed to
>  > converge to a local maximum.  By theoretical construction, it also
>  > automatically ensures that all the parameter constraints are satisfied.
>  > 
>  > The problem with mixture estimation, however, is that the local maximum
>  > may not be a good solution.  Even global maximum may not be a good
>  > solution.  For example, it is well known that in a Gaussian mixture,
>  > you can get a log-likelihood of +infinity by letting mu = one of the
>  > data points, and sigma = 0 for one of the mixture components.  You 
> can
>  > detect trouble in MLE if you observe one of the following: (1) a
>  > degenerate solution, i.e. one of the mixing proportions is close to 
> 0,
>  > (2) one of the sigmas is too small.
>  > 
>  > EM convergence is sensitive to starting values.  Although, there are
>  > several starting strategies (see MacLachlan and Basford's book on
>  > Finite Mixtures), there is no universally sound strategy for 
> ensuring a
>  > good estimate (e.g. global maximum, when it makes sense). It is always
>  > a good practice to run EM with multiple starting values.  Be warned
>  > that EM convergence can be excruciatingly slow.  Acceleration methods
>  > can be of help in large simulation studies or for bootstrapping.
>  > 
>  > Best,
>  > Ravi.
>  
>  Well put! I've done a fair bit of mixture-fitting in my time,
>  and one approach which can be worth trying (and for which there
>  is often a reasonably objective justification) is to do a series
>  of fits (assuming a given number of components, e.g. 2 for the
>  present purpose) with the sigma's in a constant ratio in each one:
>  
>    sigma2 = lambda*sigma1
>  
>  Then you won't get those singularities where it tries to climb
>  up a single data-point. If you do this for a range of values of
>  lambda, and keep track of the log-likelihood, then you get in
>  effect a profile likelihood for lambda. Once you see what that
>  looks like, you can then set about penalising ranges you don't
>  want to see.
>  
>  The reason I say "there is often a reasonably objective justification"
>  is that often you can be pretty sure, if there is a mixture present,
>  that lambda does not go outside say 1/10 - 10 (or whatever),
>  since you expect that the latent groups are fairly similar,
>  and unlikely to have markedly different sigma's.
>  
>  As to acceleration: agreed, EM can be slow. Aitken acceleration
>  can be dramatically faster. Several outlines of the Aitken procedure
>  can be found by googling on "aitken acceleration".
>  
>  I recently wrote a short note, describing its general principle
>  and outlining its application to the EM algorithm, using the Probit
>  model as illustration (with R code). For fitting the location
>  parameter alone, Raw EM took 59 iterations, Aitken-accelerated EM
>  took 3. For fitting the location and scale paramaters, Raw EM took 108,
>  and Aitken took 4.
>  
>  If anyone would like a copy (PS or PDF) of this, drop me a line.
>  Or is there some "repository" for R-help (like the "Files" area
>  in Google Groups) where one can place files for others to download?
>  
>  [And, if not, do people think it might be a good idea?]
>  
>  Best wishes tgo all,
>  Ted.
>  
>  --------------------------------------------------------------------
>  E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
>  Fax-to-email: +44 (0)870 094 0861
>  Date: 09-Apr-09                                       Time: 00:33:02
>  ------------------------------ XFMail ------------------------------
>  
>  ______________________________________________
>  R-help at r-project.org mailing list
>  
>  PLEASE do read the posting guide 
>  and provide commented, minimal, self-contained, reproducible code.