[R] MLE for bimodal distribution
gunter.berton at gene.com
Wed Apr 8 22:11:48 CEST 2009
The problem is that fitting mixtures models is hard -- (non)identifiability
is a serious problem: very large sample sizes may be necessary to clearly
distinguish the modes. As V&R say in MASS, 4th edition, p. 320: " ...
fitting normal mixtures is a difficult problem, and the results obtained are
often heavily dependent on the initial configuration ([i.e. starting values.
The EM algorithm is quite commonly used: you might have a look at em() and
related functions in the mclust package if Ben's straightforward approach
fails to do the job for you. No guarantee em will work either, of course.
Genentech Nonclinical Biostatistics
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Ben Bolker
Sent: Wednesday, April 08, 2009 12:47 PM
To: r-help at r-project.org
Subject: Re: [R] MLE for bimodal distribution
> Hello everyone,
> I'm trying to use mle from package stats4 to fit a bi/multi-modal
> distribution to some data, but I have some problems with it.
> Here's what I'm doing (for a bimodal distribution):
> # Build some fake binormally distributed data, the procedure fails also
> with real data, so the problem isn't here
> data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))
> # Just to check it's bimodal
> f = function(m, s, m2, s2, w)
> -log( sum(w*dnorm(data, mean=m, sd=s)) + sum((1-w)*dnorm(data, mean=m2,
> sd=s2)) )
> res = mle(f, start=list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6))
> This gives an error:
> Error in optim(start, f, method = method, hessian = TRUE, ...) :
> non-finite finite-difference value 
> In addition: There were 50 or more warnings (use warnings() to see the
> first 50)
> And the warnings are about dnorm producing NaNs
> So, my questions are:
> 1) What does "non-finite finite-difference value" mean? My guess would be
> that an Inf was passed somewhere where a finite number was expected...?
> I'm not sure how optim works though...
> 2) Is the log likelihood function I wrote correct? Can I use the same type
> of function for 3 modes?
> 3) What should I do to avoid function failure? I tried by changing the
> parameters, but it did not work.
> 4) Can I put constraints to the parameters? In particular, I would like w
> to be between 0 and 1.
> 5) Is there a way to get the log-likelihood value, so that I can compare
> different extimations?
> 6) Do you have any (possibly a bit "pratically oriented") readings about
> MLE to suggest?
> Thanks in advance
Here's some tweaked code that works.
Read about the "L-BFGS-B" method in ?optim to constrain parameters,
although you will have some difficulty making this work for more than
two components. I think there's also a mixture model problem in
Venables & Ripley (MASS).
There are almost certainly more efficient approaches for mixture
model fitting, although this "brute force" approach should be fine
if you don't need to do anything too complicated.
data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))
# Just to check it's bimodal
f = function(m, s, m2, s2, w)
-sum(log(w*dnorm(data, mean=m, sd=s)+
(1-w)*dnorm(data, mean=m2, sd=s2)))
start0 <- list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6)
do.call("f",start0) ## OK
res = mle(f, start=start0)
View this message in context:
Sent from the R help mailing list archive at Nabble.com.
R-help at r-project.org mailing list
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
More information about the R-help