[R] MLE for bimodal distribution

Wed Apr 8 22:18:44 CEST 2009

Ben,

You can also do this nicely with the function spg() in the "BB" package

set.seed(1001)

data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))

f = function(m, s, m2, s2, w)
{
  -sum(log(w*dnorm(data, mean=m, sd=s)+
           (1-w)*dnorm(data, mean=m2, sd=s2)))
}

start0 <- list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6)

fcn <- function(x) f(x[1], x[2], x[3], x[4], x[5])  

spg(par=as.numeric(start0), fn=fcn, lower=c(-Inf, 0, -Inf, 0, 0), upper=c(Inf, Inf, Inf, Inf, 1))

# of course, "L-BFGS-B" also works well

optim(par=as.numeric(start0), fn=fcn, method="L-BFGS-B", lower=c(-Inf, 0, -Inf, 0, 0), upper=c(Inf, Inf, Inf, Inf, 1))

Ravi.
____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvaradhan at jhmi.edu

----- Original Message -----
From: Ben Bolker <bolker at ufl.edu>
Date: Wednesday, April 8, 2009 3:49 pm
Subject: Re: [R] MLE for bimodal distribution
To: r-help at r-project.org

>  
>  
>  _nico_ wrote:
>  > 
>  > Hello everyone,
>  > 
>  > I'm trying to use mle from package stats4 to fit a bi/multi-modal
>  > distribution to some data, but I have some problems with it.
>  > Here's what I'm doing (for a bimodal distribution):
>  > 
>  > # Build some fake binormally distributed data, the procedure fails 
> also
>  > with real data, so the problem isn't here
>  > data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))
>  > # Just to check it's bimodal
>  > plot(density(data))
>  > f = function(m, s, m2, s2, w)
>  > {
>  > -log( sum(w*dnorm(data, mean=m, sd=s)) + sum((1-w)*dnorm(data, mean=m2,
>  > sd=s2)) )
>  > }
>  > 
>  > res = mle(f, start=list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6))
>  > 
>  > This gives an error:
>  > Error in optim(start, f, method = method, hessian = TRUE, ...) : 
>  >   non-finite finite-difference value [2]
>  > In addition: There were 50 or more warnings (use warnings() to see 
> the
>  > first 50)
>  > And the warnings are about dnorm producing NaNs
>  > 
>  > So, my questions are:
>  > 1) What does "non-finite finite-difference value" mean? My guess 
> would be
>  > that an Inf was passed somewhere where a finite number was expected...?
>  > I'm not sure how optim works though...
>  > 2) Is the log likelihood function I wrote correct? Can I use the 
> same type
>  > of function for 3 modes?
>  > 3) What should I do to avoid function failure? I tried by changing 
> the
>  > parameters, but it did not work.
>  > 4) Can I put constraints to the parameters? In particular, I would 
> like w
>  > to be between 0 and 1.
>  > 5) Is there a way to get the log-likelihood value, so that I can compare
>  > different extimations?
>  > 6) Do you have any (possibly a bit "pratically oriented") readings 
> about
>  > MLE to suggest?
>  > 
>  > Thanks in advance
>  > Nico
>  > 
>  
>    Here's some tweaked code that works.
>  Read about the "L-BFGS-B" method in ?optim to constrain parameters,
>  although you will have some difficulty making this work for more than
>  two components.  I think there's also a mixture model problem in
>  Venables & Ripley (MASS).
>  
>    There are almost certainly more efficient approaches for mixture
>  model fitting, although this "brute force" approach should be fine
>  if you don't need to do anything too complicated.
>  (RSiteSearch("mixture model"))
>  
>  set.seed(1001)
>  data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))
>  # Just to check it's bimodal
>  plot(density(data))
>  f = function(m, s, m2, s2, w)
>  {
>    -sum(log(w*dnorm(data, mean=m, sd=s)+
>             (1-w)*dnorm(data, mean=m2, sd=s2)))
>  }
>  
>  library(stats4)
>  start0 <- list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6)
>  do.call("f",start0) ## OK
>  res = mle(f, start=start0)
>  
>  with(as.list(coef(res)),
>       curve(w*dnorm(x,m,s)+(1-w)*dnorm(x,m2,s2),add=TRUE,col=2))
>  
>  
>  -- 
>  View this message in context: 
>  Sent from the R help mailing list archive at Nabble.com.
>  
>  ______________________________________________
>  R-help at r-project.org mailing list
>  
>  PLEASE do read the posting guide 
>  and provide commented, minimal, self-contained, reproducible code.