[R] An optim() mystery.

Peter Dalgaard p.dalgaard at biostat.ku.dk
Tue Nov 15 22:51:46 CET 2005


Rolf Turner <rolf at math.unb.ca> writes:

> I have a Master's student working on a project which involves
> estimating parameters of a certain model via maximum likelihood,
> with the maximization being done via optim().
> 
> A phenomenon has occurred which I am at a loss to explain.
> 
> If we use certain pairs of starting values for optim(), it
> simply returns those values as the ``optimal'' values, although
> they are definitely not optimal.
> 
> With other starting values, optim() goes off and finds a reasonable
> optimum, and seems to do so consistently --- i.e. it gets essentially
> the same estimates irrespective of starting values.
> 
> We have plotted the log likelihood surface and it appears smooth
> and relatively innocuous.
> 
> The phenomenon only occurs with the "L-BFGS-B" method; the default
> (Nelder-Mead simplex) method, with a heavy penalty for violating
> constraints, seems to work just fine.  So we can get solutions; it
> just makes me uneasy that there's this funny business going on.
> 
> Can anyone shed any light on what the problem is?  I have enclosed
> below code to reproduce the phenomenon.  It is probably advisable
> to save it in two separate parts:  The first part does some
> setting up - creating the data and defining the function to be
> optimized.  The second part consists of various calls to optim().
> Note that the code ***minimizes*** minus twice the log likelihood.
> 
> Be aware that the calls to optim() take quite a while to execute;
> like 40 seconds on my laptop.
> 
> Eternal gratitude for any words of wisdom on this matter!

The numerical gradient calculations used by optim and friends sometimes
cause trouble if the objective function is not sufficiently smooth at
the scale of the finite-difference step. Have a look at this:

## Perturb the second parameter by tiny amounts and look at the local
## behaviour of the objective along that direction:
fval <- sapply(seq(-1e-6, 1e-6, length.out = 100),
               function(x) foo(pars + c(0, x), res = resi, theta = theta))
plot(diff(diff(fval)), type = "l")

Notice that the second-order differences occasionally go negative. If
this is commensurate with the step sizes used for the numerical
gradient, then you have a bunch of local optima on your hands. I'm not
quite sure that the two scales really are commensurate (and I'm not
going to dig into the optim sources just now), but if this is the
cause, you could supply your own gradient.
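
For instance (an untested sketch: the step h = 1e-4 is only a guess at
something comfortably larger than the jitter, and right at the bounds
the central difference can step outside the box):

## Hypothetical central-difference gradient for foo(); the step h is an
## assumption, chosen larger than the scale of the noise probed above.
foo.gr <- function(pars, res, theta, h = 1e-4) {
    sapply(seq_along(pars), function(i) {
        step <- replace(numeric(length(pars)), i, h)
        (foo(pars + step, res = res, theta = theta) -
         foo(pars - step, res = res, theta = theta)) / (2 * h)
    })
}

optim(pars, foo, gr = foo.gr, res = resi, theta = theta,
      method = "L-BFGS-B", lower = c(0, 0), upper = c(Inf, 1))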

(
I did some similar considerations for the IBC in 2004, you might want
to peek at my slides from then:

http://www.biostat.ku.dk/~pd/slides/cairns04-slides.pdf 
)
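
Another knob that might be worth a try, with no promise that it helps
here: the step optim uses for its own finite-difference gradient is
control$ndeps (documented default 1e-3 per parameter, on the parscale
scale), so one could widen it, e.g.

R1b <- optim(pars, foo, res = resi, theta = theta, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(Inf, 1),
             control = list(ndeps = c(1e-2, 1e-2)))
print(R1b)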

> BTW, the recalcitrant parameter pair, used in the script below, is
> the pair of values which were used to simulate the data ``resi''.  So
> they are the ``correct'' values.  But optim() can't know that unless
> magic is loose in the world!  Anyhow, there are other pairs, e.g.
> c(0.04,0.15), that result in the same behaviour.
> 
> 				cheers,
> 
> 					Rolf Turner
> 					rolf at math.unb.ca
> 
> P. S.  The help on optim() says the Nelder-Mead method is
> ``relatively slow''.  However, for this problem optim() seems to come
> in at about 10 seconds when Nelder-Mead is used, as opposed to about
> 40 seconds when "L-BFGS-B" is used and runs successfully.
> 
> 					R. T.
> 
> #==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===
> # Script #1:
> #==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===
> 
> #
> # Preliminary stuff, to set things up:
> #
> 
> resi <- c(0.0603646889,0.1363603779,0.1523925363,0.2145877518,0.1440157480,
>           0.0844998235,-0.1297248347,-0.0552557629,0.0481005640,0.0754532293,
>           0.2119027608,0.1851255809,-0.0380954844,-0.0118705151,-0.1094305971,
>          -0.0862933436,-0.0275531242,-0.0626877959,-0.0923180186,-0.2725409915,
>          -0.2733103385,-0.2806762169,-0.3713376801,-0.4138431049,-0.3037150478,
>          -0.3347483204,-0.1598387374,-0.1869063413,-0.1734677727,-0.1610389684,
>          -0.2239938894,-0.2186303230,-0.1686406340,-0.2381844523,-0.2115563556,
>          -0.3134484958,-0.2570423412,-0.1447961173,0.0321965240,0.0073261698,
>          -0.0559209305,-0.0219575807,0.0073407870,0.2049894668,0.0997092007,
>           0.2281661594,0.1474324246,0.2203087759,0.1902474412,0.2909445296,
>           0.1869771602,0.1673737821,0.0644645886,0.1107366215,0.0222997552,
>          -0.0202454896,0.0458960824,0.0082990521,0.0069920731,-0.0400939212,
>          -0.1366096522,-0.1103078421,-0.0539834505,-0.0002650046,0.0163391466,
>          -0.0713795007,0.1324350576,0.1937169442,0.3131844274,0.3325063151,
>           0.4348740892,0.4551717728,0.4049963331,0.4482280578,0.2167847952,
>           0.2168151604,0.2233674124,0.2472087180,0.1983512503,0.1823975372,
>           0.0483989112,-0.0421543344,-0.0526425804,-0.0590416757,-0.0406552389,
>           0.1728161804,0.2281930355,0.2136595821,0.0647976863,0.0781707139,
>           0.0860909779,-0.0236073305,0.0817706100,0.1411715048,0.1305741598,
>           0.2838075993,0.1642797706,0.1328162057,0.3099161357,0.1859591497)
> 
> theta <- seq(0,2*pi,length=101)[-101]
> 
> foo <- function(pars,res,theta) {
>         eps <- sqrt(.Machine$double.eps)
>         big <- sqrt(.Machine$double.xmax)
>         k   <- pars[1]
>         rho <- pars[2]
>         # This line is needed in the Nelder-Mead case:
>         if(k < 0 | rho < 0 | rho > 1) return(big)
>         Sigma <- k*rho^outer(theta,theta,cdist)
>         udv   <- svd(Sigma)
>         d     <- udv$d
>         if(min(d) < eps) return(big)
>         w <- t(udv$u)%*%res
>         # Minus twice the log likelihood:
>         sum(log(d)) + sum(w^2/d)
> }
> 
> cdist <- function(theta1,theta2) {
>         d <- abs(theta1-theta2)%%(2*pi)
>         ifelse(d>pi,2*pi-d,d)
> }
> 
> #==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===
> # Script #2:
> #==+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===+===
> 
> #
> # Demonstration of the problem.
> #
> 
> # Using the ``bad'' starting values:
> pars <- c(0.06,0.25)
> R1 <- optim(pars,foo,res=resi,theta=theta, method="L-BFGS-B",
>       lower=c(0,0),upper=c(Inf,1))
> print(R1)
> 
> # Using roughly (rather badly!) estimated starting values:
> pars <- c(var(resi),cor(resi[-1],resi[-100]))
> R2 <- optim(pars,foo,res=resi,theta=theta, method="L-BFGS-B",
>       lower=c(0,0),upper=c(Inf,1))
> print(R2)
> 
> # Using other starting values:
> pars <- c(0.05,0.5)
> R3 <- optim(pars,foo,res=resi,theta=theta, method="L-BFGS-B",
>       lower=c(0,0),upper=c(Inf,1))
> print(R3)
> 
> # Using Nelder-Mead and the ``bad'' starting values.
> pars <- c(0.06,0.25)
> R4 <- optim(pars,foo,res=resi,theta=theta)
> print(R4)
> 

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
