# [R] How to improve the "OPTIM" results

Spencer Graves spencer.graves at pdf.com
Sun Apr 6 00:17:08 CEST 2008

```Dear Katheryn:

I'm confused by your claim that, "Even though I used the true
parameter values as initial values, the results are not very good."

When I ran it, 'optim' quit with \$value = -35835, substantially
less than fr2(theta0) = -0.3.

MORE GENERAL OPTIM ISSUES

It is known that 'optim' has problems.  Perhaps the simplest thing
to do is to call 'optim' with each of the 'methods' in sequence, using
the 'optim' found by each 'method' as the starting value for the next.
When I do this, I often skip 'SANN', because it typically takes so much
more time than the other methods.  However, if there might be multiple
local minima, then SANN may be the best way to find a global minimum,
though you may want to call 'optim' again with another method, starting
from optimal solution returned by 'SANN'.

Another relatively simple thing to do is to call optim with
hessian = TRUE and then compute eigen(optim(..., hessian=TRUE)\$hessian,
symmetric=TRUE).  If optim found an honest local minimum, all the
eigenvalues will be positive.  For your example, the eigenvalues were as
follows:

19000, 11000,
0.06, 0.008, 0.003,
-0.0002, -0.06,
-6000

This says that locally, the "minimum" identified was really a
saddle point, being a parabola opening up in two dimensions (with
curvatures of 19000 and 11000 respectively), a parabola opening down in
one dimension (with curvature -6000), and relatively flat by comparison
in the other 5 dimensions.  In cases like this, it should be moderately
easy and fast to explore the dimension with the largest negative
eigenvalue with the other dimensions fixed until local minima in each
direction were found.  Then 'optim' could be restarted from both those
local minima, and the minimum of those two solutions could be returned.

If all eigenvalues were positive, it might then be wise to restart
with parscale = the square roots of the diagonal of the hessian;  I
haven't tried this, but I believe it should work.  Using 'parscale' can
fix convergence problems -- or create some where they did not exist
initially.

I'm considering creating a package 'optimMLE' that would automate
some of this and package it with common 'methods' that would assume that
sum(fn(...)) was either a log(likelihood) or the negative of a
log(likelihood), etc.  However, before I do, I need to make more
progress on some of my other commitments, review RSiteSearch("optim",
"fun") to make sure I'm not duplicating something that already exists,
me off-line.

Hope this helps.
Spencer

kathie wrote:
> Dear R users,
>
> I used to "OPTIM" to minimize the obj. function below.  Even though I used
> the true parameter values as initial values, the results are not very good.
>
> How could I improve my results?  Any suggestion will be greatly appreciated.
>
> Regards,
>
> Kathryn Lord
>
> #------------------------------------------------------------------------------------------
>
> x = c(0.35938587,  0.34889725,  0.20577608,  0.15298888, -4.24741146,
> -1.32943326,
>         0.29082527, -2.38156942, -6.62584473, -0.22596237,  6.97893687,
> 0.22001081,
>        -1.39729222, -5.17860124,  0.52456484,  0.46791660,  0.03065136,
> 0.32155177,
>        -1.12530013,  0.02608057, -0.22323154,  3.30955460,  0.54653982,
> 0.29403011,
>         0.47769874,  2.42216260,  3.93518355,  0.23237890,  0.09647044,
> -0.48768933,
>         0.37736377,  0.43739341, -0.02416010,  4.02788119,  0.07320802,
> -0.29393054,
>         0.25184609,  0.76044448, -3.34121918,  1.16028677, -0.60352008,
> -2.86454069,
>        -0.84411691,  0.24841071, -0.11764954,  5.92662106,  1.03932953,
> -6.21987657,
>        -0.54763352,  0.20263192)                         # data
>
> theta0= c(log(2),0,c(0,-.3,.3),log(c(10,.05,.05)))  # initial value(In fact,
> true parameter value)
> n = length(x)
>
> fr2 = function(theta)
> {
> 	a1 = theta[1]; a2 = theta[2]
> 	mu1 = theta[3]; mu2 = theta[4]; mu3 = theta[5]
> 	g1 = theta[6]; g2 = theta[7]; g3 = theta[8]
>
> 	w1=exp(a1)/(1+exp(a1)+exp(a2))
> 	w2=exp(a2)/(1+exp(a1)+exp(a2))
> 	w3=1-w1-w2
>
> 	obj =((w1^2)/(2*sqrt(exp(g1)*pi))
>          +  (w2^2)/(2*sqrt(exp(g2)*pi))
>          +  (w3^2)/(2*sqrt(exp(g2)*pi))
>
> 	 +  2*w1*w2*dnorm((mu1-mu2)/sqrt(exp(g1)+exp(g2)))/sqrt(exp(g1)+exp(g2))
>          +
> 2*w1*w3*dnorm((mu1-mu3)/sqrt(exp(g1)+exp(g3)))/sqrt(exp(g1)+exp(g3))
>          +
> 2*w2*w3*dnorm((mu2-mu3)/sqrt(exp(g2)+exp(g3)))/sqrt(exp(g2)+exp(g3))
>
>          -  (2/n)*(sum(w1*dnorm((x-mu1)/sqrt(exp(g1)))/sqrt(exp(g1))
>                      + w2*dnorm((x-mu2)/sqrt(exp(g2)))/sqrt(exp(g2))
>                      + w3*dnorm((x-mu3)/sqrt(exp(g3)))/sqrt(exp(g3)) )))
>
> 	return(obj)
> }
>
> optim(theta0, fr2, method=BFGS", control = list (fnscale=10,
> parscale=c(.01,.01,.01,.01,.01,1,1,1), maxit=100000))
>
>

```