[R] How to improve the "OPTIM" results

Sun Apr 6 00:17:08 CEST 2008

Dear Katheryn: 

      I'm confused by your claim that, "Even though I used the true 
parameter values as initial values, the results are not very good." 

      When I ran it, 'optim' quit with $value = -35835, substantially 
less than fr2(theta0) = -0.3. 

      Could you please review your question, because I believe this will 
answer it. 


MORE GENERAL OPTIM ISSUES

      It is known that 'optim' has problems.  Perhaps the simplest thing 
to do is to call 'optim' with each of the 'methods' in sequence, using 
the 'optim' found by each 'method' as the starting value for the next.  
When I do this, I often skip 'SANN', because it typically takes so much 
more time than the other methods.  However, if there might be multiple 
local minima, then SANN may be the best way to find a global minimum, 
though you may want to call 'optim' again with another method, starting 
from optimal solution returned by 'SANN'. 

      Another relatively simple thing to do is to call optim with 
hessian = TRUE and then compute eigen(optim(..., hessian=TRUE)$hessian, 
symmetric=TRUE).  If optim found an honest local minimum, all the 
eigenvalues will be positive.  For your example, the eigenvalues were as 
follows: 

19000, 11000,
     0.06, 0.008, 0.003, 
     -0.0002, -0.06,
-6000

      This says that locally, the "minimum" identified was really a 
saddle point, being a parabola opening up in two dimensions (with 
curvatures of 19000 and 11000 respectively), a parabola opening down in 
one dimension (with curvature -6000), and relatively flat by comparison 
in the other 5 dimensions.  In cases like this, it should be moderately 
easy and fast to explore the dimension with the largest negative 
eigenvalue with the other dimensions fixed until local minima in each 
direction were found.  Then 'optim' could be restarted from both those 
local minima, and the minimum of those two solutions could be returned. 

      If all eigenvalues were positive, it might then be wise to restart 
with parscale = the square roots of the diagonal of the hessian;  I 
haven't tried this, but I believe it should work.  Using 'parscale' can 
fix convergence problems -- or create some where they did not exist 
initially. 

      I'm considering creating a package 'optimMLE' that would automate 
some of this and package it with common 'methods' that would assume that 
sum(fn(...)) was either a log(likelihood) or the negative of a 
log(likelihood), etc.  However, before I do, I need to make more 
progress on some of my other commitments, review RSiteSearch("optim", 
"fun") to make sure I'm not duplicating something that already exists, 
etc.  If anyone is interested in collaborating on this, please contact 
me off-line. 

      Hope this helps. 
      Spencer
     
kathie wrote:
> Dear R users,
>
> I used to "OPTIM" to minimize the obj. function below.  Even though I used
> the true parameter values as initial values, the results are not very good.
>
> How could I improve my results?  Any suggestion will be greatly appreciated.
>
> Regards,
>
> Kathryn Lord
>
> #------------------------------------------------------------------------------------------
>
> x = c(0.35938587,  0.34889725,  0.20577608,  0.15298888, -4.24741146,
> -1.32943326,
>         0.29082527, -2.38156942, -6.62584473, -0.22596237,  6.97893687, 
> 0.22001081,
>        -1.39729222, -5.17860124,  0.52456484,  0.46791660,  0.03065136, 
> 0.32155177,
>        -1.12530013,  0.02608057, -0.22323154,  3.30955460,  0.54653982, 
> 0.29403011,
>         0.47769874,  2.42216260,  3.93518355,  0.23237890,  0.09647044,
> -0.48768933,
>         0.37736377,  0.43739341, -0.02416010,  4.02788119,  0.07320802,
> -0.29393054,
>         0.25184609,  0.76044448, -3.34121918,  1.16028677, -0.60352008,
> -2.86454069,
>        -0.84411691,  0.24841071, -0.11764954,  5.92662106,  1.03932953,
> -6.21987657,
>        -0.54763352,  0.20263192)                         # data
>
> theta0= c(log(2),0,c(0,-.3,.3),log(c(10,.05,.05)))  # initial value(In fact,
> true parameter value)
> n = length(x)
>
> fr2 = function(theta)
> {	
> 	a1 = theta[1]; a2 = theta[2]
> 	mu1 = theta[3]; mu2 = theta[4]; mu3 = theta[5]
> 	g1 = theta[6]; g2 = theta[7]; g3 = theta[8]
> 	
> 	w1=exp(a1)/(1+exp(a1)+exp(a2))
> 	w2=exp(a2)/(1+exp(a1)+exp(a2))
> 	w3=1-w1-w2
>
> 	obj =((w1^2)/(2*sqrt(exp(g1)*pi)) 
>          +  (w2^2)/(2*sqrt(exp(g2)*pi))
>          +  (w3^2)/(2*sqrt(exp(g2)*pi))
>
> 	 +  2*w1*w2*dnorm((mu1-mu2)/sqrt(exp(g1)+exp(g2)))/sqrt(exp(g1)+exp(g2))
>          + 
> 2*w1*w3*dnorm((mu1-mu3)/sqrt(exp(g1)+exp(g3)))/sqrt(exp(g1)+exp(g3))
>          + 
> 2*w2*w3*dnorm((mu2-mu3)/sqrt(exp(g2)+exp(g3)))/sqrt(exp(g2)+exp(g3))
>
>          -  (2/n)*(sum(w1*dnorm((x-mu1)/sqrt(exp(g1)))/sqrt(exp(g1)) 
>                      + w2*dnorm((x-mu2)/sqrt(exp(g2)))/sqrt(exp(g2))
>                      + w3*dnorm((x-mu3)/sqrt(exp(g3)))/sqrt(exp(g3)) )))
>
> 	return(obj)
> }
>
> optim(theta0, fr2, method=BFGS", control = list (fnscale=10,
> parscale=c(.01,.01,.01,.01,.01,1,1,1), maxit=100000))
>
>