[R] Understanding nonlinear optimization and Rosenbrock's banana valley function?
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Dec 5 01:42:59 CET 2005
Henry Wolkowicz (google his page for lots of optimization references)
mentioned to me that this function is a standard example showing
that first order methods (e.g. steepest descent) can fail by repeatedly
crossing back and forth over the valley, whereas second order methods
(damped Newton, damped quasi-Newton, trust region) use curvature
information to converge in a few iterations.
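For example, here is a quick sketch of that contrast (my own illustration;
banana and Dbanana below match the definitions in Spencer's post, and the
step size and iteration count are arbitrary choices):

banana <- function(x) 100*(x[2] - x[1]^2)^2 + (1 - x[1])^2
Dbanana <- function(x) c(-400*x[1]*(x[2] - x[1]^2) - 2*(1 - x[1]),
                         200*(x[2] - x[1]^2))

# first order: fixed-step steepest descent creeps along the valley floor
steepest <- function(p, step = 1e-4, maxit = 2000) {
  for (i in 1:maxit) p <- p - step * Dbanana(p)
  p
}
steepest(c(-1.2, 1))      # nowhere near the minimum c(1, 1) yet

# curvature information: quasi-Newton (BFGS) with the same analytic gradient
optim(c(-1.2, 1), banana, Dbanana, method = "BFGS")$par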
On 12/4/05, Spencer Graves <spencer.graves at pdf.com> wrote:
> GENERAL REFERENCE ON NONLINEAR OPTIMIZATION?
>
> What are your favorite references on nonlinear optimization? I like
> Bates and Watts (1988) Nonlinear Regression Analysis and Its
> Applications (Wiley), especially for its key insights regarding
> parameter effects vs. intrinsic curvature. Before I spent time and
> money on several of the references cited on the help pages for "optim",
> "nlm", etc., I thought I'd ask you all for your thoughts.
>
> ROSENBROCK'S BANANA VALLEY FUNCTION?
>
> Beyond this, I wonder if someone could help me understand the lessons one
> should take from Rosenbrock's banana valley function:
>
> banana <- function(x){
> 100*(x[2]-x[1]^2)^2+(1-x[1])^2
> }
>
> This is a quartic in x[1] and a parabola in x[2] with a unique minimum at
> x[2]=x[1]=1. Over the range (-1, 2)x(-1,1), it looks like a long,
> curved, deep, narrow banana-shaped valley. It is a well-known hard test
> problem in nonlinear optimization, but these difficulties don't seem to
> affect "nlm" or "nlminb" unless the hessian is provided analytically
> (with R 2.2.0 under Windows XP):
>
> nlm(banana, c(-1.2, 1)) # found the minimum in 23 iterations
> nlminb(c(-1.2, 1), banana)# found the min in 35 iterations
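A quick way to see the banana-shaped valley described above is a contour plot
over that (-1, 2) x (-1, 1) region (a minimal illustrative sketch; the contour
levels are arbitrary):

x1 <- seq(-1, 2, length = 101)
x2 <- seq(-1, 1, length = 101)
z  <- outer(x1, x2, function(a, b) 100*(b - a^2)^2 + (1 - a)^2)
contour(x1, x2, z, levels = c(1, 5, 25, 100, 500), xlab = "x[1]", ylab = "x[2]")
points(1, 1, pch = 19)    # the unique minimum at (1, 1)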
>
> Dbanana <- function(x){
>   c(-400*x[1]*(x[2] - x[1]^2) - 2*(1-x[1]),
>     200*(x[2] - x[1]^2))
> }
> banana1 <- function(x){
>   b <- 100*(x[2]-x[1]^2)^2+(1-x[1])^2
>   attr(b, "gradient") <- Dbanana(x)
>   b
> }
>
> nlm(banana1, c(-1.2, 1)) # solved the problem in 24 iterations
> nlminb(c(-1.2, 1), banana, Dbanana)# solution in 35 iterations
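As an aside, whenever analytic derivatives are supplied it is worth checking
them against finite differences first; a minimal sketch using banana and
Dbanana as above (the step size h is an arbitrary choice):

fd.grad <- function(f, x, h = 1e-6)
  sapply(seq_along(x), function(i) {
    e <- numeric(length(x)); e[i] <- h
    (f(x + e) - f(x - e)) / (2*h)     # central difference in coordinate i
  })
x0 <- c(-1.2, 1)
cbind(analytic = Dbanana(x0), numeric = fd.grad(banana, x0))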
>
> D2banana <- function(x){
>   a11 <- (2 - 400*(x[2] - x[1]^2) + 800*x[2]*x[1]^2)
>   a21 <- (-400*x[1])
>   matrix(c(a11, a21, a21, 200), 2, 2)
> }
> banana2 <- function(x){
>   b <- 100*(x[2]-x[1]^2)^2+(1-x[1])^2
>   attr(b, "gradient") <- Dbanana(x)
>   attr(b, "hessian") <- D2banana(x)
>   b
> }
>
> nlm(banana2, c(-1.2, 1))
> # Found the valley but not the minimum
> # in the default 100 iterations.
> nlm(banana2, c(-1.2, 1), iterlim=10000)
> # found the minimum to 3 significant digits in 5017 iterations.
>
> nlminb(c(-1.2, 1), banana, Dbanana, D2banana)
> # took 95 iterations to find the answer to double precision.
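The same finite-difference comparison can be applied to the Hessian before
reading too much into the iteration counts above; a sketch (the evaluation
points and step size are arbitrary), differentiating Dbanana numerically:

fd.hess <- function(gr, x, h = 1e-6)
  sapply(seq_along(x), function(i) {
    e <- numeric(length(x)); e[i] <- h
    (gr(x + e) - gr(x - e)) / (2*h)   # column i of the Hessian
  })
for(x0 in list(c(-1.2, 1), c(0.5, 0.25)))
  print(list(analytic = D2banana(x0), numeric = fd.hess(Dbanana, x0)))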
>
> To understand this better, I wrote my own version of "nlm" (see
> below), and learned that the hessian is often indefinite, with one
> eigenvalue positive and the other negative. If I understand correctly,
> a negative eigenvalue of the hessian tends to push the next step towards
> increasing rather than decreasing the function. I tried a few things
> that accelerated the convergence slightly, but my "nlm." still had
> not converged after 100 iterations.
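That observation can be made precise: the Hessian of the banana function,
written out directly, is

  [ 1200*x[1]^2 - 400*x[2] + 2    -400*x[1] ]
  [ -400*x[1]                       200     ]

with determinant 80000*(x[1]^2 - x[2]) + 400, so it is indefinite whenever
x[2] > x[1]^2 + 1/200, i.e. almost everywhere above the parabola that forms
the valley floor. A small illustration (H here is my own helper, not from
the post):

H <- function(x) matrix(c(1200*x[1]^2 - 400*x[2] + 2, -400*x[1],
                          -400*x[1], 200), 2, 2)
eigen(H(c(0, 0.5)), symmetric = TRUE)$values   # approximately 200 and -198

With an indefinite Hessian, -solve(d2f, df) need not be a descent direction,
which is consistent with Newton steps that increase the function.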
>
> What might be done to improve the performance of something like "nlm"
> without substantially increasing the overhead for other problems?
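One common remedy (a sketch of the general Levenberg-Marquardt idea, not a
description of what "nlm" actually does internally) is to add a multiple of
the identity to the Hessian, increasing it until the matrix is positive
definite, and solve that damped system instead of the raw Newton one:

damped.newton.step <- function(df, d2f, lambda = 0) {
  repeat {
    # chol() fails unless d2f + lambda*I is positive definite
    R <- tryCatch(chol(d2f + lambda * diag(nrow(d2f))),
                  error = function(e) NULL)
    if (!is.null(R))
      return(backsolve(R, forwardsolve(t(R), -df)))  # (d2f + lambda*I) dp = -df
    lambda <- max(2 * lambda, 1e-3)
  }
}

Combined with a line search on the resulting direction, this behaves like
(scaled) steepest descent when lambda is large and like Newton's method when
lambda is near zero, so it costs little extra on well-behaved problems.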
>
> Thanks.
> spencer graves
> #############################
> nlm. <- function(f=fgh, p=c(-1.2, 1),
>                  gradtol=1e-6, steptol=1e-6, iterlim=100){
>   # R code version of "nlm"
>   # requiring the analytic gradient and hessian as attributes of f(p)
>   #
>   # Initial evaluation
>   f.i <- f(p)
>   f0 <- f.i + 1
>   # Iterate
>   for(i in 1:iterlim){
>     df <- attr(f.i, "gradient")
>     d2f <- attr(f.i, "hessian")
>     # Gradient sufficiently small?
>     if(sum(df^2) < gradtol^2){
>       return(list(minimum=f.i, estimate=p,
>                   gradient=df, hessian=d2f, code=1,
>                   iterations=i))
>     }
>     # Full Newton step
>     dp <- (-solve(d2f, df))
>     # Step sufficiently small?
>     if(sum(dp^2) < steptol^2){
>       return(list(minimum=f.i, estimate=p+dp,
>                   gradient=df, hessian=d2f, code=2,
>                   iterations=i))
>     }
>     # Next iteration
>     f0 <- f.i
>     f.i <- f(p+dp)
>     # Step size control if the Newton step did not decrease f
>     if(f.i >= f0){
>       for(j in 1:iterlim){
>         if(j == 1){
>           # First replace the Newton step by one based on the
>           # eigendecomposition, bounding the eigenvalues away from zero
>           d2f.eig <- eigen(d2f, symmetric=TRUE)
>           cat("\nstep size control; i=", i,
>               "; p=", round(p, 3), "; dp=", signif(dp, 2),
>               "; eig(hessian)=", signif(d2f.eig$values, 4))
>           v.max <- (1 + max(abs(d2f.eig$values)))
>           v.adj <- pmax(.001*v.max, abs(d2f.eig$values))
>           evec.df <- (t(d2f.eig$vectors) %*% df)
>           dp <- (-(d2f.eig$vectors %*% (evec.df/(1+v.adj))))
>         } else {
>           # Then keep halving the step
>           cat(".")
>           dp <- dp/2
>         }
>         f.i <- f(p+dp)
>         f2 <- f(p+dp/2)
>         if(f2 < f.i){
>           dp <- dp/2
>           f.i <- f2
>         }
>         if(f.i < f0) break # j
>       }
>       if(f.i >= f0){
>         cat("\n")
>         return(list(minimum=f0, estimate=p,
>                     gradient=attr(f0, "gradient"),
>                     hessian=attr(f0, "hessian"), code=3,
>                     iterations=i))
>       }
>     }
>     p <- p + dp
>     cat(i, p, f.i, "\n")
>   }
>   return(list(minimum=f.i, estimate=p,
>               gradient=df, hessian=d2f, code=4,
>               iterations=i))
> }
>
> --
> Spencer Graves, PhD
> Senior Development Engineer
> PDF Solutions, Inc.
> 333 West San Carlos Street Suite 700
> San Jose, CA 95110, USA
>
> spencer.graves at pdf.com
> www.pdf.com
> Tel: 408-938-4420
> Fax: 408-280-7915
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>