[R] Safeguarded Newton method for function minimization

Martin Maechler maechler at stat.math.ethz.ch
Wed Apr 19 14:56:46 CEST 2017


>>>>> J C Nash <profjcnash at gmail.com>
>>>>>     on Tue, 18 Apr 2017 13:32:52 -0400 writes:

    > Recently Marie Boehnstedt reported a bug in the nlm()
    > function for function minimization when both gradient and
    > hessian are provided.
Indeed, on R's Bugzilla here:
	https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249

    > She has provided a work-around for some cases and it seems
    > this will get incorporated into the R function eventually.

Indeed... the first part -- fixing the wrong choldc() in the C code -- has
been in my version of R-devel for a while now.

See my follow-up
    https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17249#c4 and #c5

including my attachment, which is an extended version of Marie's original:
    https://bugs.r-project.org/bugzilla/attachment.cgi?id=2246

As that mentions _and_ shows: fixing choldc() solves the problem
for the 2D Rosenbrock example, but _not_ for the 4D Wood example.
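
(For completeness, since not everybody may know: nlm() accepts analytic
derivatives as "gradient" and "hessian" attributes of the objective's
return value.  A minimal sketch of that setup, via deriv3(), for the 2D
Rosenbrock function -- my illustration here, not Marie's code:)

rosenGH <- deriv3(~ 100*(x2 - x1^2)^2 + (1 - x1)^2,
                  c("x1", "x2"), function.arg = TRUE)
frosen <- function(p) { # value with "gradient"/"hessian" attrs, as nlm() uses
  r <- rosenGH(p[1], p[2])
  structure(c(r),
            gradient = drop(attr(r, "gradient")),
            hessian  = attr(r, "hessian")[1, , ])  # the 2 x 2 Hessian
}
nlm(frosen, c(-1.2, 1))  # classical starting point; exercises the bug above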

    > However, despite the great number of packages on CRAN,
    > there does not appear to be a straightforward Newton
    > approach to function minimization. This may be because
    > providing the code for a hessian (the matrix of second
    > derivatives) is a lot of work and error-prone.

    > (R could also use some good tools for Automatic Differentiation).

The last part of what you say above is not at all true:
R -- and S (& S+ | S-PLUS) before it! -- has always had deriv() and deriv3(),
and my attachment above _shows_ how to use deriv3() to get
both the gradient and the Hessian via a version of automatic
differentiation, completely effortlessly!

For the readers' convenience, here is that part, with an example:

##' Wood function (4 arguments 'x1' ... 'x4')
fwood <- function(x1,x2,x3,x4) {
  100*(x1^2-x2)^2 + (1-x1)^2 + 90*(x3^2-x4)^2 + (1-x3)^2 +
    10.1*((1-x2)^2 + (1-x4)^2) + 19.8*(1-x2)*(1-x4)
}
## automatically construct correct gradient and hessian:
woodf.gh <- function(x) {
  stopifnot(is.numeric(x))
  ## deriv3() symbolically differentiates the body of fwood() and returns a
  ## function whose value carries "gradient" and "hessian" attributes:
  woodGH <- deriv3(body(fwood)[[2]],
                   c("x1","x2","x3","x4"), function.arg=TRUE)
  if(length(x) == 4)
    woodGH(x[1],x[2],x[3],x[4])
  else if(is.matrix(x) && ncol(x) == 4)   # vectorized: one point per row
    woodGH(x[,1], x[,2], x[,3], x[,4])
  else stop("'x' must have length 4 or be a matrix with 4 columns")
}

and now get the function value f(x), the gradient g(x), and the Hessian H(x)
for
     x = (0 0 0 0),
     x = (1 1 1 1), and
     x = (1 2 3 4)

with a call as simple as:
     
  > woodf.gh(rbind(0, 1, 1:4))
  [1]   42.0    0.0 2514.4
  attr(,"gradient")
	 x1    x2   x3     x4
  [1,]   -2 -40.0   -2  -40.0
  [2,]    0   0.0    0    0.0
  [3,] -400 279.6 5404 -819.6
  attr(,"hessian")
  , , x1

	x1   x2 x3 x4
  [1,]   2    0  0  0
  [2,] 802 -400  0  0
  [3,] 402 -400  0  0

  , , x2

	 x1    x2 x3   x4
  [1,]    0 220.2  0 19.8
  [2,] -400 220.2  0 19.8
  [3,] -400 220.2  0 19.8

  , , x3

       x1 x2   x3    x4
  [1,]  0  0    2     0
  [2,]  0  0  722  -360
  [3,]  0  0 8282 -1080

  , , x4

       x1   x2    x3    x4
  [1,]  0 19.8     0 200.2
  [2,]  0 19.8  -360 200.2
  [3,]  0 19.8 -1080 200.2

  > 
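
(Aside: the "hessian" attribute here is a 3 x 4 x 4 array, one 4 x 4
Hessian per row of the input matrix; e.g., the Hessian at x = (1,2,3,4)
is extracted by)

r3 <- woodf.gh(rbind(0, 1, 1:4))
H3 <- attr(r3, "hessian")[3, , ]   # 4 x 4 Hessian at the third point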

    > I have also noted that a number of researchers try to
    > implement textbook methods and run into trouble when maths
    > and computing are not quite in sync. Therefore, I wrote a
    > simple safeguarded Newton and put a small package on
    > R-forge at

    > https://r-forge.r-project.org/R/?group_id=395

    > Note that Newton's method is used to solve nonlinear
    > equations. In fact, for function minimization, we apply it
    > to solve g(x) = 0 where g is the gradient and x is the
    > vector of parameters. In part, safeguards ensure we reduce
    > the function f(x) at each step to avoid some of the
    > difficulties that may arise from a non-positive-definite
    > hessian.
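
For concreteness, here is a minimal sketch of such a safeguarded Newton
iteration (my illustration only -- not the code in John's package):
Levenberg-style damping (adding lam * I) forces a positive-definite
Hessian, and step halving ensures f(x) decreases at each step:

newtonSafe <- function(fgh, x0, tol = 1e-8, maxit = 100) {
  ## fgh(x) must return list(f = value, g = gradient, H = hessian)
  x <- x0
  for (it in seq_len(maxit)) {
    cur <- fgh(x)
    if (sqrt(sum(cur$g^2)) < tol) break    # gradient ~ 0 : stop
    lam <- 0
    repeat { ## damp: add lam * I until the Cholesky factorization succeeds
      R <- tryCatch(chol(cur$H + lam * diag(length(x))),
                    error = function(e) NULL)
      if (!is.null(R)) break
      lam <- max(2 * lam, 1e-4)
    }
    ## solve (H + lam*I) d = -g  via the Cholesky factor R, where H = R'R
    d <- backsolve(R, forwardsolve(t(R), -cur$g))
    alpha <- 1 ## safeguard: halve the step until the function decreases
    while (fgh(x + alpha * d)$f > cur$f && alpha > 1e-10) alpha <- alpha/2
    x <- x + alpha * d
  }
  list(par = x, value = fgh(x)$f, iter = it)
}

e.g., wrapped around woodf.gh() from above:

wood.fgh <- function(x) {
  r <- woodf.gh(x)
  list(f = c(r), g = drop(attr(r, "gradient")),
       H = attr(r, "hessian")[1, , ])
}
newtonSafe(wood.fgh, c(-3, -1, -3, -1))  # classical Wood starting point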

    > In the package, I have a very simple quadratic test, the
    > Rosenbrock test function and the Wood test function. The
    > method fails on the last function -- the hessian is not
    > positive definite where it stops.
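
(With the deriv3()-built Hessian from above this is easy to check: any
negative eigenvalue means the Hessian is not positive definite at that
point.  E.g., at an arbitrary point x:)

x <- c(-3, -1, -3, -1)   # e.g. the classical starting point, or
                         # wherever the method stopped
H <- attr(woodf.gh(x), "hessian")[1, , ]   # the 4 x 4 Hessian at x
eigen(H, symmetric = TRUE, only.values = TRUE)$values
## (equivalently: chol(H) succeeds iff H is positive definite)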

    > Before submitting this package to CRAN, I would like to
    > see its behaviour on other test problems, but am lazy
    > enough to wish to avoid creating the hessian code. If
    > anyone has such code, it would be very welcome. Please
    > contact me off-list. If I get some workable examples that
    > are open for public view, I'll report back here.

    > John Nash



