# [R] another optimization question

John Fox jfox at mcmaster.ca
Mon Nov 26 00:20:21 CET 2001

```
Dear Brian

At 08:26 AM 11/25/2001 +0000, Prof Brian D Ripley wrote:
>On Sat, 24 Nov 2001, John Fox wrote:
>
>. . .
>
> > So, my question is, is it possible in principle for an optimization to fail
> > using a correct analytic gradient but to converge with a numerical
> > gradient? If this is possible, is it a common occurrence?
>
>It's possible but rare.  You don't have a `correct analytic gradient', but
>a numerical computation of it.  Inaccurately computed gradients are a
>common cause of convergence problems. You may need to adjust the
>tolerances.
>
>It's also possible in principle that the optimizer takes a completely
>different path from the starting point due to small differences in
>calculated derivatives.  It's worth trying starting near the expected . . .

I didn't describe it in my original post, but I had messed around a fair
bit with the problem before posting my question. (I say "messed around"
advisedly.) In preparing this response, I've checked over some of what I
did, to make sure that I remember it correctly. Without describing
everything in tedious detail, here are some of my results:

First, if I start the optimization right at the solution, I get the
solution back as a result. I took this as evidence that my calculation of
the gradient is probably ok. (And, as I said, I get the correct solution to
other problems.)
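
A more direct check, sketched here with objfn and grfn standing in for my
actual objective and gradient functions (hypothetical names), would be to
compare the analytic gradient against central differences at a trial
point p0:

    fd.grad <- function(f, p, eps = 1e-6) {
        # central-difference approximation to the gradient of f at p
        sapply(seq_along(p), function(i) {
            h <- rep(0, length(p))
            h[i] <- eps
            (f(p + h) - f(p - h)) / (2 * eps)
        })
    }
    max(abs(fd.grad(objfn, p0) - grfn(p0)))  # should be small at p0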

If I start reasonably near the solution, optim (which I use first) reports
convergence, but doesn't quite reach the solution; nlm (which starts with
the parameter values produced by optim) reports a return code of 3, which
corresponds to "last global step failed to locate a point lower than
estimate. Either estimate is an approximate local minimum of the function
or steptol is too small." Changing the steptol (and other arguments to nlm)
doesn't seem to help, however. (I do have a question about the fscale and
typsize arguments, which default respectively to 1 and a vector of 1's: Why
are these available independent of the start values, from which they can be
inferred?)
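
One way to tie these arguments to the start values, sketched with the
same hypothetical objfn and p0 as above, would be:

    res <- nlm(objfn, p0,
               typsize = pmax(abs(p0), 1e-8),        # guard against zero starts
               fscale  = max(abs(objfn(p0)), 1e-8),  # typical objective magnitude
               steptol = 1e-10)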

So that you can get a more concrete sense of what's going on, here's a
table of the different solutions (with the rows corresponding to parameters):

         optim        nlm(1)       nlm(2)    start(1)     start(2)
lamb     4.9017942    4.9250519    5.3688830     5.0     18.6204419
gam1    -0.5382103   -0.5912515   -0.6299493    -1.0     -0.4252919
beta     0.6141337    0.6046611    0.5931075     1.0      0.5207707
gam2    -0.1992669   -0.2189711   -0.2408609    -0.2     -0.1805793
the1     3.8618249    3.5585071    3.6077990     4.0      2.0121677
the2     4.3781542    3.6819272    3.5949141     4.0      1.5921868
the3     1.6595465    2.4249510    2.9937057     3.0      2.2103193
the4   299.7290498  299.6466428  259.5756196   300.0    103.5671443
the5     1.2506907    0.8819633    0.9057823     1.0      0.5174292
psi1     5.9471507    5.8307768    5.6705004    61.0      0.8191268
psi2     4.6063684    4.5328785    4.5149762     5.0      0.6161997
phi      9.3785360    7.1702049    6.6162702     7.0      1.0000000
--------------------------------------------------------------------
obj fn  55.74172     18.41530     13.48505

Here, the solutions labelled optim and nlm(1) use the supplied expression
for the gradient (your point that this too is a numerical approximation
seems obvious once stated, but I didn't consider it previously), while the
solution labelled nlm(2) uses the default numerical derivatives; the
start(1) column gives the start values that I specified "near" the
nlm(2) solution; the start(2) column gives the start values that the
program calculates itself if start values are not supplied; and the last
row gives the values of the objective function for each solution, scaled as
a chi-square statistic with 9 df. (When the start values in start(2) are
used, the solutions produced by optim and nlm(1) are different from those
given above, but the symptoms are the same -- e.g., optim reports
convergence, nlm returns a code of 3.)
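
The two-stage fit described above, with tightened tolerances, can be
sketched as follows (objfn, grfn, and p0 are hypothetical stand-ins for
my actual objective function, gradient, and start values):

    fit1 <- optim(p0, objfn, gr = grfn, method = "BFGS",
                  control = list(maxit = 1000, reltol = 1e-12))
    fit2 <- nlm(objfn, fit1$par, gradtol = 1e-8, steptol = 1e-12)
    fit2$code  # 3 is the "last global step failed" return described above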

I suspect that the problem is ill-conditioned in some way, but I haven't
been able to figure out how. I guess that I should investigate further. I
could supply other potentially relevant information, such as the hessian at
the solution, but I'm reluctant to impose further on your time, or that of
other list members.

John
-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------


```