[R] convergence=0 in optim and nlminb is real?
Adelchi Azzalini
azzalini at stat.unipd.it
Tue Dec 17 22:54:03 CET 2013
It was not my suggestion that an optimizer should check the Hessian on
every occasion (this would be both time-consuming and meaningless),
but I expected it to do so before claiming that a point is a
minimum, that is, only for the candidate final point.
Nor have I ever thought that nonlinear optimization is a cursory
operation, especially when the dimensionality is not small. Exactly
for this reason, I expect an optimizer to take stringent precautions
before claiming to have completed its job successfully.
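For concreteness, a minimal sketch of the check I have in mind, with a
hypothetical objective 'fn' and starting point 'start' standing in for
the real problem:

  opt <- optim(start, fn, method = "BFGS", hessian = TRUE)
  # eigenvalues of the numerically evaluated Hessian at the final point
  ev <- eigen(opt$hessian, symmetric = TRUE, only.values = TRUE)$values
  ev   # any eigenvalue <= 0 means the point is not verified as a minimum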
AA
On 17 Dec 2013, at 18:18, Prof J C Nash (U30A) wrote:
> As indicated, if optimizers checked Hessians on every occasion, R would
> enrich all the computer manufacturers. In this case the problem is not
> too large, so the check is worth doing.
>
> However, for this problem, the Hessian is being evaluated by numerical
> approximation of the second partial derivatives, so the Hessian may be
> almost a fiction of the analytic Hessian. I've seen plenty of Hessian
> approximations that were not positive definite when the answers were OK.
>
> That Inf is allowed does not mean that it is recommended. R is very
> tolerant of many things that are not generally good ideas. That can be
> helpful for some computations, but it can still cause trouble. It seems
> that it is not the problem here.
>
> I did not look at all the results for this problem from optimx, but it
> appeared that several results were lower than the optim(BFGS) one. Are
> any of the optimx results acceptable? Note that optimx DOES offer to
> check the KKT conditions, and defaults to doing so unless the problem is
> large. That was included precisely because the optimizers generally
> avoid this very expensive computation. But given the range of results
> from the optimx answers using "all methods", I'd still want to do a lot
> of testing of the results.
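>
> A minimal sketch of that check (hypothetical 'fn' and 'start' again;
> the column names assume the data frame that optimx returns):
>
>   library(optimx)
>   res <- optimx(start, fn, control = list(all.methods = TRUE, kkt = TRUE))
>   # compare the minima found and the two KKT tests for each method
>   res[, c("value", "convcode", "kkt1", "kkt2")]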
>
> This may be a useful case to point out that nonlinear optimization is
> not a calculation that should be taken for granted. It is much less
> reliable than most users think. I rarely find ANY problem for which all
> the optimx methods return the same answer. You really do need to look
> at the answers and make sure that they are meaningful.
>
> JN
>
> On 13-12-17 11:32 AM, Adelchi Azzalini wrote:
>> On Tue, 17 Dec 2013 08:27:36 -0500, Prof J C Nash (U30A) wrote:
>>
>> PJCN> If you run all methods in package optimx, you will see results
>> PJCN> all over the western hemisphere. I suspect some nasty
>> PJCN> computational issues. Possibly the replacement of the function
>> PJCN> value with Inf when any eigenvalue < 0 or nu < 0 is one source
>> PJCN> of this.
>>
>> A value Inf is allowed, as indicated in this passage from the
>> documentation of optim:
>>
>> Function fn can return NA or Inf if the function cannot be evaluated
>> at the supplied value, but the initial value must have a computable
>> finite value of fn.
>>
>> Incidentally, the documentation of optimx includes the same sentence.
>>
>> However, this aspect is not crucial anyway, since the point selected
>> by optim is within the feasible space (by a good margin), and the
>> evaluation of the Hessian matrix occurs at this point.
>>
>> PJCN>
>> PJCN> Note that Hessian eigenvalues are not used to determine
>> PJCN> convergence in optimization methods. If they were, nobody who
>> PJCN> needed to do this would ever get promoted from junior lecturer
>> PJCN> before turning 100, because determining the Hessian from just
>> PJCN> the function requires two levels of approximate derivatives.
>>
>> At the end of the optimization process, when a point is about to be
>> declared a minimum point, I expect the optimizer to check that it
>> really *is* a minimum. It may do this in ways other than computing
>> the eigenvalues, but it must be done somehow. Actually, I first
>> realized the problem by attempting inversion (to get standard errors)
>> under the assumption of positive definiteness, and it failed.
>> For instance
>>
>> mnormt:::pd.solve(opt$hessian)
>>
>> says "x appears to be not positive definite". This check does not
>> involve a further level of approximation.
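>>
>> A sketch of an equivalent direct check, which likewise involves no
>> further approximation: a Cholesky factorization fails exactly when
>> the matrix is not positive definite.
>>
>> ok <- !inherits(try(chol(opt$hessian), silent = TRUE), "try-error")
>> ok   # FALSE here reproduces the pd.solve failure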
>>
>> PJCN>
>> PJCN> If you want to get this problem reliably solved, I think you
>> PJCN> will need to
>> PJCN> 1) sort out a way to avoid the Inf values -- can you constrain
>> PJCN> the parameters away from such areas, or at least not use Inf?
>> PJCN> This messes up the gradient computation and hence the
>> PJCN> optimizers, and also the final Hessian.
>> PJCN> 2) work out an analytic gradient function.
>> PJCN>
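>>
>> A hedged sketch of point (1), assuming (hypothetically) that the
>> feasibility constraint is nu > 0 on the last of three parameters: box
>> constraints keep the search away from the region where fn returns Inf.
>>
>> opt <- optim(start, fn, method = "L-BFGS-B",
>>              lower = c(-Inf, -Inf, 1e-6), upper = Inf)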
>>
>> In my earlier message, I indicated that this is a simplified version
>> of the real thing, which is function mst.mle of pkg 'sn'. What
>> mst.mle does is exactly what you indicated: it re-parameterizes the
>> problem so that we always stay within the feasible region, and it
>> works with an analytic gradient function (of the transformed
>> parameters). The final outcome is the same: we land on the same point.
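>>
>> As a minimal sketch of such a reparameterization (again assuming,
>> hypothetically, that the last parameter is nu > 0): optimize over
>> theta = log(nu), so the optimizer never leaves the feasible region.
>>
>> fn_theta <- function(theta) {
>>   p <- theta
>>   p[length(p)] <- exp(p[length(p)])   # nu = exp(theta) is always > 0
>>   fn(p)                               # original objective, feasible p
>> }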
>>
>> However, once the (supposed) minimum point has been found, the
>> Hessian matrix must be computed in the original parameterization,
>> to get standard errors.
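>>
>> A hedged sketch of that last step, using package numDeriv (with the
>> placeholders fn and opt as above, both in the original
>> parameterization):
>>
>> library(numDeriv)
>> H <- hessian(fn, opt$par)   # numerical Hessian, original parameters
>> sqrt(diag(solve(H)))        # standard errors, valid only if H is p.d.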
>>
>> Adelchi Azzalini
>>
>> PJCN>
>> PJCN>
>> PJCN> > Date: Mon, 16 Dec 2013 16:09:46 +0100
>> PJCN> > From: Adelchi Azzalini <azzalini at stat.unipd.it>
>> PJCN> > To: r-help at r-project.org
>> PJCN> > Subject: [R] convergence=0 in optim and nlminb is real?
>> PJCN> > Message-ID:
>> PJCN> > <20131216160946.91858ff279db26bd65e187bc at stat.unipd.it>
>> PJCN> > Content-Type: text/plain; charset=US-ASCII
>> PJCN> >
>> PJCN> > It must be the case that this issue has already been raised
>> PJCN> > before, but I did not manage to find it in past postings.
>> PJCN> >
>> PJCN> > In some cases, optim() and nlminb() declare a successful
>> PJCN> > convergence, but the corresponding Hessian is not
>> PJCN> > positive-definite. A simplified version of the original
>> PJCN> > problem is given in the code which, for readability, is
>> PJCN> > placed below this text. The example is built making use of
>> PJCN> > package 'sn', but this is only required to set up the
>> PJCN> > example: the question is about the outcome of the optimizers.
>> PJCN> > At the end of the run, a certain point is declared to
>> PJCN> > correspond to a minimum, since 'convergence=0' is reported,
>> PJCN> > but the eigenvalues of the (numerically evaluated) Hessian
>> PJCN> > matrix at that point are not all positive.
>> PJCN> >
>> PJCN> > Any views on the cause of the problem? (i) The point does
>> PJCN> > not correspond to a real minimum; (ii) it does give a minimum
>> PJCN> > but the Hessian matrix is wrong; (iii) the eigenvalues are
>> PJCN> > not right. ...and, in that case, how to get the real
>> PJCN> > solution?
>> PJCN> >
>> PJCN> >
>> PJCN> > Adelchi Azzalini
>> PJCN>
>>
>