[R] hessian fails for box-constrained problems when close to boundary?

Sun Nov 18 04:25:44 CET 2012

Matthieu

(regarding the final question, but not the ones before that.)

The short answer is that you seem to have a solution, since hessian in 
numDeriv works. The long answer follows.

Outside a constraint area one cannot assume that a function will 
evaluate properly. Some constraints are imposed because the function is 
not defined outside the region. This would mean that the gradient and 
hessian are not defined in the usual sense at the boundary; the "left" 
and "right" derivatives should exist and be the same in the limit. I 
have had on my to-do list for some time a project to add one-sided 
approximates in numDeriv, but that will probably not happen soon. If 
your solution is actually on the boundary, you might consider some 
substitution which removes the constraint, and consider the hessian on 
the reduced, unconstrained, space.

In your situation I would also be concerned about the optimum being very 
close, but not on the boundary. You might want to check that this is not 
an artifact of the convergence criteria. Check that the objective is 
actually inferior at a point on the boundary close to your optimum. 
Also, the gradient should be zero at the optimum, but that may not be 
exactly so because of numerical convergence criteria. If the gradient is 
pointing toward the boundary, that is a sign that you may just have 
converged too soon.

Also, the hessian coming from the optimization is built up as an 
approximation using information from the steps of the optimization. I am 
not familiar enough with the numerical handling of boundary constraints 
to know how this approximation from the optimization might be affected, 
but you may need to be careful in this regard.

Paul

On 12-11-15 10:25 AM, Matthieu Stigler wrote:
> Hi
>
> I am trying to recover the hessian of a problem optimised with
> box-constraints. The problem is that in some cases, my estimates are
> very close to the boundary, which will make optim(..., hessian=TRUE) or
> optimHessian() fail, as they do not follow the box-constraints, and
> hence estimate the function in the unfeasible parameter space.
>
> As a simple example (my problem is more complex though, simple param
> transformations do not apply ashere), imagine estimating mu and sigma
> (restricted to be >0) of a simple normally distributed data, where
> however the true sigma is very close to zero:
>
> LLNorm <- function(para, dat=rn)  return(-sum(dnorm(dat, para[1],
> para[2], log=TRUE)))
> rn2 <- c(rep(10.3, 2000), 10.31)
>
>  >optim(c(10,1), fn=LLNorm, method="L-BFGS-B", lower=c(-Inf, 0.0000001),
> dat=rn2,hessian=TRUE)
> Error in optim(c(10, 1), fn = LLNorm, method = "L-BFGS-B", lower =
> c(-Inf,  : non-finite finite-difference value [2]
>
> The only solution/workaround I found is to do a two steps procedure: use
> optim() without hessian, then use optimHess, specifying the length of
> the numerical variations (arg ndeps) as smaller as the distance between
> the parameter and the bound, i.e.:
> op<-optim(c(10,1), fn=LLNorm, method="L-BFGS-B", lower=c(-Inf,
> 0.0000001), dat=rn2,hessian=FALSE)
>
> H<-optimHess(op$par, LLNorm, dat=rn2,
> control=list(ndeps=rep((op$par[2]-0)/1000,2)))
>
> While this solution "works", it is surely not elegant, and I have
> following questions:
> 1) is this solution to use smaller steps good at all?
> 2) I am not sure either I understand the reasons why the hessian in a
> constrained optim problem is evaluated without the constraints, or at
> least the problem transformed into a warning?
>
> Furthermore, I realised that using the numDeriv package, the function
> hessian() does not offer either constraints for the parameters, yet in
> the special case of the example above, it does not fail (although would
> have problems with parameter even closer to the bound), unlike the
> optimHessian() function? See:
> library(numDeriv)
> hessian(LLNorm, op$par, dat=rn2)
>
> This brings me to the final question: is there a technical reason not to
> allow a "constrained" hessian, which seems to indicate the fact that the
> two implementations in R do not do it? I found a very interesting answer
> by Spencer Grave for  a similar question:
> http://tolstoy.newcastle.edu.au/R/e4/help/08/06/15061.html
> although this regards more the statistical implications than the
> numerical issues.
>
> Thanks a lot!
>
> Matthieu