[R] Dealing with -Inf in a maximisation problem.

Mon Nov 7 03:39:15 CET 2016

On 07/11/16 14:14, ProfJCNash wrote:

> Rolf, What optimizers did you try? There are a few in the optimrx package on R-forge that handle bounds, and it may be
> useful to set bounds in this case. Transformations using log or exp can be helpful if done carefully, but as you note,
> they can make the function more difficult to optimize.

I can't see how to impose bounds in my circumstances.  As you will have 
seen from my previous answer to Bill Dunlap's post, the b_i are 
probabilities, parametrised in terms of z_i:

     b_i = exp(z_i)/[sum_j exp(z_j)] .

It is not at all clear to me how to impose constraints on the z_i that 
will bound the b_i away from 0.

I can constrain L <= z_i <= U for all i and get b_i >= exp(L)/[n*exp(U)]
--- I think; I may have things upsidedown and backwards --- but this 
leaves an infinitude of choices for L and U.

Also the starting values at each M-step are "naturally" given in terms 
of the b_i.  I.e. I can calculate "reasonable" values for the b_i and 
then transform these to provide starting values for the z_i.  The 
starting values for z_i might not satisfy a given set of constraints.
I guess I could simply truncate the starting values to fall within the 
constraints, but that "feels wrong" to me.

I also worry about the impact that imposing constraints will have on
the monotonicity of the successive values of the expected log likelihood 
in the EM algorithm context.

> Be cautious about using the default numerical gradient approximations. optimrx allows selection of the numDeriv grad()
> function, which is quite good. Complex step would be better, but you need a function which can be computed with complex
> arguments. Unfortunately, numerical gradients often step over the cliff edge of computability of the function. The
> bounds are not checked for the step to compute things like (f(x+h) - f(x) / h.

I think I can program up an analytic gradient function.  Maybe I'll try 
that.  I have been reluctant to do so because I have had peculiar (bad) 
experiences in the past in trying to use analytic gradients with nlm().

Of course the (analytic) gradient becomes undefined if one of the b_i is 0.

cheers,

Rolf

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276