# [R] optim, L-BFGS-B | constrained bounds on parms?

Evan Cooch evan.cooch at gmail.com
Fri Sep 19 06:55:04 CEST 2014

```Or, something to that effect. Following is an example of what I'm
working with basic ABO blood type ML estimation from observed type
(phenotypic) frequencies. First, I generate a log-likelihood function.
mu[1] -> mu[2] are allele freqs for A and B alleles, respectively. Since
freq of O allele is redundant, I use 1-mu[1]-mu[2] for that. The terms
in the function are the probability expressions for the expected values
of each phenotype.

But, that is somewhat besides the point:

f_abo <- function(mu) {
25*log(mu[1]^2+2*mu[1]*(1-mu[1]-mu[2]))+25*log(mu[2]^2+2*mu[2]*(1-mu[1]-mu[2]))+50*log(2*mu[1]*mu[2])+15*log((1-mu[1]-mu[2])^2)
}

So, I want to come up with MLE for mu[1] and mu[2] (for alleleic freqs
for A and B alleles, respectively. Now, given the data, I know (from
having maximized this likelihood outside of R) that the MLE for mu[1] is
0.37176, and for mu[2], the same -- mu[2]=0.371763. I confirm this in
MATLAB, and Maple, and Mathematica, using various non-linear
solvers/optimization routines. They all yielded recisely the right answers.

But, stuck trying to come up with a general approach to getting the
'right estimates' in R, that doesn't rely on strong prior knowledge of
the parameters. I tried the following - I used L-BFGDS-B' because this
is a 'boxed' optimzation: mu[1] and mu[2] are both parameters on the
interval [0,1].

results <- optim(c(0.3,0.3), f_abo,
method = "L-BFGS-B", lower=c(0.1,0.1), upper=c(0.9,0.9),
hessian = TRUE,control=list(fnscale=-1))

but that through the following error at me:

L-BFGS-B needs finite values of 'fn'

OK, fine. Taking that literally, and thinking a bit, clear that the
problem is that the upper bound on the parms creates the problem. So, I
try the crude approach of making the upper bound for each 0.5:

results <- optim(c(0.3,0.3), f_abo,
method = "L-BFGS-B", lower=c(0.1,0.1), upper=c(0.5,0.5),
hessian = TRUE,control=list(fnscale=-1))

No errors this time, but no estimates either. At all.

OK -- so I 'cheat', and since I know that mu[1]=mu[2]=0.37176, I make
another change to the upper limit, using 0.4 for both parms:

results <- optim(c(0.3,0.3), f_abo,
method = "L-BFGS-B", lower=c(0.1,0.1), upper=c(0.4,0.4),
hessian = TRUE,control=list(fnscale=-1))

Works perfectly, and...right estimates too. ;-)

But, I could get there from here because I had prior knowledge of the
parameter values. In other words, I cheated (not a thinly veiled
suggestion that prior information is cheating, of course ;-)

What I'm trying to figure out is how to do a constrained optimization
with R, where mu[1] and mu[2] are estimated subject to the constraint that

0 <= mu[1]+mu[2] <= 1

There seems to be no obvious way to impose that -- which creates a
problem for optim since if I set 'vague' bounds on the parms (as per
original attempt), optim tries combinations (like mu[1]=0.9, mu[2]=0.9),
which aren't plausible, given the constraint that 0 <= mu[1]+mu[2] <= 1.
Further, in this example, mu[1]=mu[2]. That might not be the case, and I
might need to set upper bound on a parameter to be >0.5. But, without
knowing which parameter, I'd need to set both from (say) 0.1 -> 0.9.

Is this possible with optim, or do I need to use a different package? If
I can get there from here using optim, what do I need to do, either to
my call to the optim routine, or the function that I pass to it?

This sort of thing is quite easy in (say) Maple. I simply execute

NLPSolve(f_abo,initialpoint={mu[1]=0.2,mu[2]=0.2},{mu[1]+mu[2]<=1},mu[1]=0.1..0.9,mu[2]=0.1..0.9,maximize);

where I'm telling the NLPSolve function that there is a constraint for
mu[1] and mu[2] (as above), which lets me set bounds on the parameter
over larger interval. Can I do the same in R?

Again, I'm trying to avoid having to use a 'good guess'. I know I can
gene count to come up with a quick and dirty starting point (the basis
for the EM algorithm commonly used for this), but again, I'm trying to
avoid that.