[R] optim, L-BFGS-B | constrained bounds on parms?

Evan Cooch evan.cooch at gmail.com
Fri Sep 19 06:55:04 CEST 2014

Or, something to that effect. Following is an example of what I'm 
working with: basic ABO blood type ML estimation from observed 
phenotype frequencies. First, I generate a log-likelihood function. 
mu[1] and mu[2] are the allele frequencies for the A and B alleles, 
respectively. Since the frequency of the O allele is redundant, I use 
1 - mu[1] - mu[2] for it. The terms in the function are the probability 
expressions for the expected values of each phenotype.

But, that is somewhat beside the point:

f_abo <- function(mu) {
    ## NB: the body of the function was lost from the original message; this
    ## is a reconstruction of the standard ABO log-likelihood described above,
    ## with placeholder phenotype counts (not the original data)
    nA <- 329; nB <- 329; nAB <- 277; nO <- 65   # hypothetical counts
    pO <- 1 - mu[1] - mu[2]                      # O allele frequency
    nA*log(mu[1]^2 + 2*mu[1]*pO) +               # P(type A)  = pA^2 + 2*pA*pO
    nB*log(mu[2]^2 + 2*mu[2]*pO) +               # P(type B)  = pB^2 + 2*pB*pO
    nAB*log(2*mu[1]*mu[2]) +                     # P(type AB) = 2*pA*pB
    nO*log(pO^2)                                 # P(type O)  = pO^2
}

So, I want to come up with MLEs for mu[1] and mu[2] (the allelic 
frequencies for the A and B alleles, respectively). Now, given the data, 
I know (from having maximized this likelihood outside of R) that the MLE 
for mu[1] is 0.37176, and mu[2] is essentially the same -- mu[2]=0.371763. 
I confirmed this in MATLAB, Maple, and Mathematica, using various 
non-linear solvers/optimization routines. They all yielded precisely the 
right answers.

But, I'm stuck trying to come up with a general approach to getting the 
'right estimates' in R that doesn't rely on strong prior knowledge of 
the parameters. I tried the following -- I used 'L-BFGS-B' because this 
is a 'boxed' optimization: mu[1] and mu[2] are both parameters on the 
interval [0,1].

  results <- optim(c(0.3, 0.3), f_abo,
                   method = "L-BFGS-B", lower = c(0.1, 0.1), upper = c(0.9, 0.9),
                   hessian = TRUE, control = list(fnscale = -1))

but that threw the following error at me:

L-BFGS-B needs finite values of 'fn'
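The error is easy to reproduce in isolation (a minimal illustration, not the original code -- only the box bounds above are assumed):

```r
## Inside the box [0.1, 0.9]^2, the two allele frequencies can sum to more
## than 1, so the implied O-allele frequency goes negative and the
## log-likelihood terms involving it return NaN -- a non-finite value of 'fn'.
mu <- c(0.9, 0.9)          # a point L-BFGS-B is allowed to try
pO <- 1 - mu[1] - mu[2]    # -0.8: an impossible allele frequency
log(2 * mu[1] * pO)        # NaN (log of a negative number), with a warning
```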

OK, fine. Taking that literally, and thinking a bit, it's clear that the 
upper bounds on the parms create the problem: whenever the optimizer tries 
a point with mu[1] + mu[2] > 1, the O frequency 1 - mu[1] - mu[2] goes 
negative and the log terms return NaN. So, I try the crude approach of 
lowering the upper bound for each to 0.5:

  results <- optim(c(0.3, 0.3), f_abo,
                   method = "L-BFGS-B", lower = c(0.1, 0.1), upper = c(0.5, 0.5),
                   hessian = TRUE, control = list(fnscale = -1))

No errors this time, but no estimates either. At all.

OK -- so I 'cheat', and since I know that mu[1]=mu[2]=0.37176, I make 
another change to the upper limit, using 0.4 for both parms:

  results <- optim(c(0.3, 0.3), f_abo,
                   method = "L-BFGS-B", lower = c(0.1, 0.1), upper = c(0.4, 0.4),
                   hessian = TRUE, control = list(fnscale = -1))

Works perfectly, and...right estimates too. ;-)

But, I could only get there from here because I had prior knowledge of 
the parameter values. In other words, I cheated (not a thinly veiled 
suggestion that prior information is cheating, of course ;-)

What I'm trying to figure out is how to do a constrained optimization 
with R, where mu[1] and mu[2] are estimated subject to the constraint that

0 <= mu[1]+mu[2] <= 1

There seems to be no obvious way to impose that -- which creates a 
problem for optim, since with 'vague' box bounds on the parms (as in my 
original attempt), optim tries combinations (like mu[1]=0.9, mu[2]=0.9) 
which aren't plausible, given the constraint that 0 <= mu[1]+mu[2] <= 1. 
Further, in this example mu[1]=mu[2]. In general that need not be the 
case, and I might need the upper bound on one parameter to be > 0.5. But 
without knowing which parameter in advance, I'd need to let both range 
over (say) 0.1 to 0.9.

Is this possible with optim, or do I need to use a different package? If 
I can get there from here using optim, what do I need to do, either to 
my call to the optim routine, or the function that I pass to it?
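[For what it's worth, one base-R possibility is constrOptim(), which accepts linear inequality constraints of the form ui %*% mu - ci >= 0 via an adaptive barrier, and so can encode mu[1] + mu[2] <= 1 directly. A self-contained sketch -- the phenotype counts here are made-up placeholders, not the original data:

```r
## ABO log-likelihood under Hardy-Weinberg; counts are hypothetical stand-ins.
f_abo <- function(mu) {
  nA <- 329; nB <- 329; nAB <- 277; nO <- 65
  pO <- 1 - mu[1] - mu[2]
  nA * log(mu[1]^2 + 2 * mu[1] * pO) +
    nB * log(mu[2]^2 + 2 * mu[2] * pO) +
    nAB * log(2 * mu[1] * mu[2]) +
    nO * log(pO^2)
}

## Each row of ui, with the matching entry of ci, encodes ui %*% mu - ci >= 0:
## mu1 >= 0, mu2 >= 0, and -(mu1 + mu2) >= -1, i.e. mu1 + mu2 <= 1.
ui <- rbind(c(1, 0), c(0, 1), c(-1, -1))
ci <- c(0, 0, -1)

res <- constrOptim(c(0.3, 0.3), f_abo, grad = NULL, ui = ui, ci = ci,
                   control = list(fnscale = -1))
res$par   # the maximizer stays inside the simplex; no ad hoc box bounds
```

The starting value only has to be strictly feasible (here mu[1]+mu[2] = 0.6 < 1); no knowledge of where the optimum sits is required.]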

This sort of thing is quite easy in (say) Maple: I simply tell the 
NLPSolve function that there is a constraint on mu[1] and mu[2] (as 
above), which lets me set the bounds on each parameter over a larger 
interval. Can I do the same in R?
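[One general base-R trick (a sketch, again with made-up placeholder counts) is to reparameterize: map an unconstrained eta in R^2 onto the interior of the simplex with a multinomial-logit (softmax) transform, then hand plain optim() the unconstrained problem. The constraint 0 < mu[1] + mu[2] < 1 then holds automatically for every eta:

```r
## ABO log-likelihood with hypothetical placeholder counts.
loglik <- function(mu) {
  nA <- 329; nB <- 329; nAB <- 277; nO <- 65
  pO <- 1 - mu[1] - mu[2]
  nA * log(mu[1]^2 + 2 * mu[1] * pO) +
    nB * log(mu[2]^2 + 2 * mu[2] * pO) +
    nAB * log(2 * mu[1] * mu[2]) +
    nO * log(pO^2)
}

## Softmax map: any eta in R^2 gives mu with mu > 0 and sum(mu) < 1.
to_mu <- function(eta) exp(eta) / (1 + sum(exp(eta)))

fit <- optim(c(0, 0), function(eta) loglik(to_mu(eta)),
             control = list(fnscale = -1))
to_mu(fit$par)   # back-transform to the allele-frequency scale
```

The starting value eta = c(0, 0) corresponds to mu = (1/3, 1/3), so no prior guess about the optimum is needed.]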

Again, I'm trying to avoid having to use a 'good guess'. I know I can 
use gene counting to come up with a quick-and-dirty starting point (the 
basis for the EM algorithm commonly used for this problem), but again, 
I'm trying to avoid that.

Thanks very much in advance.

