[R] Linear Regression with Constraints

Emmanuel Charpentier charpent at bacbuc.dyndns.org
Thu May 28 00:08:11 CEST 2009


On Wednesday 27 May 2009 at 17:28 +1000, Bill.Venables at csiro.au wrote:
> You can accommodate the constraints by, e.g., putting
> 
> c2 = pnorm(theta2)
> c3 = pnorm(theta3)

Nice. I'd have tried invlogit(), but I'm seriously biased...

> x1 has a known coefficient (unity) so it becomes an offset.
> Essentially your problem can be written
> 
> y1 = y-x1 = c1 + pnorm(theta2)*x2 - pnorm(theta3)*x3 + error
> 
> This is then a (pretty simple) non-linear regression which could be
> fitted using, e.g. nls
> 
> If you could not rule out the possibility that the solution is on the
> boundary, you could put c2 = (cos(theta2))^2, and the fitting
> procedure could take you there.  The solution is not unique, but the
> original coefficients, c2,c3, would be unique, of course.

Better than invlogit...
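
For the record, Bill's suggestion in nls() form would look something like
this (a rough, untested sketch; the data frame `dat` with columns y, x1,
x2, x3 and the starting values are mine, not part of Bill's post):

# x1 enters with known coefficient 1, so it moves to the left-hand side
fit <- nls(y - x1 ~ c1 + pnorm(theta2) * x2 - pnorm(theta3) * x3,
           data = dat,
           start = list(c1 = 0, theta2 = 0, theta3 = 0))

# Map back to the original (bounded) coefficients:
c2.hat <- pnorm(coef(fit)[["theta2"]])
c3.hat <- pnorm(coef(fit)[["theta3"]])

# To allow the solution to sit *on* the boundary, replace pnorm(theta)
# by cos(theta)^2 in the formula, as Bill notes.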

> With just 6 observations and 4 parameters to estimate, you will need
> the model to be an exceptionally close fitting one for the fit to have
> any credibility at all.

Seconded! Except that "an exceptionally good fit" might also be an
accident, and therefore carries not much more credibility than an
average one...

Now, to get back to the original formulation of the question, it turns
out that, for those particular bounds, you don't even need
constrOptim(): optim() will do, as exemplified in the following
script:

> #######################################################################
> # Constrained optimization demo
> 
> # It turns out that, for "vertical" bounds (i. e. bounds for a coefficient
> # that do not depend on the other coefficients), optim does the job.
> # constrOptim does the job for any polyhedral domain, but uses a more
> # intricate form of bounds expression (hyperplanes in p dimensions).
> # compare ?optim and ?constrOptim... (see also the constrOptim() sketch
> # after this script)
> 
> # Pseudodata for the question asked by "Stu @ AGS" <stu at agstechnet.com>
> #  in r-help (Linear Regression with Constraints)
> # Begin quote
> # y = c1 + x1 + (c2 * x2) - (c3 * x3)
> # 
> # Where c1, c2, and c3 are the desired regression coefficients that are
> # subject to the following constraints:
> # 
> # 0.0 < c2 < 1.0, and
> # 0.0 < c3 < 1.0
> # End quote
> 
> # y1 : "Okay" choice of parameters : c2=0.3, c3=0.7
> # y2 : (Vicious) choice of "true" parameters : c2=0.5, c3=-0.5
> # optimum out of bounds
> # c1=1 in both cases
> 
> set.seed(1453)
> 
> PseudoData<-data.frame(x1=runif(6, -5, 5),
+                        x2=runif(6, -5, 5),
+                        x3=runif(6, -5, 5))
> 
> PseudoData<-within(PseudoData, {
+   y1=1+x1+0.3*x2+0.7*x3+rnorm(6, 0, 3)
+   y2=1+x1+0.5*x2-0.5*x3+rnorm(6, 0, 3)
+ })
> 
> Data1<-with(PseudoData,data.frame(y=y1, x1=x1, x2=x2, x3=x3))
> Data2<-with(PseudoData,data.frame(y=y2, x1=x1, x2=x2, x3=x3))
> 
> # The objective function : least squares.
> # I"m lazy....
> 
> # Squared residuals : vector function, to be summed in objective,
> # and needs data in its eval environment.
> # R really needs some form of lispish (let) or (let*)...
> 
> e<-expression((y-(c1+x1+c2*x2+c3*x3))^2) # Least squares (needs data in env.)
> 
> # Use expression form of deriv(), to allow easy evaluation in a constructed
> # environment
> 
> foo<-deriv(e, nam=c("c1","c2","c3"))
> 
> # Objective
> 
> objfun<-function(coefs, data) {
+   return(sum(eval(foo,env=c(as.list(coefs), as.list(data)))))
+ }
> 
> # Objective's gradient
> 
> objgrad<-function(coefs, data) {
+   return(apply(attr(eval(foo,env=c(as.list(coefs), as.list(data))),
+                        "gradient"),2,sum))
+ }
> 
> # Sanity check: unbounded optimization, but using a form palatable to
> # bounded optimization
> 
> system.time(D1.unbound<-optim(par=c(c1=0.5, c2=0.5, c3=0.5),
+                               fn=objfun,
+                               gr=objgrad,
+                               data=Data1,
+                               method="L-BFGS-B",
+                               lower=rep(-Inf, 3),
+                               upper=rep(Inf, 3)))
   user  system elapsed 
  0.004   0.000   0.004 
> 
> D1.unbound
$par
        c1         c2         c3 
-0.5052088  0.2478704  0.7626611 

# c2 and c3 are sort of okay (larger range of variation for x2 and x3)
# but c1 is a bit out of whack (residual SD = 3, we were unlucky...)

$value
[1] 12.87678

# keep in mind for future comparison

$counts
function gradient 
       9        9 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

> 
> # Another sanity check: does bounded optimization reach the same objective
> # when the "true" value lies within the bounds?
> 
> system.time(D2.unbound<-optim(par=c(c1=0.5, c2=0.5, c3=0.5),
+                               fn=objfun,
+                               gr=objgrad,
+                               data=Data2,
+                               method="L-BFGS-B",
+                               lower=rep(-Inf, 3),
+                               upper=rep(Inf, 3)))
   user  system elapsed 
  0.008   0.000   0.006 
> 
> D2.unbound
$par
        c1         c2         c3 
 2.3988370  0.1983261 -0.3141863 

# Not very close to the "real" values

$value
[1] 8.213914

# keep in mind

$counts
function gradient 
      12       12 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

> 
> system.time(D1.bound<-optim(par=c(c1=0.5, c2=0.5, c3=0.5),
+                               fn=objfun,
+                               gr=objgrad,
+                               data=Data1,
+                               method="L-BFGS-B",
+                               lower=c(-Inf,0,0),
+                               upper=c(Inf,1,1)))
   user  system elapsed 
  0.004   0.000   0.004 
> 
> D1.bound
$par
        c1         c2         c3 
-0.5052117  0.2478706  0.7626612 

$value
[1] 12.87678

# About the same values as for the unbounded case. Okay

$counts
function gradient 
       9        9 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

> 
> # The real test: what will be found when the "true" optimum is out of
> # bounds?
> 
> system.time(D2.bound<-optim(par=c(c1=0.5, c2=0.5, c3=0.5),
+                               fn=objfun,
+                               gr=objgrad,
+                               data=Data2,
+                               method="L-BFGS-B",
+                               lower=c(-Inf,0,0),
+                               upper=c(Inf,1,1)))
   user  system elapsed 
  0.004   0.000   0.004 
> 
> D2.bound
$par
       c1        c2        c3 
2.0995442 0.2192566 0.0000000 

# The optimizer bangs its pretty little head on the c3 wall. c1 is out of
# the picture, but c2 happens to be not so bad...

$value
[1] 14.4411

# The residual is (as expected) considerably larger than in the unbounded case...

$counts
function gradient 
       8        8 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

# ... and that's not a computing anomaly.
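
For comparison, the constrOptim() form of the same box constraints would
look something like this (an untested sketch reusing objfun, objgrad and
Data2 from the script above; constrOptim() expects the feasible region as
ui %*% theta >= ci, and a starting point strictly inside it):

# 0 <= c2 <= 1 and 0 <= c3 <= 1 as four half-planes; c1 is unconstrained
ui <- rbind(c(0,  1,  0),   #  c2 >= 0
            c(0, -1,  0),   # -c2 >= -1, i.e. c2 <= 1
            c(0,  0,  1),   #  c3 >= 0
            c(0,  0, -1))   # -c3 >= -1, i.e. c3 <= 1
ci <- c(0, -1, 0, -1)

constrOptim(theta = c(c1 = 0.5, c2 = 0.5, c3 = 0.5),
            f = objfun, grad = objgrad,
            ui = ui, ci = ci,
            data = Data2)   # passed through ... to objfun/objgrad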

This also illustrates the hubris necessary to fit 3 coefficients to 6
data points...

HTH,

					Emmanuel Charpentier



