[R] Constrained Regression

Sun Oct 31 18:44:09 CET 2010

On Oct 31, 2010, at 12:54 PM, Spencer Graves wrote:

> Have you tried the 'sos' package?

I have, and I am taking this opportunity to load it with my .Rprofile  
to make it more accessible. It works very well. Very clean display. I  
also have constructed a variant of RSiteSearch that I find more useful  
than the current defaults.

rhelpSearch <- function(string,
                   restrict = c("Rhelp10", "Rhelp08", "Rhelp02",  
"functions" ),
                    matchesPerPage = 100, ...) {
               RSiteSearch(string=string,
               restrict = restrict,
               matchesPerPage = matchesPerPage, ...)}

>
> install.packages('sos') # if not already installed
> library(sos)
> cr <- ???'constrained regression' # found 149 matches
> summary(cr) # in 69 packages
> cr # opens a table in a browser listing all 169 matches with links  
> to the help pages
>
>      However, I agree with Ravi Varadhan:  I'd want to understand  
> the physical mechanism generating the data.  If each is, for  
> example, a proportion, then I'd want to use logistic regression,  
> possible after some approximate logistic transformation of X1 and X2  
> that prevents logit(X) from going to +/-Inf.  This is a different  
> model, but it achieves the need to avoid predictions of Y going  
> outside the range (0, 1).

No argument. I defer to both of your greater experiences in such  
problems and your interest in educating us less knowledgeable users. I  
also need to amend my suggested strategy in situations where a linear  
model _might_ be appropriate, since I think the inclusion of the  
surrogate variable in the solve.QP setup is very probably creating  
confusion. After reconsideration I think one should keep the two  
approaches separate. These are two approaches to the non-intercept  
versions of the model that yield the same estimate (but only because  
the constraints do not get invoked ):

 > lm(medv~I(age-lstat) +offset(lstat) -1, data=Boston)

Call:
lm(formula = medv ~ I(age - lstat) + offset(lstat) - 1, data = Boston)

Coefficients:
I(age - lstat)
         0.1163

 > library(MASS)   ## to access the Boston data
 >  designmat <- model.matrix(medv~age+lstat-1, data=Boston)
 >  Dmat <-crossprod(designmat, designmat); dvec <- crossprod(designmat,
+  Boston$medv)
 >   Amat <- cbind(1, diag(NROW(Dmat)));  bvec <-  
c(1,rep(0,NROW(Dmat))); meq <- 1
 >  library(quadprog);
 >  res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
 > zapsmall(res$solution)  # zapsmall not really needed in this instance
[1] 0.1163065 0.8836935

-- 
David.
>
>
>      Spencer
>
>
> On 10/31/2010 9:01 AM, David Winsemius wrote:
>>
>> On Oct 31, 2010, at 2:44 AM, Jim Silverton wrote:
>>
>>> Hello everyone,
>>> I have 3 variables Y, X1 and X2. Each variables lies between 0 and  
>>> 1. I want
>>> to do a constrained regression such that a>0 and (1-a) >0
>>>
>>> for the model:
>>>
>>> Y = a*X1 + (1-a)*X2
>>
>>
>> It would not accomplish the constraint that a > 0 but you could  
>> accomplish the other constraint within an lm fit:
>>
>> X3 <- X1-X2
>> lm(Y ~ X3 + offset(X2) )
>>
>> Since beta1 is for the model Y ~ 1 + beta1(X1- X2) + 1*X2)
>>                             Y ~ intercept + beta1*X1 + (1 -beta1)*X2
>>
>> ... so beta1 is a.
>>
>> In the case beta < 0 then I suppose a would be assigned 0. This  
>> might be accomplished within an iterative calculation framework by  
>> a large penalization for negative values. In a reply (1) to a  
>> question by Carlos Alzola in 2008 on rhalp, Berwin Turlach offered  
>> a solution to a similar problem ( sum(coef) == 1 AND coef non- 
>> negative). Modifying his code to incorporate the above strategy  
>> (and choosing two variables for which parameter values might be  
>> inside the constraint boundaries) we get:
>>
>> library(MASS)   ## to access the Boston data
>>  designmat <- model.matrix(medv~I(age-lstat) +offset(lstat),  
>> data=Boston)
>>  Dmat <-crossprod(designmat, designmat); dvec <- crossprod(designmat,
>>  Boston$medv)
>>  Amat <- cbind(1, diag(NROW(Dmat)))
>>  bvec <- c(1,rep(0,NROW(Dmat)))
>>  meq <- 1
>> library(quadprog)
>>  res <- solve.QP(Dmat, dvec, Amat, bvec, meq)
>>
>> > zapsmall(res$solution)
>> [1] 0.686547 0.313453
>>
>> Turlach specifically advised against any interpretation of this  
>> particular result which was only contructed to demonstrate the  
>> mathematical mechanics.
>>
>>>
>>> I tried the help on the constrained regression in R but I concede  
>>> that it
>>> was not helpful.
>>
>> I must not have that package installed because I got nothing that  
>> appeared to be useful with ??"constrained regression" .
>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>> 1) http://finzi.psych.upenn.edu/Rhelp10/2008-March/155990.html
>>
>> ______________________________________________

David Winsemius, MD
West Hartford, CT