[R] Regression with factor having 1 level

David Winsemius dwinsemius at comcast.net
Fri Mar 11 08:25:57 CET 2016


> On Mar 10, 2016, at 5:45 PM, Nordlund, Dan (DSHS/RDA) <NordlDJ at dshs.wa.gov> wrote:
> 
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of David
>> Winsemius
>> Sent: Thursday, March 10, 2016 4:39 PM
>> To: Robert McGehee
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Regression with factor having 1 level
>> 
>> 
>>> On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcgehee at gmail.com> wrote:
>>> 
>>> Hello R-helpers,
>>> I'd like a function that given an arbitrary formula and a data frame
>>> returns the residuals of the dependent variable, and maintains all NA values.
>> 
>> What does "maintains all NA values" actually mean?
>>> 
>>> Here's an example that will give me what I want if my formula is
>>> y~x1+x2+x3 and my data frame is df:
>>> 
>>> resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
>>> 
>>> Here's the catch: I do not want my function to ever fail due to a
>>> factor with only one level. A one-level factor may appear because 1)
>>> the user passed it in, or 2) (more common) only one level of a factor
>>> in a term is left after na.exclude removes the rows with NA values.
>>> 
>>> Here is the error I would get
>> 
>> From what code?
>> 
>> 
>>> above if one of the terms was a factor with one level:
>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>> contrasts can be applied only to factors with 2 or more levels
>> 
>> I am unable to reproduce that error with the actions you describe but do
>> not actually offer in coded form:
>> 
>> 
>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
>>> lm(y~x1+x2+x3, dfrm)
>> 
>> Call:
>> lm(formula = y ~ x1 + x2 + x3, data = dfrm)
>> 
>> Coefficients:
>> (Intercept)           x1       x2TRUE           x3
>>   -0.16274     -0.30032           NA     -0.09093
>> 
>>> resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
>>          1           2           3           4           5           6
>> -0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
>>          7           8           9          10
>> -0.05965653 -2.17480605  1.42917190 -0.65103650
>> 
>>> 
>> 
>> 
>>> Instead of giving me an error, I'd like the function to do just what
>>> lm() normally does when it sees a variable with no variance, ignore
>>> the variable (coefficient is NA) and continue to regress out all the other
>>> variables.
>>> Thus if 'x2' is a factor with one level in the above example, I'd
>>> like the function to return the result of:
>>> resid(lm(y~x1+x3, data=df, na.action=na.exclude))
>>> Can anyone provide me a straightforward recommendation for how to do this?
>>> I feel like it should be easy, but I'm honestly stuck, and my Google
>>> searching for this hasn't gotten anywhere. The key is that I'd like
>>> the solution to be generic enough to work with an arbitrary linear
>>> formula, and not substantially kludgy (like trying every combination of
>>> regression terms until one works) as I'll be running this a lot on
>>> big data sets and don't want my computation time swamped by running
>>> unnecessary regressions or checking for the number of factor levels after
>>> removing NAs.
>>> 
>>> Thanks in advance!
>>> --Robert
>>> 
>>> 
>>> PS. The Google search feature in the R-help archives appears to be down:
>>> http://tolstoy.newcastle.edu.au/R/
>> 
>> It's working for me.
>> 
>>> 
>>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 
> 
> I agree that what is wanted is not clear.  However, if dfrm is created with x2 as a factor, then you get the error message that the OP mentions when you run the regression.
> 
>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>  contrasts can be applied only to factors with 2 or more levels

Yes, and the error appears to come from `model.matrix`:

> model.matrix(y~x1+factor(x2)+x3, dfrm)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
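
Since model.frame() applies the na.action but does not build contrasts, it can be used to spot the offending variables before model.matrix() is ever called. What follows is only a sketch under my own assumptions: the set.seed() call, the rebuilt dfrm (with x2 as a factor, as in Dan's version) and the names mf, fac_vars, n_lev and one_level are illustrative and not anything posted earlier in the thread.

```r
## Sketch: list the factor variables that have fewer than two levels
## once rows with NA values have been dropped.
set.seed(1)  # assumption: any seed will do, it only fixes the random data
dfrm <- data.frame(y = rnorm(10), x1 = rnorm(10),
                   x2 = as.factor(TRUE), x3 = rnorm(10))

## model.frame() evaluates the variables and removes NA rows,
## but it does not set up contrasts, so it does not throw the error.
mf <- model.frame(y ~ x1 + x2 + x3, data = dfrm, na.action = na.omit)

## Count the levels actually present in each factor column.
fac_vars <- names(mf)[vapply(mf, is.factor, logical(1))]
n_lev <- vapply(mf[fac_vars],
                function(f) nlevels(droplevels(f)), integer(1))

one_level <- fac_vars[n_lev < 2L]
one_level  # "x2"
```

The droplevels() call matters because model.frame() keeps unused levels by default, so a factor can still report several levels even when only one survives the NA removal.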

> model.matrix(y~x1+x2+x3, dfrm)
   (Intercept)          x1 x2TRUE         x3
1            1  0.04887847      1 -0.4199628
2            1 -1.04786688      1  1.3947923
3            1 -0.34896007      1 -2.1873666
4            1 -0.08866061      1  0.1204129
5            1 -0.41111366      1 -1.6631057
6            1 -0.83449110      1  1.1631801
7            1 -0.67887823      1  0.3207544
8            1 -1.12206068      1  0.6012040
9            1  0.05116683      1  0.3598696
10           1  1.74413583      1  0.3608478
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"
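
One possible way to get what Robert asked for is to drop every term that involves a one-level factor and then fit the reduced formula. The helper below is a rough sketch, not a tested solution: resid_safe is a hypothetical name of my own, it assumes a two-sided formula, and it makes one extra pass over the data to build the model frame.

```r
## Hypothetical helper (not from this thread): residuals from lm() after
## removing any term that uses a factor with fewer than two levels.
resid_safe <- function(formula, data) {
  ## Complete-case model frame; no contrasts are built at this stage.
  mf <- model.frame(formula, data = data, na.action = na.omit)
  tt <- attr(mf, "terms")

  ## Factor variables left with fewer than two observed levels.
  fac_vars <- names(mf)[vapply(mf, is.factor, logical(1))]
  bad <- fac_vars[vapply(mf[fac_vars],
                         function(f) nlevels(droplevels(f)) < 2L,
                         logical(1))]

  new_formula <- formula(tt)
  if (length(bad) > 0) {
    ## The "factors" attribute is a variables-by-terms matrix; a nonzero
    ## entry means the variable appears in that term.
    fmat <- attr(tt, "factors")
    labs <- attr(tt, "term.labels")
    drop_labs <- labs[colSums(fmat[bad, , drop = FALSE] > 0) > 0]
    ## Subtract the affected terms (main effects and interactions).
    new_formula <- update(new_formula,
                          paste(". ~ . -", paste(drop_labs, collapse = " - ")))
  }

  ## na.exclude pads the residuals with NA for the rows that were dropped.
  resid(lm(new_formula, data = data, na.action = na.exclude))
}

## Usage with made-up data: x2 is a one-level factor, x1 has one NA.
set.seed(42)
dfrm2 <- data.frame(y = rnorm(10), x1 = rnorm(10),
                    x2 = as.factor(TRUE), x3 = rnorm(10))
dfrm2$x1[3] <- NA
r <- resid_safe(y ~ x1 + x2 + x3, dfrm2)
```

Note that the number of levels is checked on the rows that are complete for all variables, but once a term is dropped the final fit may use rows that were NA only in the dropped variable. That matches the resid(lm(y~x1+x3, ...)) result requested in the original post, but it may or may not be what is wanted in general.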

-- 

David Winsemius
Alameda, CA, USA


