[R] Regression with factor having 1 level
David Winsemius
dwinsemius at comcast.net
Fri Mar 11 08:25:57 CET 2016
> On Mar 10, 2016, at 5:45 PM, Nordlund, Dan (DSHS/RDA) <NordlDJ at dshs.wa.gov> wrote:
>
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of David
>> Winsemius
>> Sent: Thursday, March 10, 2016 4:39 PM
>> To: Robert McGehee
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Regression with factor having1 level
>>
>>
>>> On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcgehee at gmail.com> wrote:
>>>
>>> Hello R-helpers,
>>> I'd like a function that given an arbitrary formula and a data frame
>>> returns the residual of the dependent variable, and maintains all NA values.
>>
>> What does "maintains all NA values" actually mean?
>>>
>>> Here's an example that will give me what I want if my formula is
>>> y~x1+x2+x3 and my data frame is df:
>>>
>>> resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
>>>
>>> Here's the catch, I do not want my function to ever fail due to a
>>> factor with only one level. A one-level factor may appear because 1)
>>> the user passed it in, or 2) (more common) only one factor in a term
>>> is left after na.exclude removes the other NA values.
>>>
>>> Here is the error I would get
>>
>> From what code?
>>
>>
>>> above if one of the terms was a factor with one level:
>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>> contrasts can be applied only to factors with 2 or more levels
>>
>> Unable to create that error with the actions you describe but do not actually
>> offer in coded form:
>>
>>
>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
>>> lm(y~x1+x2+x3, dfrm)
>>
>> Call:
>> lm(formula = y ~ x1 + x2 + x3, data = dfrm)
>>
>> Coefficients:
>> (Intercept)           x1       x2TRUE           x3
>>    -0.16274     -0.30032           NA     -0.09093
>>
>>> resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
>>           1           2           3           4           5           6
>> -0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
>>           7           8           9          10
>> -0.05965653 -2.17480605  1.42917190 -0.65103650
>>
>>>
>>
>>
>>> Instead of giving me an error, I'd like the function to do just what
>>> lm() normally does when it sees a variable with no variance, ignore
>>> the variable (coefficient is NA) and continue to regress out all the other
>>> variables.
>>> Thus if 'x2' is a factor with one level in the above example, I'd
>>> like the function to return the result of:
>>> resid(lm(y~x1+x3, data=df, na.action=na.exclude))
>>> Can anyone provide me a straightforward recommendation for how to do this?
>>> I feel like it should be easy, but I'm honestly stuck, and my Google
>>> searching for this hasn't gotten anywhere. The key is that I'd like
>>> the solution to be generic enough to work with an arbitrary linear
>>> formula, and not substantially kludgy (like trying every combination of
>>> regressions terms until one works) as I'll be running this a lot on
>>> big data sets and don't want my computation time swamped by running
>>> unnecessary regressions or checking for number of factors after removing
>>> NAs.
>>>
>>> Thanks in advance!
>>> --Robert
>>>
>>>
>>> PS. The Google search feature in the R-help archives appears to be down:
>>> http://tolstoy.newcastle.edu.au/R/
>>
>> It's working for me.
>>
>>>
>>>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>
> I agree that what is wanted is not clear. However, if dfrm is created with x2 as a factor, then you get the error message that the OP mentions when you run the regression.
>
>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> contrasts can be applied only to factors with 2 or more levels
Yes, and the error appears to come from `model.matrix`:
> model.matrix(y~x1+factor(x2)+x3, dfrm)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
> model.matrix(y~x1+x2+x3, dfrm)
   (Intercept)          x1 x2TRUE         x3
1            1  0.04887847      1 -0.4199628
2            1 -1.04786688      1  1.3947923
3            1 -0.34896007      1 -2.1873666
4            1 -0.08866061      1  0.1204129
5            1 -0.41111366      1 -1.6631057
6            1 -0.83449110      1  1.1631801
7            1 -0.67887823      1  0.3207544
8            1 -1.12206068      1  0.6012040
9            1  0.05116683      1  0.3598696
10           1  1.74413583      1  0.3608478
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"
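
Since the error is raised in `model.matrix` only for factor (or character) columns, one possible workaround is to drop the offending terms before calling lm(). The following is only a minimal sketch under stated assumptions, not something proposed by anyone in this thread: the name `safe_resid` is hypothetical, the formula is assumed to be two-sided, and every variable in it is assumed to be a plain column name of the data frame (no `.` and no calls such as factor(x2) inside the formula).

```r
# Hypothetical helper, not from the thread: fit lm() after dropping any term
# that involves a factor or character variable with fewer than two distinct
# values among the rows that survive NA removal.
# Assumes a two-sided formula whose variables are plain column names of `data`.
safe_resid <- function(formula, data) {
  vars <- all.vars(formula)
  # Rows lm() would keep: no NA in any variable used by the formula
  ok <- complete.cases(data[, vars, drop = FALSE])

  # Categorical variables left with < 2 distinct values are the ones that
  # trigger the `contrasts<-` error in model.matrix()
  one_level <- vapply(vars, function(v) {
    x <- data[[v]]
    (is.factor(x) || is.character(x)) && length(unique(x[ok])) < 2
  }, logical(1))
  bad <- vars[one_level]

  tt <- terms(formula)
  labels <- attr(tt, "term.labels")
  fac <- attr(tt, "factors")
  if (length(bad) > 0 && length(labels) > 0) {
    # A term is dropped if any offending variable takes part in it
    hit <- colSums(fac[intersect(bad, rownames(fac)), , drop = FALSE]) > 0
    labels <- labels[!hit]
  }
  if (length(labels) == 0) labels <- "1"  # nothing left: intercept-only fit

  new_formula <- reformulate(labels, response = formula[[2]],
                             intercept = attr(tt, "intercept") == 1)
  resid(lm(new_formula, data = data, na.action = na.exclude))
}

# Example: x2 is a one-level factor and x1 has a missing value
set.seed(1)
dfrm <- data.frame(y = rnorm(10), x1 = rnorm(10),
                   x2 = factor("a"), x3 = rnorm(10))
dfrm$x1[3] <- NA
r <- safe_resid(y ~ x1 + x2 + x3, dfrm)
```

The level check is a single pass over the columns named in the formula, so it should be cheap relative to the fit itself, and no extra regressions are run. Logical and numeric columns with no variance are left alone because, as shown above, lm() already handles them by returning an NA coefficient.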
--
David Winsemius
Alameda, CA, USA