[R] Regression with factor having 1 level

David Winsemius dwinsemius at comcast.net
Fri Mar 11 01:39:24 CET 2016


> On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcgehee at gmail.com> wrote:
> 
> Hello R-helpers,
> I'd like a function that given an arbitrary formula and a data frame
> returns the residual of the dependent variable, and maintains all NA values.

What does "maintains all NA values" actually mean?
> 
> Here's an example that will give me what I want if my formula is y~x1+x2+x3
> and my data frame is df:
> 
> resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
> 
> Here's the catch: I do not want my function to ever fail due to a factor
> with only one level. A one-level factor may appear because 1) the user
> passed it in, or 2) (more common) only one level of a factor in a term is
> left after na.exclude removes the rows with NA values.
> 
> Here is the error I would get

From what code?


> above if one of the terms was a factor with
> one level:
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>  contrasts can be applied only to factors with 2 or more levels

I am unable to reproduce that error with the actions you describe but do not actually offer in coded form:


> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
> lm(y~x1+x2+x3, dfrm)

Call:
lm(formula = y ~ x1 + x2 + x3, data = dfrm)

Coefficients:
(Intercept)           x1       x2TRUE           x3  
   -0.16274     -0.30032           NA     -0.09093  

> resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
          1           2           3           4           5           6 
-0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239 
          7           8           9          10 
-0.05965653 -2.17480605  1.42917190 -0.65103650 

> 
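For comparison, here is a minimal sketch of a case that does seem to trigger the quoted error. It assumes x2 is a genuine factor with a single level, rather than the logical column used above. The data are made up and the error is caught with tryCatch() so the script keeps running:

```r
# Made-up data: x2 is a factor with exactly one level (an assumption,
# since the original post did not include code for the failing case).
set.seed(1)
dfrm1 <- data.frame(y  = rnorm(10),
                    x1 = rnorm(10),
                    x2 = factor(rep("a", 10)),
                    x3 = rnorm(10))

# Capture the error message instead of stopping.
msg <- tryCatch({
  lm(y ~ x1 + x2 + x3, data = dfrm1)
  "no error"
}, error = function(e) conditionMessage(e))

print(msg)
```

So the difference appears to be the column type: a constant logical yields an NA coefficient, while a one-level factor stops at the contrasts step.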


> Instead of giving me an error, I'd like the function to do just what lm()
> normally does when it sees a variable with no variance, ignore the variable
> (coefficient is NA) and continue to regress out all the other variables.
> Thus if 'x2' is a factor with one level in the above example, I'd like
> the function to return the result of:
> resid(lm(y~x1+x3, data=df, na.action=na.exclude))
> Can anyone provide me a straightforward recommendation for how to do this?
> I feel like it should be easy, but I'm honestly stuck, and my Google
> searching for this hasn't gotten anywhere. The key is that I'd like the
> solution to be generic enough to work with an arbitrary linear formula, and
> not substantially kludgy (like trying every combination of regression terms
> until one works) as I'll be running this a lot on big data sets and don't
> want my computation time swamped by running unnecessary regressions or
> checking for number of factors after removing NAs.
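One possible approach, offered only as a sketch: build the model frame once with na.action = na.pass, find the rows lm() would keep, and drop every term that involves a factor (or character) variable with fewer than two distinct values in those rows. The function name resid_drop_constant_factors is hypothetical, and the sketch assumes a formula without offset(), weights or subset arguments. It fits a single regression, but it does check the factor levels once.

```r
# Hypothetical helper, not an existing R function.
resid_drop_constant_factors <- function(formula, data) {
  tt <- terms(formula, data = data)
  # Keep all rows for now so we can see which ones are complete.
  mf <- model.frame(tt, data = data, na.action = na.pass)
  ok <- complete.cases(mf)

  # Factor/character variables with < 2 distinct values in complete rows.
  is_bad <- vapply(mf, function(v) {
    (is.factor(v) || is.character(v)) && length(unique(v[ok])) < 2L
  }, logical(1))
  bad_vars <- names(mf)[is_bad]

  labels <- attr(tt, "term.labels")
  fac <- attr(tt, "factors")   # variables (rows) by terms (columns)
  if (length(bad_vars) > 0L && length(labels) > 0L) {
    # Drop every term (including interactions) that uses a bad variable.
    uses_bad <- colSums(fac[bad_vars, , drop = FALSE]) > 0
    labels <- labels[!uses_bad]
  }

  new_formula <- if (length(labels) > 0L) {
    reformulate(labels, response = formula[[2L]],
                intercept = attr(tt, "intercept") == 1)
  } else {
    reformulate("1", response = formula[[2L]])
  }
  environment(new_formula) <- environment(formula)

  # na.exclude pads the residuals with NA for the removed rows.
  resid(lm(new_formula, data = data, na.action = na.exclude))
}

# Made-up example: the NA in x1 removes the only row where x2 == "b".
set.seed(42)
dat <- data.frame(y  = rnorm(10),
                  x1 = rnorm(10),
                  x2 = factor(c("a", "b", rep("a", 8))),
                  x3 = rnorm(10))
dat$x1[2] <- NA
r <- resid_drop_constant_factors(y ~ x1 + x2 + x3, dat)
```

In this example the result should match resid(lm(y ~ x1 + x3, data = dat, na.action = na.exclude)), with NA in position 2.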
> 
> Thanks in advance!
> --Robert
> 
> 
> PS. The Google search feature in the R-help archives appears to be down:
> http://tolstoy.newcastle.edu.au/R/

It's working for me.


David Winsemius
Alameda, CA, USA


