[R] lm#contrasts#one level in factor: bug or feature

Thomas Lumley tlumley at u.washington.edu
Tue Oct 12 22:00:02 CEST 2004

On Tue, 12 Oct 2004, Ritter, Christian C MCIL-CTANL/S wrote:

> (R.1.9.1; win2000)
> Since it's about the tenth time I had to write an "if" around this to catch the error ...
> Let's look at the line
> 	myfit<-lm(res~groupvar,data=Data)
> Here res is of numeric type and groupvar is a factor. On first sight, it 
> would be logical that if groupvar had only one (single) level we would 
> get:
> 	Error in "contrasts<-"(`*tmp*`, value = "contr.treatment") : 
> contrasts can be applied only to factors with 2 or more levels
> But then again, it's also inconsistent: Normally redundant variables on 
> the right of the ~ (variables which are constant or linearly dependent 
> on previous variables) don't lead to "Error". They just get a "NA" as a 
> coefficient estimate. A factor with a single level is of this type, it 
> is just a constant.  Obviously (to me) lm (or model.matrix.default) 
> should not try to calculate contrasts on it and its coefficient should 
> be NA. Shouldn't it?

No, a factor with k levels isn't just a set of k linearly dependent 
variables (otherwise a factor would always give an NA coefficient for one 
level).  A factor specifies a set of contrasts (usually k-1 of them) that 
go into the model matrix.  In that sense a factor with 1 level should 
specify zero columns of the model matrix, rather than giving an error.
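
A small sketch of the point above, assuming a current version of R with the default treatment contrasts; the names g3, g1, X3 and res1 are made up for illustration. A three-level factor contributes k-1 = 2 columns to the model matrix, while a one-level factor stops with an error instead of contributing zero columns.

```r
# Three levels: the factor contributes k - 1 = 2 contrast columns,
# in addition to the intercept column.
g3 <- factor(c("a", "b", "c", "a", "b", "c"))
X3 <- model.matrix(~ g3)
colnames(X3)
ncol(X3) - 1  # number of columns contributed by the factor

# One level: setting contrasts fails, so we capture the error message
# rather than letting the script stop.
g1 <- factor(rep("a", 6))
res1 <- tryCatch(
  model.matrix(~ g1),
  error = function(e) conditionMessage(e)
)
res1
```

Wrapping the call in tryCatch() is only there so the example runs to the end; it is not a suggested fix for the behaviour being discussed.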

> Why is this not unimportant? Imagine you make a model of the above type 
> for a data set with more than one level in groupvar. Then you select 
> subsets and refit. It shouldn't die with error if you select a subset 
> with only one level of groupvar.

Possibly, but your results are likely to be pretty meaningless.  If the 
subset does not have *all* the levels of the factor then the contrasts 
will change and the variable will be coded differently, possibly giving 
coefficients with different meanings.

In the case of contr.treatment, for example, if the lowest (baseline) level 
of the factor is missing then all the other levels are recoded as contrasts 
to the next lowest level, which becomes the new baseline.
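
The recoding can be seen in a toy example. This is a sketch under assumptions: the data frame d and its values are invented, and it relies on lm() dropping unused factor levels when a subset is given, as current versions of R do. The coefficient named gc compares level c with level a in the full fit, but with level b once a is excluded.

```r
# Invented data: group means are 1, 2 and 4 for levels a, b and c.
d <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 4)),
  y = c(0.9, 1.1, 1.0, 1.0,
        1.9, 2.1, 2.0, 2.0,
        3.9, 4.1, 4.0, 4.0)
)

# Full data: baseline is "a", so gb and gc are differences from "a".
full <- lm(y ~ g, data = d)

# Subset without "a": the unused level is dropped and "b" becomes
# the baseline, so gc is now the difference between "c" and "b".
sub <- lm(y ~ g, data = d, subset = g != "a")

coef(full)
coef(sub)
```

Both fits report a coefficient called gc, but the two numbers answer different questions, which is why coefficients from refits on subsets need care when compared.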

