[R] lm#contrasts#one level in factor: bug or feature
Thomas Lumley
tlumley at u.washington.edu
Tue Oct 12 22:00:02 CEST 2004
On Tue, 12 Oct 2004, Ritter, Christian C MCIL-CTANL/S wrote:
> (R 1.9.1; win2000)
>
> Since this is about the tenth time I have had to write an "if" around this to catch the error ...
> Let's look at the line
>
> myfit<-lm(res~groupvar,data=Data)
>
> Here res is of numeric type and groupvar is a factor. At first sight, it
> would seem logical that if groupvar had only one (single) level we would
> get:
>
> Error in "contrasts<-"(`*tmp*`, value = "contr.treatment") :
> contrasts can be applied only to factors with 2 or more levels
>
> But then again, it's also inconsistent: Normally redundant variables on
> the right of the ~ (variables which are constant or linearly dependent
> on previous variables) don't lead to "Error". They just get a "NA" as a
> coefficient estimate. A factor with a single level is of this type, it
> is just a constant. Obviously (to me) lm (or model.matrix.default)
> should not try to calculate contrasts on it and its coefficient should
> be NA. Shouldn't it?
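The two behaviours compared in the question can be reproduced with a minimal sketch. The data frames `d_num` and `d_fac` and their values are made up for illustration. A constant numeric predictor is aliased with the intercept and gets an `NA` estimate, while a single-level factor makes `lm()` stop with the contrasts error quoted above.

```r
# Hypothetical data: 'const' is a numeric column that never varies
d_num <- data.frame(res   = c(1.2, 2.3, 3.1, 4.0),
                    x     = c(1, 2, 3, 4),
                    const = 5)
fit_num <- lm(res ~ x + const, data = d_num)
coef(fit_num)   # 'const' is aliased with the intercept, so its estimate is NA

# Hypothetical data: 'groupvar' is a factor with a single level
d_fac <- data.frame(res      = c(1.2, 2.3, 3.1, 4.0),
                    groupvar = factor(rep("a", 4)))
err <- tryCatch(lm(res ~ groupvar, data = d_fac),
                error = function(e) conditionMessage(e))
err             # the contrasts error message rather than an NA coefficient
```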
No, a factor with k levels isn't just a set of k linearly dependent
variables (otherwise a factor would always give an NA coefficient for one
level). A factor specifies a set of contrasts (usually k-1 of them) that
go into the model matrix. In that sense a factor with 1 level should
specify zero columns of the model matrix, rather than giving an error.
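The coding described here can be inspected directly with `model.matrix()`. In the sketch below, which uses made-up data, a three-level factor contributes two treatment-contrast columns next to the intercept. The helper `fit_dropping_single_level()` is hypothetical and not part of R. It emulates the "zero columns" behaviour by removing any factor that has fewer than two levels in the data from the formula before calling `lm()`. It assumes a two-sided formula whose right-hand side is a plain sum of variable names, with no interactions or transformations.

```r
g  <- factor(c("a", "b", "c", "a", "b", "c"))
contr.treatment(levels(g))      # 3 x 2 matrix: k = 3 levels give k - 1 = 2 contrasts
mm <- model.matrix(~ g)
colnames(mm)                    # "(Intercept)" "gb" "gc"

# Hypothetical helper: a factor with fewer than 2 levels present contributes
# zero columns, i.e. it is dropped from the right-hand side of the formula
fit_dropping_single_level <- function(formula, data) {
  vars <- all.vars(formula[[3]])                  # right-hand-side variable names
  one_level <- vapply(vars, function(v) {
    x <- data[[v]]
    is.factor(x) && nlevels(droplevels(x)) < 2L   # count only levels actually present
  }, logical(1))
  keep <- vars[!one_level]
  rhs  <- if (length(keep)) keep else "1"         # intercept-only model if nothing is left
  lm(reformulate(rhs, response = formula[[2]]), data = data)
}

d <- data.frame(res = c(1, 2, 4, 5), groupvar = factor(c("a", "a", "b", "b")))
coef(fit_dropping_single_level(res ~ groupvar, d[d$groupvar == "a", ]))  # intercept only
```

As the rest of the reply points out, this only avoids the error; it does not make fits on different subsets comparable.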
>
> Why is this not unimportant? Imagine you make a model of the above type
> for a data set with more than one level in groupvar. Then you select
> subsets and refit. It shouldn't die with an error if you select a subset
> with only one level of groupvar.
>
Possibly, but your results are likely to be pretty meaningless. If the
subset does not have *all* the levels of the factor then the contrasts
will change and the variable will be coded differently, possibly giving
coefficients with different meanings.
In the case of contr.treatment, for example, if the lowest level of the
factor is missing then all the other levels are recoded as contrasts to
the next lowest level.
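The recoding can be seen with a small made-up data set. `lm()` drops unused factor levels when it builds the model frame, so a subset without level `a` is coded with `b` as the baseline. The coefficient named `groupvarc` therefore compares `c` with `a` in the full fit, but `c` with `b` in the subset fit.

```r
# Hypothetical data: group means are a = 1.5, b = 4.5, c = 7.5
Data <- data.frame(res      = c(1, 2, 4, 5, 7, 8),
                   groupvar = factor(c("a", "a", "b", "b", "c", "c")))
full <- lm(res ~ groupvar, data = Data)
sub  <- lm(res ~ groupvar, data = subset(Data, groupvar != "a"))
coef(full)   # (Intercept) 1.5 (mean of a), groupvarb 3, groupvarc 6
coef(sub)    # (Intercept) 4.5 (mean of b), groupvarc 3
```

The same coefficient name carries a different meaning in the two fits, which is why comparing results across subsets is risky.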
-thomas
More information about the R-help
mailing list