[R] Regression with factor having 1 level

peter dalgaard pdalgd at gmail.com
Sat Mar 12 00:57:23 CET 2016


> On 11 Mar 2016, at 23:48 , David Winsemius <dwinsemius at comcast.net> wrote:
> 
>> 
>> On Mar 11, 2016, at 2:07 PM, peter dalgaard <pdalgd at gmail.com> wrote:
>> 
>> 
>>> On 11 Mar 2016, at 17:56 , David Winsemius <dwinsemius at comcast.net> wrote:
>>> 
>>>> 
>>>> On Mar 11, 2016, at 12:48 AM, peter dalgaard <pdalgd at gmail.com> wrote:
>>>> 
>>>> 
>>>>> On 11 Mar 2016, at 08:25 , David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>> 
>>>> ...
>>>>>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
>>>>>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>>>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>>>>> contrasts can be applied only to factors with 2 or more levels
>>>>> 
>>>>> Yes, and the error appears to come from `model.matrix`:
>>>>> 
>>>>>> model.matrix(y~x1+factor(x2)+x3, dfrm)
>>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>>>> contrasts can be applied only to factors with 2 or more levels
>>>>> 
>>>> 
>>>> Actually not. The above is because you use an explicit factor(x2). The actual smoking gun is this line in lm()
>>>> 
>>>> mf$drop.unused.levels <- TRUE
>>> 
>>> It's possible that modifying model.matrix to allow single-level factors would then bump up against that check, but at the moment the traceback() from an error generated with data that has a single-level factor and no call to factor in the formula still implicates code in model.matrix:
>> 
>> You're missing the point: model.matrix has a beef with 1-level factors, not with 2-level factors of which one level happens to be absent, which is what this thread was originally about. It is lm that via model.frame with drop.unused.levels=TRUE converts the latter factors to the former.
>> 
> 
> I guess I did miss the point. Apologies for being obtuse. I thought that a one-level factor would have been "aliased out" when model.matrix "realized" that it was collinear with the intercept. (Further apologies for my projection of cognitive capacities on a machine.) Are you saying it remains desirable that an error be thrown rather than reporting an NA for coefficients and issuing a warning?
> 

For the moment I was just analyzing where this came from. Intuitively I'd be leaning in the opposite direction -- dropping factor levels automatically is usually a bad thing.
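
To make the distinction concrete, here is a minimal sketch. The data frame and the level names "a" and "b" are made up for illustration and are not from the original report; the point is only to compare a model frame built with and without drop.unused.levels=TRUE, the setting that lm() requests.

```r
# Hypothetical data: x2 is declared with two levels, but only "a" occurs.
set.seed(1)
dfrm <- data.frame(y  = rnorm(10),
                   x1 = rnorm(10),
                   x2 = factor(rep("a", 10), levels = c("a", "b")),
                   x3 = rnorm(10))
f <- y ~ x1 + x2 + x3

# The default model.frame() keeps the unused level "b".
mf_keep <- model.frame(f, dfrm)

# lm() asks for drop.unused.levels = TRUE, which leaves a 1-level factor.
mf_drop <- model.frame(f, dfrm, drop.unused.levels = TRUE)

nlevels(mf_keep$x2)   # two levels retained
nlevels(mf_drop$x2)   # one level left

# model.matrix() accepts the 2-level factor; its x2b column is all zeros.
mm_keep <- model.matrix(f, mf_keep)

# With the 1-level factor, `contrasts<-` signals an error instead.
res_drop <- tryCatch(model.matrix(f, mf_drop), error = function(e) e)
inherits(res_drop, "error")
```

So model.matrix() itself only objects once the unused level has been removed; with the level kept, the design matrix is built, just with an all-zero column for x2b.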

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


