[R] Regression with factor having 1 level

David Winsemius dwinsemius at comcast.net
Fri Mar 11 23:48:55 CET 2016


> On Mar 11, 2016, at 2:07 PM, peter dalgaard <pdalgd at gmail.com> wrote:
> 
> 
>> On 11 Mar 2016, at 17:56 , David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>>> 
>>> On Mar 11, 2016, at 12:48 AM, peter dalgaard <pdalgd at gmail.com> wrote:
>>> 
>>> 
>>>> On 11 Mar 2016, at 08:25 , David Winsemius <dwinsemius at comcast.net> wrote:
>>>>> 
>>> ...
>>>>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
>>>>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>>>> contrasts can be applied only to factors with 2 or more levels
>>>> 
>>>> Yes, and the error appears to come from `model.matrix`:
>>>> 
>>>>> model.matrix(y~x1+factor(x2)+x3, dfrm)
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>>>> contrasts can be applied only to factors with 2 or more levels
>>>> 
>>> 
>>> Actually not. The above is because you use an explicit factor(x2). The actual smoking gun is this line in lm()
>>> 
>>> mf$drop.unused.levels <- TRUE
>> 
>> It's possible that modifying model.matrix to allow single level factors would then bump up against that check, but at the moment the traceback() from an error generated with data that has a single level factor and no call to factor in the formula still implicates code in model.matrix:
> 
> You're missing the point: model.matrix has a beef with 1-level factors, not with 2-level factors of which one level happens to be absent, which is what this thread was originally about. It is lm that via model.frame with drop.unused.levels=TRUE converts the latter factors to the former.
> 
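To check my reading of that, here is a minimal sketch. The data frame `df` is made up for illustration (the one used further down in the thread is not shown), and the object names are my own. It suggests that model.frame drops the absent level only when asked to, and that model.matrix objects only once the factor is down to one level:

```r
# Hypothetical data: a 2-level factor whose level "B" is absent from rows 1:2
df <- data.frame(y  = c(1, 2, 3, 4),
                 x1 = c(1, 1, 2, 3),
                 x2 = factor(c("A", "A", "B", "B")))
sub <- df[1:2, ]

# Default model.frame keeps both levels; drop.unused.levels = TRUE
# (what lm() sets internally) leaves a 1-level factor
mf_keep <- model.frame(y ~ x1 + x2, data = sub)
mf_drop <- model.frame(y ~ x1 + x2, data = sub, drop.unused.levels = TRUE)
nlevels(mf_keep$x2)
nlevels(mf_drop$x2)

# model.matrix accepts the first (x2B is a column of zeros) ...
mm <- model.matrix(terms(mf_keep), mf_keep)

# ... and stops on the second; the condition is captured rather than thrown
err <- tryCatch(model.matrix(terms(mf_drop), mf_drop),
                error = function(e) e)
```

Printing `mm` and `conditionMessage(err)` shows the two outcomes side by side.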

I guess I did miss the point. Apologies for being obtuse. I thought that a one-level factor would have been "aliased out" when model.matrix "realized" that it was collinear with the intercept. (Further apologies for projecting cognitive capacities onto a machine.) Are you saying it remains desirable that an error be thrown rather than reporting an NA for the coefficient and issuing a warning?
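
For comparison, here is a small sketch of the behaviour I had in mind. The column `const` and the object names are my own additions, not anything from earlier in the thread. A numeric column that is collinear with the intercept gets an NA coefficient from lm, whereas the one-level factor stops with an error:

```r
set.seed(1)
dfrm <- data.frame(y = rnorm(10), x1 = rnorm(10), x3 = rnorm(10))
dfrm$const <- rep(1, 10)            # numeric, collinear with the intercept
dfrm$x2    <- factor(rep(TRUE, 10)) # one-level factor, as in the thread

# Numeric collinearity: the fit succeeds and the coefficient is NA
fit <- lm(y ~ x1 + const + x3, dfrm)
coef(fit)

# One-level factor: lm() stops; the condition is captured rather than thrown
res <- tryCatch(lm(y ~ x1 + x2 + x3, dfrm), error = function(e) e)
```

Here `coef(fit)["const"]` is NA, while `res` holds the "contrasts can be applied only to factors with 2 or more levels" condition.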

-- 
David.


> -pd 
> 
> 
>> 
>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=factor(TRUE), x3=rnorm(10))
>>> lm(y~x1+x2+x3, dfrm)
>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
>> contrasts can be applied only to factors with 2 or more levels
>>> traceback()
>> 5: stop("contrasts can be applied only to factors with 2 or more levels")
>> 4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
>> 3: model.matrix.default(mt, mf, contrasts)
>> 2: model.matrix(mt, mf, contrasts)
>> 1: lm(y ~ x1 + x2 + x3, dfrm)
>> 
>> -- 
>> David.
>> 
>>> 
>>> which someone must have thought was a good idea at some point....
>>> 
>>> model.matrix itself is quite happy to leave factors alone and let subsequent code sort out any singularities, e.g.
>>> 
>>>> model.matrix(y~x1+x2, data=df[1:2,])
>>> (Intercept) x1 x2B
>>> 1           1  1   0
>>> 2           1  1   0
>>> attr(,"assign")
>>> [1] 0 1 2
>>> attr(,"contrasts")
>>> attr(,"contrasts")$x2
>>> [1] "contr.treatment"
>>> 
>>> 
>>> 
>>>>> model.matrix(y~x1+x2+x3, dfrm)
>>>> (Intercept)          x1 x2TRUE         x3
>>>> 1            1  0.04887847      1 -0.4199628
>>>> 2            1 -1.04786688      1  1.3947923
>>>> 3            1 -0.34896007      1 -2.1873666
>>>> 4            1 -0.08866061      1  0.1204129
>>>> 5            1 -0.41111366      1 -1.6631057
>>>> 6            1 -0.83449110      1  1.1631801
>>>> 7            1 -0.67887823      1  0.3207544
>>>> 8            1 -1.12206068      1  0.6012040
>>>> 9            1  0.05116683      1  0.3598696
>>>> 10           1  1.74413583      1  0.3608478
>>>> attr(,"assign")
>>>> [1] 0 1 2 3
>>>> attr(,"contrasts")
>>>> attr(,"contrasts")$x2
>>>> [1] "contr.treatment"
>>>> 
>>>> -- 
>>>> 
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>> 
>>> 
>>> -- 
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>>> 
>> 
>> David Winsemius
>> Alameda, CA, USA
> 
> 

David Winsemius
Alameda, CA, USA


