[R] Regression with factor having 1 level
David Winsemius
dwinsemius at comcast.net
Fri Mar 11 23:48:55 CET 2016
> On Mar 11, 2016, at 2:07 PM, peter dalgaard <pdalgd at gmail.com> wrote:
>
>
>> On 11 Mar 2016, at 17:56 , David Winsemius <dwinsemius at comcast.net> wrote:
>>
>>>
>>> On Mar 11, 2016, at 12:48 AM, peter dalgaard <pdalgd at gmail.com> wrote:
>>>
>>>
>>>> On 11 Mar 2016, at 08:25 , David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>
>>> ...
>>>>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
>>>>>> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
>>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>>> contrasts can be applied only to factors with 2 or more levels
>>>>
>>>> Yes, and the error appears to come from `model.matrix`:
>>>>
>>>>> model.matrix(y~x1+factor(x2)+x3, dfrm)
>>>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>>>> contrasts can be applied only to factors with 2 or more levels
>>>>
>>>
>>> Actually not. The above is because you use an explicit factor(x2). The actual smoking gun is this line in lm()
>>>
>>> mf$drop.unused.levels <- TRUE
>>
>> It's possible that modifying model.matrix to allow single level factors would then bump up against that check, but at the moment the traceback() from an error generated with data that has a single level factor and no call to factor in the formula still implicates code in model.matrix:
>
> You're missing the point: model.matrix has a beef with 1-level factors, not with 2-level factors of which one level happens to be absent, which is what this thread was originally about. It is lm that via model.frame with drop.unused.levels=TRUE converts the latter factors to the former.
>
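A minimal sketch of the distinction drawn above, assuming a two-level factor whose second level never occurs in the data. The data frame `d2` and the level names "A" and "B" are made up for illustration and are not from the original report:

```r
set.seed(1)
# Hypothetical data: x2 declares levels A and B, but only A is observed.
d2 <- data.frame(y = rnorm(10), x1 = rnorm(10),
                 x2 = factor(rep("A", 10), levels = c("A", "B")))

# model.matrix() accepts the factor as is; the x2B column is all zeros.
mm <- model.matrix(y ~ x1 + x2, data = d2)

# model.frame() with drop.unused.levels = TRUE, as lm() requests,
# reduces x2 to a one-level factor.
mf <- model.frame(y ~ x1 + x2, data = d2, drop.unused.levels = TRUE)
nlevels(mf$x2)

# A one-level factor is what model.matrix() rejects.
err <- tryCatch(model.matrix(y ~ x1 + x2, data = droplevels(d2)),
                error = function(e) e)
inherits(err, "error")
```

Here droplevels() stands in for what drop.unused.levels=TRUE does inside lm(), so the error can be seen without calling lm() itself.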
I guess I did miss the point. Apologies for being obtuse. I thought that a one-level factor would have been "aliased out" when model.matrix "realized" that it was collinear with the intercept. (Further apologies for my projection of cognitive capacities onto a machine.) Are you saying it remains desirable that an error be thrown rather than reporting an NA for coefficients and issuing a warning?
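For comparison, a sketch of the aliasing behaviour referred to here, assuming a numeric column that is collinear with the intercept rather than a factor. The data frame `d3` and the column name `const` are hypothetical:

```r
set.seed(1)
# Hypothetical data: `const` equals 1 in every row, so it duplicates
# the intercept column of the design matrix.
d3 <- data.frame(y = rnorm(10), x1 = rnorm(10), const = 1)

# lm() fits the rank-deficient model and reports NA for the aliased term.
fit <- lm(y ~ x1 + const, data = d3)
coef(fit)
is.na(coef(fit)["const"])
```

The fit goes through because the singularity is only detected at the fitting stage; with a one-level factor the failure happens earlier, when contrasts are assigned.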
--
David.
> -pd
>
>
>>
>>> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=factor(TRUE), x3=rnorm(10))
>>> lm(y~x1+x2+x3, dfrm)
>> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>> contrasts can be applied only to factors with 2 or more levels
>>> traceback()
>> 5: stop("contrasts can be applied only to factors with 2 or more levels")
>> 4: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
>> 3: model.matrix.default(mt, mf, contrasts)
>> 2: model.matrix(mt, mf, contrasts)
>> 1: lm(y ~ x1 + x2 + x3, dfrm)
>>
>> --
>> David.
>>
>>>
>>> which someone must have thought was a good idea at some point....
>>>
>>> model.matrix itself is quite happy to leave factors alone and let subsequent code sort out any singularities, e.g.
>>>
>>>> model.matrix(y~x1+x2, data=df[1:2,])
>>> (Intercept) x1 x2B
>>> 1 1 1 0
>>> 2 1 1 0
>>> attr(,"assign")
>>> [1] 0 1 2
>>> attr(,"contrasts")
>>> attr(,"contrasts")$x2
>>> [1] "contr.treatment"
>>>
>>>
>>>
>>>>> model.matrix(y~x1+x2+x3, dfrm)
>>>> (Intercept) x1 x2TRUE x3
>>>> 1 1 0.04887847 1 -0.4199628
>>>> 2 1 -1.04786688 1 1.3947923
>>>> 3 1 -0.34896007 1 -2.1873666
>>>> 4 1 -0.08866061 1 0.1204129
>>>> 5 1 -0.41111366 1 -1.6631057
>>>> 6 1 -0.83449110 1 1.1631801
>>>> 7 1 -0.67887823 1 0.3207544
>>>> 8 1 -1.12206068 1 0.6012040
>>>> 9 1 0.05116683 1 0.3598696
>>>> 10 1 1.74413583 1 0.3608478
>>>> attr(,"assign")
>>>> [1] 0 1 2 3
>>>> attr(,"contrasts")
>>>> attr(,"contrasts")$x2
>>>> [1] "contr.treatment"
>>>>
>>>> --
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> --
>>> Peter Dalgaard, Professor,
>>> Center for Statistics, Copenhagen Business School
>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>>> Phone: (+45)38153501
>>> Office: A 4.23
>>> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
>>>
>>
>> David Winsemius
>> Alameda, CA, USA
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
>
David Winsemius
Alameda, CA, USA