[R-sig-ME] Problem with the categorical predictor in the factor format at level 1

Ben Bolker bbolker at gmail.com
Wed Feb 20 03:18:21 CET 2013


Sunthud Pornprasertmanit <psunthud at ...> writes:

> 
> Dear all,
> 
> I have run a model with fixed intercepts but random slopes on categorical
> predictors by the following command:
> 
> FixedIntRandomSlope <- lmer(POPULAR ~ 1 + SEX + (0 + SEX|SCHOOL), data =
> popular, REML = FALSE)
> summary(FixedIntRandomSlope)
> 
> I got the different results in the random effect when I treated SEX as
> dummy variable manually or treated SEX as factor.
> 
> Here is the result for the dummy-variable predictor:
> 
> Random effects:
>  Groups   Name Variance Std.Dev.
>  SCHOOL   SEX  0.87531  0.93558
>  Residual      0.87053  0.93302
> 
> Here is the result for the variable transformed into factor format:
> 
> Random effects:
>  Groups   Name Variance Std.Dev. Corr
>  SCHOOL   SEX0 0.93044  0.96459
>           SEX1 0.92104  0.95971  0.855
>  Residual      0.39244  0.62645
> 
> I think SEX0 and SEX1 should not be both random effects.
> 
> I have checked predictor and found that the variable really have two
> categories:
> 
> > summary(popular$SEX)
>    0    1
> 1026  974
> 
> I use lme4 version lme4_0.999999-0.
> 
> Please teach me what is going on in this case. Thank you very much.
> 

I believe this is a weakness in the way that lme4 constructs
random-effects model matrices.  lme4 falls back on R's standard
model-matrix constructor (model.matrix()); when SEX is a factor, the
formula ~0+SEX considered by itself gives rise to a "no-intercept"
matrix, which is *not* a one-column model matrix, but rather a
two-column matrix with one dummy (indicator) column per factor level.

For example:

d <- data.frame(SEX=factor(0:1))
model.matrix(~SEX,data=d)
##   (Intercept) SEX1
## 1           1    0
## 2           1    1

model.matrix(~0+SEX,data=d)
##   SEX0 SEX1
## 1    1    0
## 2    0    1

rather than the model matrix you want, which is just

##    SEX1
## 1     0
## 2     1
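
To connect this to the output you posted: each column of the
random-effects model matrix gets its own variance, and each pair of
columns gets a correlation, so q columns mean q*(q+1)/2
variance-covariance parameters per grouping factor.  A minimal sketch
using the same toy data frame as above (the names Z, q and n_par are
only for illustration):

```r
# Toy data: SEX as a two-level factor, as in the example above
d <- data.frame(SEX = factor(0:1))

# Model matrix that the term (0 + SEX | SCHOOL) is built from
Z <- model.matrix(~ 0 + SEX, data = d)
colnames(Z)   # "SEX0" "SEX1"

# q columns -> q variances plus q*(q-1)/2 correlations
q <- ncol(Z)
n_par <- q * (q + 1) / 2
n_par         # 3: Var(SEX0), Var(SEX1) and their correlation
```

That matches the SEX0 and SEX1 rows plus the Corr column in your
factor-based fit.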

The workaround is (as you have done) to create your own dummy
variable.
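
A minimal sketch of that workaround, assuming the factor levels are
labelled "0" and "1" as in your summary() output; the column name
SEXnum is made up for illustration, and the lmer() call is commented
out because I don't have your data set here:

```r
# Toy stand-in for the real data: SEX stored as a factor
d <- data.frame(SEX = factor(c(0, 1, 1, 0)))

# Build a numeric 0/1 dummy by comparing against the level label.
# (as.numeric() applied directly to a factor would return the
# internal codes 1 and 2, not 0 and 1.)
d$SEXnum <- as.numeric(d$SEX == "1")

# The no-intercept model matrix now has a single column
X <- model.matrix(~ 0 + SEXnum, data = d)
colnames(X)   # "SEXnum"

# The numeric dummy then goes in the random-effects term; the
# fixed-effect SEX can stay a factor:
# library(lme4)
# lmer(POPULAR ~ 1 + SEX + (0 + SEXnum | SCHOOL),
#      data = popular, REML = FALSE)
```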

The other disturbing part of this is that the model with (0+SEX|SCHOOL),
with SEX as a factor, is actually unidentifiable (I think), but lmer goes
ahead and fits something for you anyway, without warning you.

This is definitely worth posting as an issue at
https://github.com/lme4/lme4/issues?state=open ; if I get a
chance I will do it, but you are encouraged to do so ...


