[R-sig-ME] assessing fixed factor significance depending on reference levels ?
Ben Bolker
bbolker at gmail.com
Sun Mar 13 21:53:18 CET 2011
On 11-03-13 03:44 PM, Claudia Monica Campos wrote:
> Dear list,
>
> I'm trying to fit a GLMM to assess whether some category of species
> (native, mammal, bird, etc.)
> from the total named by each student can be explained by differences
> in the place of residence
> (urban or rural), gender and/or age.
>
> m1= lmer(cbind (n_a_bird,n_animals-n_a_bird) ~ sex*place+age+ (1 |
> school/grade), data=a,family=binomial)
> where:
> sex has two levels ('f' and 'm')
> place has two levels ('r' and 'u')
> age is numerical (from 7 to 18)
>
> As you can see from below, a$place fixed effect could be an
> explanatory variable,
> but it may be significant (its p-value) depending on a$sex ref level:
>
> a$sex<-relevel(a$sex,'f');a$place<-relevel(a$place,'r')
> m1_fr= lmer(cbind (n_a_bird,n_animals-n_a_bird) ~ place*sex+age+ (1 |
> school/grade), data=a,family=binomial)
> a$sex<-relevel(a$sex,'m');a$place<-relevel(a$place,'r')
> m1_mr= lmer(cbind (n_a_bird,n_animals-n_a_bird) ~ place*sex+age+ (1 |
> school/grade), data=a,family=binomial)
>
> summary(m1_fr) ### ref levels: sex:'f' , place:'r'
> Generalized linear mixed model fit by the Laplace approximation
> Formula: cbind(n_a_bird, n_animals - n_a_bird) ~ place * sex + age +
> (1 | school/grade)
> Data: a
> AIC BIC logLik deviance
> 2125 2163 -1055 2111
> Random effects:
> Groups Name Variance Std.Dev.
> grade:school (Intercept) 1.4745e-13 3.8399e-07
> school (Intercept) 1.6971e-02 1.3027e-01
> Number of obs: 1746, groups: grade:school, 51; school, 42
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.87093 0.15225 -12.289 < 2e-16 ***
> placeu 0.03351 0.07922 0.423 0.672331 <================
> sexm 0.16097 0.07476 2.153 0.031299 *
> age 0.02716 0.01099 2.472 0.013437 *
> placeu:sexm -0.32108 0.09478 -3.388 0.000705 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Correlation of Fixed Effects:
> (Intr) placeu sexm age
> placeu -0.518
> sexm -0.253 0.443
> age -0.918 0.237 0.028
> placeu:sexm 0.160 -0.537 -0.788 0.021
>
> summary(m1_mr) ### ref levels: sex:'m', place: 'r'
> Generalized linear mixed model fit by the Laplace approximation
> Formula: cbind(n_a_bird, n_animals - n_a_bird) ~ place * sex + age +
> (1 | school/grade)
> Data: a
> AIC BIC logLik deviance
> 2125 2163 -1055 2111
> Random effects:
> Groups Name Variance Std.Dev.
> grade:school (Intercept) 3.5201e-13 5.9330e-07
> school (Intercept) 1.6971e-02 1.3027e-01
> Number of obs: 1746, groups: grade:school, 51; school, 42
>
> Fixed effects:
> Estimate Std. Error z value Pr(>|z|)
> (Intercept) -1.70995 0.15171 -11.271 < 2e-16 ***
> placeu -0.28754 0.08484 -3.389 0.000701 *** <================
> sexf -0.16097 0.07476 -2.153 0.031298 *
> age 0.02716 0.01099 2.472 0.013438 *
> placeu:sexf 0.32102 0.09478 3.387 0.000706 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Correlation of Fixed Effects:
> (Intr) placeu sexf age
> placeu -0.536
> sexf -0.239 0.467
> age -0.908 0.245 -0.028
> placeu:sexf 0.228 -0.616 -0.788 -0.021
>
> Doing a LRT, by removing a$place seems to show it's indeed significant:
>> m0=lmer(cbind (n_a_bird,n_animals-n_a_bird) ~ sex+age+ (1 | school/grade), data=a,family=binomial, REML=F)
>> m1= lmer(cbind (n_a_bird,n_animals-n_a_bird) ~ place*sex+age+ (1 | school/grade), data=a,family=binomial, REML=F)
>> anova(m0,m1)
> Data: a
> Models:
> m0: cbind(n_a_bird, n_animals - n_a_bird) ~ sex + age + (1 | school/grade)
> m1: cbind(n_a_bird, n_animals - n_a_bird) ~ place * sex + age + (1 |
> m1: school/grade)
> Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
> m0 5 2134.7 2162.0 -1062.3
> m1 7 2124.8 2163.1 -1055.4 13.808 2 0.001004 **
> ---
>
> How should I proceed with the model selection ?
>
> To properly understand if a$place alone or its interaction with a$sex
> is significant,
> do I need to fit the model with different relevel-ing (in a
> combinatorial way ) ?
> Ie: if you see above "placeu", you'll find that shows significant for sex='m' as
> reference but not for sex='f'.
>
What is your goal?
The fact that there is a highly significant interaction means that the
effect of 'urban' differs depending whether you consider males or
females (and similarly for the effect of sex).
You can interpret the models as follows:
fr: for females, urban is not sig. diff. from rural
for rural, male is sig. diff. from female
mr: for males, urban is sig. diff. from rural
for rural, female is sig. diff. from male (same estimate as in model fr)
There is a school of thought (strongly represented in the R
community) that says that interpreting main effects in the presence of
interactions, especially significant interactions, just doesn't make
sense: see Venables' "Exegeses on linear models" (google it).
A more traditional school of thought would look at marginal effects
via the dreaded "type III sums of squares", i.e. looking at the
*average* effect of rural vs urban across males and females and vice
versa. Again, it is arguable whether you want to do this or not,
but setting sum contrasts makes it possible.
I would say that the most sensible thing to do, if you really want to
test the significance of the main effect of place, is to subset your
data and do the test separately for males and females.
lmer(cbind (n_a_bird,n_animals-n_a_bird) ~ place+age+
(1 | school/grade), data=a,subset=(sex=="male"), family=binomial)
and similarly for sex=="female"
<http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html#Why-does-the-output-from-anova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f>
More information about the R-sig-mixed-models
mailing list