[R-sig-ME] anova() and the difference between (x | y) and (1 | y:x) in lme4
Hans Ekbrand
hans.ekbrand at gmail.com
Wed Jun 11 15:58:10 CEST 2014
Dear list,
I have a question about the difference between
y ~ (1 | var2:var1) vs y ~ (var1 | var2).
In reality my model is more complex:
y ~ 1 + var1 + (1 | var2:var1) + var3+ .... + var9
vs
y ~ 1 + var1 + (var1 | var2) + var3+ .... + var9
Following the advice kindly given by Reinhold Kliegl way back ago
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q2/016545.html)
I have used the following specification with glmer() in lme4 (version 1.1-7):
fit.flat <- glmer(below.poverty.line ~ 1 + employment.type + (1 | country:employment.type) + gender + age + age.2 + n.adults.minus.n.children + n.children + education + household.type, family = binomial("logit"), data = my.df)
and
fit.hierarchical <- glmer(below.poverty.line ~ 1 + employment.type + (employment.type | country) + gender + age + age.2 + n.adults.minus.n.children + n.children + education + household.type, family = binomial("logit"), data = my.df)
Info on the data:
str(my.df)
'data.frame': 93178 obs. of 10 variables:
$ below.poverty.line : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ employment.type : Factor w/ 6 levels "Core labour force",..: 1 5 1 1 1 5 5 5 1 1 ...
$ country : Factor w/ 22 levels "austria","belgium",..: 1 1 1 1 1 1 1 1 1 1 ...
$ gender : Factor w/ 2 levels "female","male": 2 1 2 2 1 1 1 1 2 2 ...
$ age : num 22 22 32 56 40 54 42 18 49 20 ...
$ age.2 : num 3.39e-02 3.39e-02 7.08e-03 2.43e-02 1.71e-05 ...
$ n.adults.minus.n.children: num 3 3 1 5 2 2 3 5 5 5 ...
$ n.children : num 1 1 2 0 1 0 1 0 0 0 ...
$ education : Factor w/ 4 levels "primary","lower secondary",..: 2 2 4 2 3 4 2 2 2 3 ...
$ household.type : Factor w/ 4 levels "couple without children",..: 2 2 3 1 3 4 3 4 1 4 ...
If you want to replicate the analysis - or inspect the data - try this:
load(url("http://hansekbrand.se/code/my.df.RData"))
The total computation time for both models is about one hour on my computer.
My primary question is whether or not anova() is usable to choose between the two models?
Data: my.df
Models:
fit.flat: below.poverty.line ~ 1 + employment.type + (1 | country:employment.type) +
fit.flat: gender + age + age.2 + n.adults.minus.n.children + n.children +
fit.flat: education + household.type
fit.hierarchical: below.poverty.line ~ 1 + employment.type + (employment.type |
fit.hierarchical: country) + gender + age + age.2 + n.adults.minus.n.children +
fit.hierarchical: n.children + education + household.type
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
fit.flat 18 38852 39022 -19408 38816
fit.hierarchical 38 38804 39163 -19364 38728 88.082 20 1.602e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
My second question is whether or not I should care about the warnings
I get (not entirely sure which one belongs to which model, but the
first one should be against fit.hierarchcial).
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00636715 (tol = 0.001, component 30)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?
summary(fit.flat)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: below.poverty.line ~ 1 + employment.type + (1 | country:employment.type) +
gender + age + age.2 + n.adults.minus.n.children + n.children + education + household.type
Data: my.df
AIC BIC logLik deviance df.resid
38852.1 39022.1 -19408.1 38816.1 93160
Scaled residuals:
Min 1Q Median 3Q Max
-1.5785 -0.2741 -0.1841 -0.1240 14.1696
Random effects:
Groups Name Variance Std.Dev.
country:employment.type (Intercept) 0.301 0.5486
Number of obs: 93178, groups: country:employment.type, 132
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.821530 0.164883 -17.112 < 2e-16 ***
employment.typeCore self-employed 1.779764 0.177173 10.045 < 2e-16 ***
employment.typeInto core labour force 0.873362 0.183095 4.770 1.84e-06 ***
employment.typeMarginalized peripheral labour force 1.791760 0.185840 9.641 < 2e-16 ***
employment.typePeripheral labour force 1.036154 0.175026 5.920 3.22e-09 ***
employment.typePeripheral self-employed 1.699013 0.180444 9.416 < 2e-16 ***
gendermale 0.152666 0.029487 5.177 2.25e-07 ***
age -0.008906 0.001537 -5.794 6.86e-09 ***
age.2 -3.647558 1.044310 -3.493 0.000478 ***
n.adults.minus.n.children 0.034069 0.010769 3.164 0.001559 **
n.children 0.258188 0.028628 9.019 < 2e-16 ***
educationlower secondary -0.399377 0.051611 -7.738 1.01e-14 ***
educationupper secondary -0.902910 0.049323 -18.306 < 2e-16 ***
educationpost secondary -1.582793 0.056489 -28.019 < 2e-16 ***
household.typecouple with children -0.120115 0.058652 -2.048 0.040568 *
household.typesingle adult with children 0.514623 0.069323 7.424 1.14e-13 ***
household.typesingle adult without children 0.195389 0.041295 4.732 2.23e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
summary(fit.hierarchical)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: below.poverty.line ~ 1 + employment.type + (employment.type |
country) + gender + age + age.2 + n.adults.minus.n.children + n.children + education + household.type
Data: my.df
AIC BIC logLik deviance df.resid
38804.0 39162.8 -19364.0 38728.0 93140
Scaled residuals:
Min 1Q Median 3Q Max
-1.5710 -0.2728 -0.1835 -0.1243 14.1030
Random effects:
Groups Name Variance Std.Dev. Corr
country (Intercept) 0.2204 0.4695
employment.typeCore self-employed 0.4457 0.6676 -0.20
employment.typeInto core labour force 0.3922 0.6263 -0.35 0.44
employment.typeMarginalized peripheral labour force 0.1228 0.3504 -0.63 0.57 0.39
employment.typePeripheral labour force 0.1090 0.3301 -0.38 0.24 0.70 0.66
employment.typePeripheral self-employed 0.3823 0.6183 -0.32 0.85 0.82 0.66 0.65
Number of obs: 93178, groups: country, 22
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.817822 0.153843 -18.316 < 2e-16 ***
employment.typeCore self-employed 1.741686 0.156367 11.138 < 2e-16 ***
employment.typeInto core labour force 0.847817 0.156475 5.418 6.02e-08 ***
employment.typeMarginalized peripheral labour force 1.771534 0.110705 16.002 < 2e-16 ***
employment.typePeripheral labour force 1.021857 0.090266 11.321 < 2e-16 ***
employment.typePeripheral self-employed 1.636287 0.151647 10.790 < 2e-16 ***
gendermale 0.153356 0.029485 5.201 1.98e-07 ***
age -0.008938 0.001540 -5.805 6.44e-09 ***
age.2 -3.629799 1.103239 -3.290 0.00100 **
n.adults.minus.n.children 0.035107 0.010791 3.253 0.00114 **
n.children 0.257672 0.028595 9.011 < 2e-16 ***
educationlower secondary -0.403009 0.051870 -7.770 7.87e-15 ***
educationupper secondary -0.899745 0.049781 -18.074 < 2e-16 ***
educationpost secondary -1.584911 0.056888 -27.860 < 2e-16 ***
household.typecouple with children -0.117651 0.058688 -2.005 0.04500 *
household.typesingle adult with children 0.512608 0.069332 7.393 1.43e-13 ***
household.typesingle adult without children 0.197551 0.041314 4.782 1.74e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
More information about the R-sig-mixed-models
mailing list