[R-sig-ME] how reliable are inferences drawn from binomial models for small datasets fitted with lme4?
Roger Levy
rlevy at ling.ucsd.edu
Sun Jul 5 21:51:05 CEST 2009
This post may be of interest in light of the recent discussion of PQL
versus Laplace-approximated likelihood. I'm facing an interestingly
challenging analysis of a relatively small (190-observation) binary-
response dataset with a single two-level treatment and two crossed
random factors (call them F1 and F2). The question of current
interest is whether I can infer a fixed effect of treatment, and I'm
finding it hard to figure out what conclusions to draw.
I'm putting the full dataset at the end of the message, but I'll start
by presenting some summary statistics:
> xtabs(~ Treatment + Response, dat)
         Response
Treatment   0   1
        1  12 117
        2   0  61
Fisher's exact test says "yes, the treatment has an effect" at
p = 0.01, but of course that doesn't take into account the crossed
hierarchical structure of the data.
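As a quick check, the test can be run directly on the contingency
table above (a minimal sketch using base R's fisher.test):

> fisher.test(xtabs(~ Treatment + Response, dat))

Some statistics on the crossed structure of the data: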
> with(dat,tapply(Response, list(F1,Treatment),length))
    1  2
1  11  5
2   7  6
3   9  3
4   4 NA
5   3  3
6  11 NA
7   3  3
8   9  4
9  11  3
10 12  6
11 11  8
12  5  6
13  1 NA
14 11  1
15 12  9
16  9  4
> with(dat,tapply(Response, list(F1,Treatment),mean))
           1  2
1  1.0000000  1
2  0.5714286  1
3  0.8888889  1
4  1.0000000 NA
5  0.0000000  1
6  1.0000000 NA
7  0.6666667  1
8  1.0000000  1
9  1.0000000  1
10 1.0000000  1
11 0.8181818  1
12 0.6000000  1
13 1.0000000 NA
14 1.0000000  1
15 1.0000000  1
16 1.0000000  1
> with(dat,tapply(Response, list(F2,Treatment),length))
    1  2
1   4  2
2   8  2
3   3  1
4   5  2
5   6  3
6   7  2
7   5  5
8   8  1
9   6  6
10  5  2
11  6 NA
12  6  6
13  4  1
14  6  2
15  4  6
16  5  2
17  5  2
18  4  1
19  5 NA
20  4 NA
21  5  6
22  6  3
23  6  5
24  6  1
> with(dat,tapply(Response, list(F2,Treatment),mean))
           1  2
1  1.0000000  1
2  0.8750000  1
3  1.0000000  1
4  1.0000000  1
5  1.0000000  1
6  1.0000000  1
7  0.8000000  1
8  0.8750000  1
9  0.8333333  1
10 1.0000000  1
11 0.8333333 NA
12 0.6666667  1
13 1.0000000  1
14 0.8333333  1
15 1.0000000  1
16 1.0000000  1
17 1.0000000  1
18 1.0000000  1
19 1.0000000 NA
20 1.0000000 NA
21 1.0000000  1
22 0.6666667  1
23 0.8333333  1
24 0.8333333  1
So there is some sign of inter-cluster variability, though of course
the binary response makes it hard to see. Crossed random-slope models
converge both with and without a fixed effect of treatment...
> print(m1 <- lmer(Response ~ Treatment + (Treatment - 1 | F1) +
+                  (Treatment - 1 | F2), dat, family=binomial))
Generalized linear mixed model fit by the Laplace approximation
Formula: Response ~ Treatment + (Treatment - 1 | F1) + (Treatment - 1 | F2)
   Data: dat
   AIC   BIC logLik deviance
 82.65 108.6 -33.33    66.65
Random effects:
 Groups Name       Variance   Std.Dev.   Corr
 F2     Treatment1 4.1354e-12 2.0336e-06
        Treatment2 1.0492e+00 1.0243e+00 0.000
 F1     Treatment1 9.4589e+00 3.0755e+00
        Treatment2 6.9945e-01 8.3633e-01 0.000
Number of obs: 190, groups: F2, 24; F1, 16
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.943      1.004   3.929 8.54e-05 ***
Treatment2    17.289   5221.110   0.003    0.997
> print(m0 <- lmer(Response ~ 1 + (Treatment - 1 | F1) +
+                  (Treatment - 1 | F2), dat, family=binomial))
Generalized linear mixed model fit by the Laplace approximation
Formula: Response ~ 1 + (Treatment - 1 | F1) + (Treatment - 1 | F2)
   Data: dat
  AIC   BIC logLik deviance
 81.5 104.2 -33.75     67.5
Random effects:
 Groups Name       Variance   Std.Dev.   Corr
 F2     Treatment1 0.0000e+00 0.00000000
        Treatment2 1.7654e-08 0.00013287 NaN
 F1     Treatment1 2.3858e+01 4.88443481
        Treatment2 3.0185e-02 0.17373716 -1.000
Number of obs: 190, groups: F2, 24; F1, 16
Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    5.855      1.384   4.232 2.32e-05 ***
...but I am concerned (a) about the NaN correlation for F2 (this must
be a sign of something broken somewhere!?), and (b) about the minimal
difference in log-likelihood between m1 and m0. (Note that it's
possible to fix (a) by dropping the random effects of F2 from m0, and
the log-likelihood doesn't change.) It seems hard to believe that the
marginal log-likelihoods of the two models, given the estimates of the
fixed effects and of the random-effect covariance matrices, differ by
only 0.42. As a further check on this intuition, I imagined that one
response in Treatment 2 had been a failure rather than a success:
> dat[142,"Response"] <- 0
> logLik(lmer(Response ~ Treatment + (Treatment - 1 | F1) +
+             (Treatment - 1 | F2), dat, family=binomial))
'log Lik.' -35.29494 (df=8)
> logLik(lmer(Response ~ 1 + (Treatment - 1 | F1) +
+             (Treatment - 1 | F2), dat, family=binomial))
'log Lik.' -37.60635 (df=7)
Now there's a considerable difference in log-likelihood; this would
turn out significant in a chi-squared test (though of course we'd want
to calibrate with simulation if this were the actual dataset).
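For concreteness, the naive chi-squared comparison can be computed
directly from the log-likelihoods above (df = 1 for the treatment
term); a minimal sketch:

> pchisq(2 * (37.60635 - 35.29494), df = 1, lower.tail = FALSE)  # roughly 0.03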
Note that I can't use the Wald statistic to address my hypothesis for
the real dataset: the complete absence of failure responses in
Treatment 2 sends the MLE of its parameter, and the accompanying
standard error, off toward infinity. I was hoping to use a
likelihood-ratio comparison instead (perhaps calibrated with
simulation, as sketched below), but it seems to me that the minimal
likelihood-ratio difference observed for the actual dataset throws
that idea out the window.
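For the record, here is a minimal sketch of the simulation-based
calibration I have in mind. This is hypothetical code: it assumes a
version of lme4 in which the binomial fit goes through glmer() and in
which simulate() and refit() work for these models.

library(lme4)
m0 <- glmer(Response ~ 1 + (Treatment - 1 | F1) +
            (Treatment - 1 | F2), dat, family = binomial)
m1 <- glmer(Response ~ Treatment + (Treatment - 1 | F1) +
            (Treatment - 1 | F2), dat, family = binomial)
obs.lr  <- as.numeric(2 * (logLik(m1) - logLik(m0)))
null.lr <- replicate(500, {
  y <- simulate(m0)[[1]]   # simulate a response vector under the null model
  as.numeric(2 * (logLik(refit(m1, y)) - logLik(refit(m0, y))))
})
mean(null.lr >= obs.lr)    # simulation-calibrated p-value for the LR test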
So here are some questions:
1) Is there something specific to the Laplace approximation being used
that renders logit models with extreme response proportions generally
unreliable? (I can't check this directly against other packages
because of the crossed random effects.)
2) Is this dataset simply small enough that the uncertainty in
estimating the random-effect variance parameters makes point-estimate
inference on the fixed effects unreliable, so that I should go the
Bayesian route and integrate over that uncertainty? Following this
possibility, I implemented the fixed-effect-of-treatment model (m1) in
JAGS with diffuse normal priors on the intercept and Treatment 2
parameters, and diffuse Wishart priors on the random-effect
covariance matrices (a sketch of the model code follows the posterior
summaries below). Here are the 95% Bayesian confidence intervals that
result:
                      lower         upper
Omega.F2[1,1]  5.316465e+02  5.922489e+05
Omega.F2[2,1] -3.016791e+05  2.899492e+05
Omega.F2[1,2] -3.016791e+05  2.899492e+05
Omega.F2[2,2]  2.116783e+02  5.911579e+05
Omega.F1[1,1]  5.186791e+01  5.979535e+05
Omega.F1[2,1] -3.073216e+05  2.898743e+05
Omega.F1[1,2] -3.073216e+05  2.898743e+05
Omega.F1[2,2]  2.764357e+02  5.972545e+05
intercept      1.749505e+00  2.938812e+00
Treatment2     1.240127e+00  6.279801e+02
and here is the posterior mode:
Omega.F2[1,1] 70299.472805
Omega.F2[2,1] -1324.272662
Omega.F2[1,2] -1324.272662
Omega.F2[2,2] 70810.193188
Omega.F1[1,1] 70426.234086
Omega.F1[2,1] -46977.070058
Omega.F1[1,2] -46977.070058
Omega.F1[2,2] 87987.091315
intercept 1.977808
Treatment2 5.846867
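For reference, here is a minimal sketch of the kind of JAGS model
described above (a hypothetical reconstruction; the actual code may
differ in details). It assumes zero (a length-2 zero vector) and R (a
2x2 identity matrix) are passed in as data, and it follows the usual
JAGS convention of placing the Wishart priors on the precision
matrices:

model {
  for (i in 1:N) {
    Response[i] ~ dbern(p[i])
    logit(p[i]) <- intercept + Treatment2 * equals(Treatment[i], 2) +
                   u.F1[F1[i], Treatment[i]] + u.F2[F2[i], Treatment[i]]
  }
  for (j in 1:n.F1) { u.F1[j, 1:2] ~ dmnorm(zero, Omega.F1) }
  for (k in 1:n.F2) { u.F2[k, 1:2] ~ dmnorm(zero, Omega.F2) }
  intercept  ~ dnorm(0, 1.0E-4)   # diffuse normal priors on the fixed effects
  Treatment2 ~ dnorm(0, 1.0E-4)
  Omega.F1 ~ dwish(R, 2)          # diffuse Wishart priors on the
  Omega.F2 ~ dwish(R, 2)          # random-effect precision matrices
}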
Some notable differences between these Bayesian results and the
lme4-fit m1:
* lme4 inferred considerably larger random effects of F1 than of F2,
which doesn't seem supported by the marginal means I listed near the
beginning of the email; whereas the Bayesian inference puts F1 and F2
on roughly equal footing;
* the severe lack of evidence regarding the correlations of random
effects across treatment is clearly evident in the Bayesian confidence
intervals;
* the fixed-effects posterior modes are much closer to zero than the
Laplace-approximated MLEs;
* the Bayesian inference seems strongly supportive of a fixed effect
of treatment, whereas lme4 superficially seemed to lead to the
opposite conclusion.
I'd love to hear people's thoughts on my approach to these data. Many
thanks in advance.
Roger
**
Treatment F1 F2 Response
1 1 6 1 1
2 1 7 1 1
3 1 8 1 1
4 1 15 1 1
5 1 1 2 1
6 1 2 2 1
7 1 3 2 1
8 1 4 2 1
9 1 9 2 1
10 1 10 2 1
11 1 11 2 1
12 1 12 2 0
13 1 6 3 1
14 1 14 3 1
15 1 15 3 1
16 1 3 4 1
17 1 4 4 1
18 1 9 4 1
19 1 10 4 1
20 1 11 4 1
21 1 6 5 1
22 1 8 5 1
23 1 13 5 1
24 1 14 5 1
25 1 15 5 1
26 1 16 5 1
27 1 1 6 1
28 1 2 6 1
29 1 3 6 1
30 1 9 6 1
31 1 10 6 1
32 1 11 6 1
33 1 12 6 1
34 1 5 7 0
35 1 6 7 1
36 1 14 7 1
37 1 15 7 1
38 1 16 7 1
39 1 1 8 1
40 1 2 8 1
41 1 3 8 1
42 1 4 8 1
43 1 9 8 1
44 1 10 8 1
45 1 11 8 0
46 1 12 8 1
47 1 5 9 0
48 1 7 9 1
49 1 8 9 1
50 1 14 9 1
51 1 15 9 1
52 1 16 9 1
53 1 1 10 1
54 1 3 10 1
55 1 9 10 1
56 1 10 10 1
57 1 11 10 1
58 1 5 11 0
59 1 6 11 1
60 1 8 11 1
61 1 14 11 1
62 1 15 11 1
63 1 16 11 1
64 1 1 12 1
65 1 2 12 1
66 1 9 12 1
67 1 10 12 1
68 1 11 12 0
69 1 12 12 0
70 1 6 13 1
71 1 14 13 1
72 1 15 13 1
73 1 16 13 1
74 1 1 14 1
75 1 2 14 0
76 1 3 14 1
77 1 10 14 1
78 1 11 14 1
79 1 12 14 1
80 1 6 15 1
81 1 8 15 1
82 1 14 15 1
83 1 15 15 1
84 1 1 16 1
85 1 3 16 1
86 1 4 16 1
87 1 9 16 1
88 1 10 16 1
89 1 6 17 1
90 1 8 17 1
91 1 14 17 1
92 1 15 17 1
93 1 16 17 1
94 1 1 18 1
95 1 9 18 1
96 1 10 18 1
97 1 11 18 1
98 1 6 19 1
99 1 8 19 1
100 1 14 19 1
101 1 15 19 1
102 1 16 19 1
103 1 1 20 1
104 1 9 20 1
105 1 10 20 1
106 1 11 20 1
107 1 6 21 1
108 1 8 21 1
109 1 14 21 1
110 1 15 21 1
111 1 16 21 1
112 1 1 22 1
113 1 2 22 0
114 1 3 22 0
115 1 9 22 1
116 1 10 22 1
117 1 11 22 1
118 1 6 23 1
119 1 7 23 0
120 1 8 23 1
121 1 14 23 1
122 1 15 23 1
123 1 16 23 1
124 1 1 24 1
125 1 2 24 0
126 1 3 24 1
127 1 9 24 1
128 1 10 24 1
129 1 11 24 1
130 2 3 1 1
131 2 11 1 1
132 2 15 2 1
133 2 16 2 1
134 2 2 3 1
135 2 7 4 1
136 2 15 4 1
137 2 10 5 1
138 2 11 5 1
139 2 12 5 1
140 2 8 6 1
141 2 15 6 1
142 2 1 7 0
143 2 2 7 1
144 2 10 7 1
145 2 11 7 1
146 2 12 7 1
147 2 8 8 1
148 2 1 9 1
149 2 2 9 1
150 2 9 9 1
151 2 10 9 1
152 2 11 9 1
153 2 12 9 1
154 2 5 10 1
155 2 8 10 1
156 2 5 12 1
157 2 7 12 1
158 2 8 12 1
159 2 14 12 1
160 2 15 12 1
161 2 16 12 1
162 2 1 13 1
163 2 5 14 1
164 2 15 14 1
165 2 2 15 1
166 2 3 15 1
167 2 9 15 1
168 2 10 15 1
169 2 11 15 1
170 2 12 15 1
171 2 15 16 1
172 2 16 16 1
173 2 11 17 1
174 2 12 17 1
175 2 15 18 1
176 2 1 21 1
177 2 2 21 1
178 2 3 21 1
179 2 9 21 1
180 2 10 21 1
181 2 11 21 1
182 2 7 22 1
183 2 15 22 1
184 2 16 22 1
185 2 1 23 1
186 2 2 23 1
187 2 10 23 1
188 2 11 23 1
189 2 12 23 1
190 2 15 24 1
--
Roger Levy Email: rlevy at ling.ucsd.edu
Assistant Professor Phone: 858-534-7219
Department of Linguistics Fax: 858-534-4789
UC San Diego Web: http://ling.ucsd.edu/~rlevy