[R-sig-ME] how reliable are inferences drawn from binomial models for small datasets fitted with lme4?
Roger Levy
rlevy at ling.ucsd.edu
Mon Jul 6 02:06:56 CEST 2009
Dear Jarrod,
Many thanks for your thoughts on this! (more below)
On Jul 5, 2009, at 2:44 PM, Jarrod Hadfield wrote:
> Dear Roger,
>
> I think part of your problem is that in treatment 2 there are 60
> successes but only a single failure.
Actually in the original data there are 61 successes and no failures.
In the imagined version of the data with 60 successes and one failure,
the lme4 and Bayesian results are more in agreement.
> This means that there will be little (no?) information in the data
> to estimate F1 and F2 variances that are specific to treatment 2.
> I suspect this explains the NaN, and also the discrepancy between
> JAGS and lmer: in the Bayesian analysis this information is coming
> completely from your prior, as are the between treatment
> covariances. I would recommend the simpler model:
>
> m1 <- lmer(Response ~ Treatment + (1 | F1) + (1 | F2), dat,
> family=binomial)
>
> The F2 variance is fixed at zero, and the treatment effect is
> significant at p=0.0268
>
> For this model the lmer and MCMCglmm results agree for the fixed
> effects. The variance components are sensitive to the prior however,
> but with weak priors the posterior distributions of the variances
> are very wide.
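>
> For reference, a minimal sketch of the kind of MCMCglmm call I mean
> here, with an illustrative weak prior (the exact prior specification
> is a judgment call; the residual variance is fixed at 1, as is usual
> for binary data, and m2 is just a placeholder name):
>
> library(MCMCglmm)
> prior <- list(R = list(V = 1, fix = 1),
>               G = list(G1 = list(V = 1, nu = 0.002),
>                        G2 = list(V = 1, nu = 0.002)))
> m2 <- MCMCglmm(Response ~ Treatment, random = ~ F1 + F2,
>                family = "categorical", data = dat, prior = prior)
> summary(m2)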
Ah, but the trouble is that there *does* seem to be a reasonable
amount of evidence for treatment-specific random effects of at least
one of F1 and F2. For example:
> print (m1 <- lmer(Response ~ Treatment + (1 | F1) + (1 | F2), dat,
family=binomial))
Generalized linear mixed model fit by the Laplace approximation
Formula: Response ~ Treatment + (1 | F1) + (1 | F2)
Data: dat
AIC BIC logLik deviance
90.15 103.1 -41.08 82.15
Random effects:
Groups Name Variance Std.Dev.
F2 (Intercept) 0.0000 0.0000
F1 (Intercept) 3.4413 1.8551
Number of obs: 190, groups: F2, 24; F1, 16
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.9542 0.6277 4.706 2.52e-06 ***
Treatment2 2.6760 1.2083 2.215 0.0268 *
> print (m01 <- lmer(Response ~ Treatment + (Treatment - 1 | F1) +
(1 | F2), dat, family=binomial))
Generalized linear mixed model fit by the Laplace approximation
Formula: Response ~ Treatment + (Treatment - 1 | F1) + (1 | F2)
Data: dat
AIC BIC logLik deviance
87.58 107.1 -37.79 75.58
Random effects:
Groups Name Variance Std.Dev. Corr
F2 (Intercept) 1.6467e-12 1.2832e-06
F1 Treatment1 9.0240e+00 3.0040e+00
Treatment2 3.4832e+00 1.8663e+00 -1.000
Number of obs: 190, groups: F2, 24; F1, 16
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.8284 0.9857 3.884 0.000103 ***
Treatment2 1.3655 2.0034 0.682 0.495487
Going by the reported log-likelihoods and applying the reasoning of
Baayen, Bates, and Davidson, a likelihood ratio test should be
conservative, and so there is quite a bit of evidence for m01 over m1
above.
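
For the record, this comparison in code is just

anova(m1, m01)

and with the log-likelihoods reported above the deviance difference is
82.15 - 75.58 = 6.57 on 2 df (going by the parameter counts implied by
the AICs), i.e.

pchisq(82.15 - 75.58, df = 2, lower.tail = FALSE)   # roughly 0.037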
This leads back to the original issue, though. One intuition that I
have is that for the two models
Response ~ Treatment + (Treatment - 1 | F1) + (Treatment - 1 | F2)
Response ~ 1 + (Treatment - 1 | F1) + (Treatment - 1 | F2)
the log-likelihood really ought to be a fair bit better under the
former model than under the latter (more than a difference of 0.42!),
because whereas both models have to explain the observed proportions,
the latter model also has to be lucky enough for the random effects to
line up such that the observed proportions have a non-trivial
likelihood. That is, with the log-likelihood of interest being

f(y | \beta, \sigma) = \int P(y | \beta, b) P(b | \sigma) db

the former model should have much more probability mass from
P(b | \sigma) in the region of b where P(y | \beta, b) is large enough
to matter.
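
To make the intuition concrete, here is a toy Monte Carlo version of
the integral for a single cluster of n Bernoulli responses with a
scalar random effect b (all names here are hypothetical, purely for
illustration):

marg.loglik <- function(n.success, n, beta, sigma, nsim = 1e5) {
  b <- rnorm(nsim, 0, sigma)      # draws from P(b | sigma)
  p <- plogis(beta + b)           # success probability given beta and b
  lik <- p^n.success * (1 - p)^(n - n.success)  # P(y | beta, b)
  log(mean(lik))                  # Monte Carlo estimate of log f(y | beta, sigma)
}
marg.loglik(10, 10, beta = 4, sigma = 1)  # fixed effect does the work
marg.loglik(10, 10, beta = 0, sigma = 1)  # must rely on lucky draws of b

The first call comes out much higher, which is exactly the sense in
which the model without the fixed effect has to be "lucky" with
P(b | \sigma).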
Best
Roger
> Quoting Roger Levy <rlevy at ling.ucsd.edu>:
>
>> This post may be of interest in light of the recent discussion of PQL
>> versus Laplace-approximated likelihood. I'm facing an interestingly
>> challenging analysis of a relatively small (190-observation)
>> binary-response dataset with a single two-level treatment and two
>> crossed random factors (call them F1 and F2). The question of current
>> interest is whether I can infer a difference in fixed effect of
>> treatment; it is hard to figure out the right conclusions to draw. I'm
>> putting the full dataset at the end of the message, but I'll start by
>> presenting some summary statistics:
>>
>>> xtabs(~ Treatment + Response, dat)
>> Response
>> Treatment 0 1
>> 1 12 117
>> 2 0 61
>>
>> Fisher's exact test says "yes, the treatment has an effect" at
>> p=0.01.
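>> Concretely, that's just the following call on the table above:
>>
>>> fisher.test(xtabs(~ Treatment + Response, dat))
>>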
>> But of course that doesn't take into account the crossed hierarchical
>> structure of the data. Some statistics on this:
>>
>>> with(dat,tapply(Response, list(F1,Treatment),length))
>> 1 2
>> 1 11 5
>> 2 7 6
>> 3 9 3
>> 4 4 NA
>> 5 3 3
>> 6 11 NA
>> 7 3 3
>> 8 9 4
>> 9 11 3
>> 10 12 6
>> 11 11 8
>> 12 5 6
>> 13 1 NA
>> 14 11 1
>> 15 12 9
>> 16 9 4
>>> with(dat,tapply(Response, list(F1,Treatment),mean))
>> 1 2
>> 1 1.0000000 1
>> 2 0.5714286 1
>> 3 0.8888889 1
>> 4 1.0000000 NA
>> 5 0.0000000 1
>> 6 1.0000000 NA
>> 7 0.6666667 1
>> 8 1.0000000 1
>> 9 1.0000000 1
>> 10 1.0000000 1
>> 11 0.8181818 1
>> 12 0.6000000 1
>> 13 1.0000000 NA
>> 14 1.0000000 1
>> 15 1.0000000 1
>> 16 1.0000000 1
>>> with(dat,tapply(Response, list(F2,Treatment),length))
>> 1 2
>> 1 4 2
>> 2 8 2
>> 3 3 1
>> 4 5 2
>> 5 6 3
>> 6 7 2
>> 7 5 5
>> 8 8 1
>> 9 6 6
>> 10 5 2
>> 11 6 NA
>> 12 6 6
>> 13 4 1
>> 14 6 2
>> 15 4 6
>> 16 5 2
>> 17 5 2
>> 18 4 1
>> 19 5 NA
>> 20 4 NA
>> 21 5 6
>> 22 6 3
>> 23 6 5
>> 24 6 1
>>> with(dat,tapply(Response, list(F2,Treatment),mean))
>> 1 2
>> 1 1.0000000 1
>> 2 0.8750000 1
>> 3 1.0000000 1
>> 4 1.0000000 1
>> 5 1.0000000 1
>> 6 1.0000000 1
>> 7 0.8000000 1
>> 8 0.8750000 1
>> 9 0.8333333 1
>> 10 1.0000000 1
>> 11 0.8333333 NA
>> 12 0.6666667 1
>> 13 1.0000000 1
>> 14 0.8333333 1
>> 15 1.0000000 1
>> 16 1.0000000 1
>> 17 1.0000000 1
>> 18 1.0000000 1
>> 19 1.0000000 NA
>> 20 1.0000000 NA
>> 21 1.0000000 1
>> 22 0.6666667 1
>> 23 0.8333333 1
>> 24 0.8333333 1
>>
>> So there is some sign of inter-cluster variability, though of course
>> the binary response makes it hard to see. Crossed random-slope models
>> converge both with and without a fixed effect of treatment...
>>
>>> print(m1 <- lmer(Response ~ Treatment + (Treatment - 1 | F1) +
>> (Treatment - 1 | F2), dat, family=binomial))
>> Generalized linear mixed model fit by the Laplace approximation
>> Formula: Response ~ Treatment + (Treatment - 1 | F1) + (Treatment - 1 | F2)
>> Data: dat
>> AIC BIC logLik deviance
>> 82.65 108.6 -33.33 66.65
>> Random effects:
>> Groups Name Variance Std.Dev. Corr
>> F2 Treatment1 4.1354e-12 2.0336e-06
>> Treatment2 1.0492e+00 1.0243e+00 0.000
>> F1 Treatment1 9.4589e+00 3.0755e+00
>> Treatment2 6.9945e-01 8.3633e-01 0.000
>> Number of obs: 190, groups: F2, 24; F1, 16
>>
>> Fixed effects:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) 3.943 1.004 3.929 8.54e-05 ***
>> Treatment2 17.289 5221.110 0.003 0.997
>>> print(m0 <- lmer(Response ~ 1 + (Treatment - 1 | F1) + (Treatment - 1
>> | F2), dat, family=binomial))
>> Generalized linear mixed model fit by the Laplace approximation
>> Formula: Response ~ 1 + (Treatment - 1 | F1) + (Treatment - 1 | F2)
>> Data: dat
>> AIC BIC logLik deviance
>> 81.5 104.2 -33.75 67.5
>> Random effects:
>> Groups Name Variance Std.Dev. Corr
>> F2 Treatment1 0.0000e+00 0.00000000
>> Treatment2 1.7654e-08 0.00013287 NaN
>> F1 Treatment1 2.3858e+01 4.88443481
>> Treatment2 3.0185e-02 0.17373716 -1.000
>> Number of obs: 190, groups: F2, 24; F1, 16
>>
>> Fixed effects:
>> Estimate Std. Error z value Pr(>|z|)
>> (Intercept) 5.855 1.384 4.232 2.32e-05 ***
>>
>>
>> ...but I am concerned (a) about the NaN correlation for F2 (this must
>> be a sign of something broken somewhere!?), and (b) about the minimal
>> difference in log-likelihood for m1 versus m0. (Note that it's
>> possible to fix (a) by dropping random effects of F2 from m0, and the
>> log-likelihood doesn't change.) It seems hard to believe that the
>> marginal log-likelihoods of the models, given the estimates of the
>> fixed effects and of the random-effect covariance matrices, differ by only
>> 0.42. I draw further evidence for this conclusion by imagining that
>> one response in Treatment 2 were a failure rather than a success:
>>
>>> dat[142,"Response"] <- 0
>>> logLik(lmer(Response ~ Treatment + (Treatment - 1 | F1) + (Treatment
>> - 1 | F2), dat, family=binomial))
>> 'log Lik.' -35.29494 (df=8)
>>> logLik(lmer(Response ~ 1 + (Treatment - 1 | F1) + (Treatment - 1 |
>> F2), dat, family=binomial))
>> 'log Lik.' -37.60635 (df=7)
>>
>> Now there's a considerable difference in log-likelihood; this would
>> turn out significant in a chi-squared test (though of course we'd want
>> to calibrate with simulation if this were the actual dataset).
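>>
>> A minimal sketch of the kind of simulation-based calibration I have
>> in mind, assuming a version of lme4 whose simulate() and refit()
>> methods support binomial fits (m1 and m0 named as above):
>>
>> lrt.obs <- as.numeric(2 * (logLik(m1) - logLik(m0)))
>> lrt.null <- replicate(200, {
>>   y <- simulate(m0)[[1]]   # responses simulated under the null model
>>   as.numeric(2 * (logLik(refit(m1, y)) - logLik(refit(m0, y))))
>> })
>> mean(lrt.null >= lrt.obs)  # simulation-based p-value for the LRT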
>>
>> Note that I can't use the Wald statistic here to address my hypothesis
>> for the real dataset, because the lack of failure responses in
>> Treatment 2 blows up the MLE and also the standard error of its
>> parameter. I was hoping to use a likelihood-ratio comparison (perhaps
>> calibrated with simulation), but it seems to me that the minimal
>> observed difference in log-likelihood for the actual dataset throws
>> that idea out the window.
>>
>> So here are some questions:
>>
>> 1) Is there something specific to the Laplace approximation being used
>> that renders logit models with extreme response proportions generally
>> unreliable? (I can't check this directly against other packages
>> because of the crossed random effects.)
>>
>> 2) Is this dataset simply small enough that the uncertainty in
>> estimating the random-effect variance parameters makes
>> point-estimate inference on fixed effects unreliable, and I should go
>> the Bayesian route and integrate over the uncertainty? Following this
>> possibility, I implemented the fixed-effect-of-treatment model (m1) in
>> JAGS with diffuse normal priors over the intercept and Treatment 2
>> parameter, and diffuse Wishart priors over the random-effect
>> covariance matrices (a sketch of the model structure appears after the
>> posterior summaries below). Here are 95% Bayesian confidence intervals
>> that result:
>>
>>
>> lower upper
>> Omega.F2[1,1] 5.316465e+02 5.922489e+05
>> Omega.F2[2,1] -3.016791e+05 2.899492e+05
>> Omega.F2[1,2] -3.016791e+05 2.899492e+05
>> Omega.F2[2,2] 2.116783e+02 5.911579e+05
>> Omega.F1[1,1] 5.186791e+01 5.979535e+05
>> Omega.F1[2,1] -3.073216e+05 2.898743e+05
>> Omega.F1[1,2] -3.073216e+05 2.898743e+05
>> Omega.F1[2,2] 2.764357e+02 5.972545e+05
>> intercept 1.749505e+00 2.938812e+00
>> Treatment2 1.240127e+00 6.279801e+02
>>
>> and here is the posterior mode:
>>
>> Omega.F2[1,1] 70299.472805
>> Omega.F2[2,1] -1324.272662
>> Omega.F2[1,2] -1324.272662
>> Omega.F2[2,2] 70810.193188
>> Omega.F1[1,1] 70426.234086
>> Omega.F1[2,1] -46977.070058
>> Omega.F1[1,2] -46977.070058
>> Omega.F1[2,2] 87987.091315
>> intercept 1.977808
>> Treatment2 5.846867
>>
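>> For concreteness, a minimal sketch of the form of JAGS model I mean
>> above (hyperparameter values are illustrative placeholders rather
>> than my exact choices; T2 is a 0/1 indicator for Treatment 2, zero2 a
>> 2-vector of zeros, and R2 a 2x2 identity scale matrix, all supplied
>> as data):
>>
>> model {
>>   for (i in 1:N) {
>>     Response[i] ~ dbern(p[i])
>>     logit(p[i]) <- intercept + Treatment2 * T2[i] +
>>       b.F1[F1[i], Treatment[i]] + b.F2[F2[i], Treatment[i]]
>>   }
>>   for (j in 1:nF1) { b.F1[j, 1:2] ~ dmnorm(zero2, Omega.F1) }
>>   for (k in 1:nF2) { b.F2[k, 1:2] ~ dmnorm(zero2, Omega.F2) }
>>   intercept  ~ dnorm(0, 1.0E-6)   # diffuse normal priors on fixed effects
>>   Treatment2 ~ dnorm(0, 1.0E-6)
>>   Omega.F1 ~ dwish(R2, 3)         # diffuse Wishart priors on the
>>   Omega.F2 ~ dwish(R2, 3)         # random-effect precision matrices
>> }
>>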
>> Some notable differences between these Bayesian results and the
>> lme4-fit m1:
>>
>> * lme4 inferred considerably larger random effects of F1 than of F2,
>> which doesn't seem supported by the marginal means I listed near the
>> beginning of the email; whereas the Bayesian inference puts F1 and F2
>> on roughly equal footing;
>>
>> * the severe lack of evidence regarding the correlations of random
>> effects across treatment is clearly evident in the Bayesian confidence
>> intervals;
>>
>> * the fixed-effects posterior modes are much closer to zero than the
>> Laplace-approximated MLEs;
>>
>> * The Bayesian inference seems strongly supportive of a fixed effect
>> of treatment, whereas lme4 superficially seemed to lead to the
>> opposite conclusion.
>>
>> I'd love to hear people's thoughts on my approach to these data. Many
>> thanks in advance.
>>
>> Roger
>>
>> **
>>
>> Treatment F1 F2 Response
>> 1 1 6 1 1
>> 2 1 7 1 1
>> 3 1 8 1 1
>> 4 1 15 1 1
>> 5 1 1 2 1
>> 6 1 2 2 1
>> 7 1 3 2 1
>> 8 1 4 2 1
>> 9 1 9 2 1
>> 10 1 10 2 1
>> 11 1 11 2 1
>> 12 1 12 2 0
>> 13 1 6 3 1
>> 14 1 14 3 1
>> 15 1 15 3 1
>> 16 1 3 4 1
>> 17 1 4 4 1
>> 18 1 9 4 1
>> 19 1 10 4 1
>> 20 1 11 4 1
>> 21 1 6 5 1
>> 22 1 8 5 1
>> 23 1 13 5 1
>> 24 1 14 5 1
>> 25 1 15 5 1
>> 26 1 16 5 1
>> 27 1 1 6 1
>> 28 1 2 6 1
>> 29 1 3 6 1
>> 30 1 9 6 1
>> 31 1 10 6 1
>> 32 1 11 6 1
>> 33 1 12 6 1
>> 34 1 5 7 0
>> 35 1 6 7 1
>> 36 1 14 7 1
>> 37 1 15 7 1
>> 38 1 16 7 1
>> 39 1 1 8 1
>> 40 1 2 8 1
>> 41 1 3 8 1
>> 42 1 4 8 1
>> 43 1 9 8 1
>> 44 1 10 8 1
>> 45 1 11 8 0
>> 46 1 12 8 1
>> 47 1 5 9 0
>> 48 1 7 9 1
>> 49 1 8 9 1
>> 50 1 14 9 1
>> 51 1 15 9 1
>> 52 1 16 9 1
>> 53 1 1 10 1
>> 54 1 3 10 1
>> 55 1 9 10 1
>> 56 1 10 10 1
>> 57 1 11 10 1
>> 58 1 5 11 0
>> 59 1 6 11 1
>> 60 1 8 11 1
>> 61 1 14 11 1
>> 62 1 15 11 1
>> 63 1 16 11 1
>> 64 1 1 12 1
>> 65 1 2 12 1
>> 66 1 9 12 1
>> 67 1 10 12 1
>> 68 1 11 12 0
>> 69 1 12 12 0
>> 70 1 6 13 1
>> 71 1 14 13 1
>> 72 1 15 13 1
>> 73 1 16 13 1
>> 74 1 1 14 1
>> 75 1 2 14 0
>> 76 1 3 14 1
>> 77 1 10 14 1
>> 78 1 11 14 1
>> 79 1 12 14 1
>> 80 1 6 15 1
>> 81 1 8 15 1
>> 82 1 14 15 1
>> 83 1 15 15 1
>> 84 1 1 16 1
>> 85 1 3 16 1
>> 86 1 4 16 1
>> 87 1 9 16 1
>> 88 1 10 16 1
>> 89 1 6 17 1
>> 90 1 8 17 1
>> 91 1 14 17 1
>> 92 1 15 17 1
>> 93 1 16 17 1
>> 94 1 1 18 1
>> 95 1 9 18 1
>> 96 1 10 18 1
>> 97 1 11 18 1
>> 98 1 6 19 1
>> 99 1 8 19 1
>> 100 1 14 19 1
>> 101 1 15 19 1
>> 102 1 16 19 1
>> 103 1 1 20 1
>> 104 1 9 20 1
>> 105 1 10 20 1
>> 106 1 11 20 1
>> 107 1 6 21 1
>> 108 1 8 21 1
>> 109 1 14 21 1
>> 110 1 15 21 1
>> 111 1 16 21 1
>> 112 1 1 22 1
>> 113 1 2 22 0
>> 114 1 3 22 0
>> 115 1 9 22 1
>> 116 1 10 22 1
>> 117 1 11 22 1
>> 118 1 6 23 1
>> 119 1 7 23 0
>> 120 1 8 23 1
>> 121 1 14 23 1
>> 122 1 15 23 1
>> 123 1 16 23 1
>> 124 1 1 24 1
>> 125 1 2 24 0
>> 126 1 3 24 1
>> 127 1 9 24 1
>> 128 1 10 24 1
>> 129 1 11 24 1
>> 130 2 3 1 1
>> 131 2 11 1 1
>> 132 2 15 2 1
>> 133 2 16 2 1
>> 134 2 2 3 1
>> 135 2 7 4 1
>> 136 2 15 4 1
>> 137 2 10 5 1
>> 138 2 11 5 1
>> 139 2 12 5 1
>> 140 2 8 6 1
>> 141 2 15 6 1
>> 142 2 1 7 0
>> 143 2 2 7 1
>> 144 2 10 7 1
>> 145 2 11 7 1
>> 146 2 12 7 1
>> 147 2 8 8 1
>> 148 2 1 9 1
>> 149 2 2 9 1
>> 150 2 9 9 1
>> 151 2 10 9 1
>> 152 2 11 9 1
>> 153 2 12 9 1
>> 154 2 5 10 1
>> 155 2 8 10 1
>> 156 2 5 12 1
>> 157 2 7 12 1
>> 158 2 8 12 1
>> 159 2 14 12 1
>> 160 2 15 12 1
>> 161 2 16 12 1
>> 162 2 1 13 1
>> 163 2 5 14 1
>> 164 2 15 14 1
>> 165 2 2 15 1
>> 166 2 3 15 1
>> 167 2 9 15 1
>> 168 2 10 15 1
>> 169 2 11 15 1
>> 170 2 12 15 1
>> 171 2 15 16 1
>> 172 2 16 16 1
>> 173 2 11 17 1
>> 174 2 12 17 1
>> 175 2 15 18 1
>> 176 2 1 21 1
>> 177 2 2 21 1
>> 178 2 3 21 1
>> 179 2 9 21 1
>> 180 2 10 21 1
>> 181 2 11 21 1
>> 182 2 7 22 1
>> 183 2 15 22 1
>> 184 2 16 22 1
>> 185 2 1 23 1
>> 186 2 2 23 1
>> 187 2 10 23 1
>> 188 2 11 23 1
>> 189 2 12 23 1
>> 190 2 15 24 1
>>
>> --
>>
>> Roger Levy Email: rlevy at ling.ucsd.edu
>> Assistant Professor Phone: 858-534-7219
>> Department of Linguistics Fax: 858-534-4789
>> UC San Diego Web: http://ling.ucsd.edu/~rlevy
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
--
Roger Levy Email: rlevy at ling.ucsd.edu
Assistant Professor Phone: 858-534-7219
Department of Linguistics Fax: 858-534-4789
UC San Diego Web: http://ling.ucsd.edu/~rlevy