[R-sig-ME] Getting the right uncertainties when fitting multilevel models
Jonathan Judge
bachlaw01 at outlook.com
Mon Sep 25 04:07:34 CEST 2017
I'm not entirely sure what you are looking for, but I believe the example simply relies on the random effects having a normal distribution by default and adds an interval of plus/minus 2 standard errors around the county intercept (the underlying population coefficient estimate plus the county-level deviation) to get the likely range for a particular county.
Better ways to approximate this confidence interval now would be either merTools (which extends arm::sim to simulate intervals for mixed-model parameters and intercepts) or bootMer, which gives similar approximations in a more robust way (and, in my experience, often yields narrower, more confident estimates). A fully Bayesian fit, such as one from the stan_glmer function, will usually provide even more robust estimates, as Gelman indicates, and returns intervals by default.
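For concreteness, here is a minimal sketch of both routes on a radon-style varying-intercept model. The data frame radon and its columns y, x, and county are placeholder names rather than the book's actual objects, and the calls shown are only one way to use each package.

## Minimal sketch, assuming a radon-style varying-intercept fit;
## radon, y, x, and county are placeholder names.
library(lme4)
library(merTools)

M1 <- lmer(y ~ x + (1 | county), data = radon)

## merTools: simulation-based interval for the county-26 linear predictor
## (include.resid.var = FALSE drops observation-level noise, so this is an
## interval for the county line rather than for a new observation)
nd <- radon[radon$county == 26, ][1, ]   # assumes county is coded 1, 2, ...
predictInterval(M1, newdata = nd, level = 0.95, n.sims = 2000,
                include.resid.var = FALSE)

## bootMer: parametric bootstrap of the county-26 intercept (fixed + random),
## which propagates uncertainty in both components jointly
county26 <- function(fit) unname(fixef(fit)[1] + ranef(fit)$county[26, 1])
bb <- bootMer(M1, county26, nsim = 1000)
quantile(bb$t[, 1], c(0.025, 0.975))     # approximate 95% interval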
As always, sample size is important: the larger the sample, the less the difference in method matters. IIRC, the radon data set is small enough that incorporating this extra uncertainty is probably beneficial.
I suspect Gelman’s point is that the rstanarm package is now sufficiently comparable to lme4 for ease of use that he will start with rstanarm as his gateway drug (instead of lme4) and then move to Stan code from there. But I guess we’ll have to see.
Best,
Jonathan
On Sep 24, 2017, at 7:59 PM, Poe, John <jdpo223 at g.uky.edu> wrote:
I'm just wondering if anyone who knows more about lme4 than I do has any comments on the Gelman post today. He's pretty vague and brief in his reply.
Cesare Aloisi writes:
I am writing to you regarding something I recently stumbled upon in your book Data Analysis Using Regression and Multilevel/Hierarchical Models, which confused me, in the hope that you could help me understand it. This book has been my reference guide for many years now, and I am extremely grateful for everything I learnt from you.
On page 261, a 95% confidence interval for the intercept in a certain group
(County 26) is calculated using only the standard error of the “random
effect” (the county-level error). The code is as follows:
coef(M1)$county[26,1] + c(-2,2)*se.ranef(M1)$county[26]
My understanding is that, since the group-level prediction (call it y.hat_j = coef(M1)$county[26,1]) is a linear combination of a global average and a group-level deviation from that average (y.hat_j = beta_0 + eta_j), the variance of y.hat_j should be the sum of the variances of beta_0 and eta_j plus twice their covariance, not just the variance of eta_j, as the code on page 261 seems to imply. In other words:
Var(y.hat_j) = Var(beta_0) + Var(eta_j) + 2Cov(beta_0, eta_j)
Admittedly, lme4 does not provide an estimate for the last term, the
covariance between “fixed” and “random” effects. Was the code used in the
book to simplify the calculations, or was there some deeper reason to it
that I failed to grasp?
My (Gelman's) reply: The short answer is that it’s difficult to get this
correct in lmer but very easy when using stan_lmer() in the rstanarm
package. That’s what I recommend, and that’s what we’ll be doing in the 2nd
edition of our book.
http://andrewgelman.com/2017/09/23/getting-right-uncertainties-fitting-multilevel-models/
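A minimal sketch of that rstanarm route, using the same placeholder radon fit as above: the posterior draws contain the overall intercept and the county effect jointly, so the covariance term raised in the question is handled automatically.

## Minimal sketch, assuming the same placeholder radon data as above.
library(rstanarm)

M2 <- stan_lmer(y ~ x + (1 | county), data = radon)

## Posterior draws of beta_0 + eta_26; the exact column name for the
## county effect depends on how the county levels are labelled.
draws <- as.matrix(M2)
a26 <- draws[, "(Intercept)"] + draws[, "b[(Intercept) county:26]"]
quantile(a26, c(0.025, 0.975))   # 95% posterior interval for county 26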