[R-sig-ME] Most principled reporting of mixed-effect model regression coefficients
p@u|@john@on @end|ng |rom g|@@gow@@c@uk
Wed Feb 26 10:28:30 CET 2020
Jon Lefcheck's piecewiseSEM package has a good R-squared function (piecewiseSEM::rsquared) that implements the N&S marginal and conditional R-squared for GLMMs, and includes the extension to random slopes. This is the best supported and most up-to-date implementation that I’m aware of. There are other packages I’m less familiar with (e.g. r2glmm).
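A minimal sketch of calling both implementations on the same random-slope model. The sleepstudy data and the model formula are stand-ins chosen for illustration (they are not from this thread), and the code assumes lme4, MuMIn and piecewiseSEM are installed:

```r
library(lme4)

# Illustrative random-slope LMM on lme4's built-in sleepstudy data
# (an assumption for this sketch, not the model discussed in the thread)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# MuMIn: returns a matrix with columns R2m (marginal) and R2c (conditional)
r2_mumin <- MuMIn::r.squaredGLMM(fit)

# piecewiseSEM: returns a data frame with Marginal and Conditional columns
r2_psem <- piecewiseSEM::rsquared(fit)

r2_mumin
r2_psem
```

If the two functions disagree noticeably on a given model, comparing them against a by-hand calculation from the variance components is a reasonable next check.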
All the best,
> On 26 Feb 2020, at 05:19, Ades, James <jades using health.ucsd.edu> wrote:
> Thanks, Daniel and Maarten!
> I looked at both Nakagawa and Schielzeth and the Johnson paper; I also looked through your other references...thanks for those. I really liked the linked Stack Exchange post of WHuber's lucid response to R^2.
> Johnson references the MuMIn package, which I wasn't familiar with, though he writes that the function "r.squaredGLMM" takes into account the random slope (something that N & S mention as tedious and then wave aside). Using the N&S equation, for one of my models, I get an R^2 of .35, while using r.squaredGLMM, I get an R^2 of .43. I can't imagine that the random slope of time would make that big of a difference. (The conditional R^2 is .95, and I have no idea how it's that high). Does anyone have any experience with the package?
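A gap of that size is possible in principle, because the two calculations use different random-effect variances in the denominator. The sketch below computes both by hand for an illustrative lme4 model (sleepstudy is a stand-in, not the data in this thread): once using only the random-intercept variance, and once using the mean random-effect variance over observations as in Johnson's random-slope extension. How far the two diverge depends on the slope variance, the intercept-slope covariance, and how time is coded.

```r
library(lme4)

# Illustrative random-slope model (an assumption for this sketch)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

X     <- model.matrix(fit)                   # fixed-effects design matrix
Sigma <- VarCorr(fit)$Subject                # 2 x 2 random-effect covariance matrix
Z     <- X[, rownames(Sigma), drop = FALSE]  # design columns that have random effects

var_fixed <- var(as.vector(X %*% fixef(fit)))  # variance of fixed-effect predictions
var_resid <- sigma(fit)^2                      # residual variance

# Shortcut that ignores the random slope: intercept variance only
var_rand_intercept <- Sigma["(Intercept)", "(Intercept)"]

# Random-slope version: mean over observations of z_i' Sigma z_i
var_rand_slopes <- mean(rowSums((Z %*% Sigma) * Z))

# Marginal and conditional R^2 for a given random-effect variance
r2 <- function(var_rand) {
  total <- var_fixed + var_rand + var_resid
  c(marginal    = var_fixed / total,
    conditional = (var_fixed + var_rand) / total)
}

r2_table <- rbind(intercept_only = r2(var_rand_intercept),
                  with_slopes    = r2(var_rand_slopes))
r2_table
```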
> While some models (not for model selection but looking at PCA, individual variables, or some kind of aggregate measure for executive function) have comparatively large differences in AIC, their R^2 values via MuMIn might differ by only .01. In other words, what seemed to be decent (and, by LRT, significant) differences became inconsequential with r.squaredGLMM.
> AIC seems to do a commendable job of yielding parsimony, but its utter lack of comparability (even with the same # of observations) is frustrating. While an AIC of 28,620 is better than one of 28,645, there is, to my knowledge, no real way of quantifying that difference. Alas, while WHuber writes, "Most of the time you can find a better statistic than R^2. For model selection you can look to AIC and BIC," I think the
> issue is not only in selecting models (which AIC seems to do quite well), but again, in
> summarizing those models in intuitively quantitative ways.
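One commonly used way to put an AIC difference on an interpretable scale is the relative likelihood exp(-delta/2) and the Akaike weights derived from it. A minimal base-R sketch using the two AIC values quoted above (the model names are hypothetical labels, and the comparison assumes both models were fitted to the same data):

```r
# AIC values quoted in the message; names are hypothetical labels
aic <- c(model_a = 28620, model_b = 28645)

delta   <- aic - min(aic)             # AIC difference from the best model
rel_lik <- exp(-delta / 2)            # relative likelihood of each model
weights <- rel_lik / sum(rel_lik)     # Akaike weights (sum to 1)

# Evidence ratio of the better model over the worse one
evidence_ratio <- unname(rel_lik["model_a"] / rel_lik["model_b"])

delta
weights
evidence_ratio
```

With a difference of 25 the evidence ratio is exp(12.5), on the order of 10^5, so nearly all of the weight falls on the lower-AIC model. This quantifies relative support between the candidate models only; it is not an absolute measure of fit like R^2.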
> I've also looked into doing some kind of multiple time series cross validation
> though from what I've read (see below), this is similarly fraught. Maybe leave one out is
> the best way to go. The structure of the data has four timepoints with executive function
> data. The first two timepoints ('17 school year) and the final two timepoints ('18 school year)
> correspond to each year's standardized test.
> Thanks much!
> Difficulty of selecting among multilevel models using predictive accuracy<http://www.stat.columbia.edu/~gelman/research/published/final_sub.pdf>
> Statistics and Its Interface Volume 7 (2014) 1 Difficulty of selecting among multilevel models using predictive accuracy Wei Wang and Andrew Gelman
> On the use of cross-validation for time series predictor evaluation | Information Sciences: an International Journal<https://dl.acm.org/doi/10.1016/j.ins.2011.12.028>
> In time series predictor evaluation, we observe that with respect to the model selection procedure there is a gap between evaluation of traditional forecasting procedures, on the one hand, and evaluation of machine learning techniques on the other hand.
> Cross‐validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure - Roberts - 2017 - Ecography - Wiley Online Library<https://onlinelibrary.wiley.com/doi/full/10.1111/ecog.02881>
> Ideally, model validation, selection, and predictive errors should be calculated using independent data (Araújo et al. 2005). For example, validation may be undertaken with data from different geographic regions or spatially distinct subsets of the region, different time periods, such as historic species records from the recent past or from fossil records.
> From: Maarten Jung <Maarten.Jung using mailbox.tu-dresden.de>
> Sent: Monday, February 17, 2020 1:35 AM
> To: Ades, James <jades using health.ucsd.edu>
> Cc: r-sig-mixed-models using r-project.org <r-sig-mixed-models using r-project.org>
> Subject: Re: [R-sig-ME] Most principled reporting of mixed-effect model regression coefficients
>> Thanks, Maarten. So I was planning on reporting R^2 (along with AIC) for the overall model fit, not for each predictor, since the regression coefficients themselves give a good indication of relationship (though I wasn't aware that R^2 is "riddled with complications"). Is Henrik saying this only with regard to LMMs and GLMMs?
> That makes sense to me. For the overall model fit I would probably
> still go with Johnson's version [1] which I describe in my
> StackExchange post (and I think you mentioned it, or the Nakagawa and
> Schielzeth version it is based on, earlier) and report both the
> marginal and conditional R^2 values. The regression coefficients
> provide unstandardized effect sizes on the response scale which I
> think are a valid way to report effect sizes (see below).
> I think Henrik refers to (G)LMMs and gives Rights & Sterba (2019) [2]
> as reference. Also, the GLMM FAQ website provides a good overview [3].
>> When you say "there is no agreed upon way to calculate effect sizes" I'm a little confused. I read through your stack exchange posting, but Henrik's answer refers to standardized effect size. You write, later down, "Whenever possible, we report unstandardized effect sizes which is in line with general recommendation of how to report effect sizes"
> What you cite is still Henrik's opinion (and I hoped that I could make
> this clear by writing "This is what he suggests [...]" and by using
> the <blockquote> on StackExchange). And your citation still refers to
> LMMs as he says "Unfortunately, due to the way that variance is
> partitioned in linear mixed models (e.g., Rights & Sterba, 2019),
> there does not exist an agreed upon way to calculate standard effect
> sizes for individual model terms such as main effects or
> In general, I agree with him and with his recommendation to report
> unstandardized effect sizes (e.g. regression coefficients) if they
> have a "meaningful" interpretation.
> The semi-partial R^2 I mentioned in my last e-mail is an
> additional/alternative indicator of effect sizes that is probably more
> in line with what psychologists are used to seeing reported in papers
> (especially when results of factorial designs are reported) - and
> that's the reason I mentioned it.
>> I'm also working on a systematic review where there's disagreement over whether effect sizes should be standardized, but it does seem that to yield any kind of meaningful comparison, effect sizes would have to be standardized. I don't usually report standardized effect sizes...however, there are times when I z-score IVs to put them on the same scale, and I guess the output of that would be a standardized effect size. I wasn't aware of pushback on that practice. What issues would arise from this?
> There is nothing wrong with standardizing (e.g. by dividing by 1 or 2
> standard deviations) predictor variables to get measures of variable
> importance (within the same model).
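A minimal base-R sketch of this kind of predictor standardization. The data are simulated and the variable names (age, score, y) are hypothetical; a plain lm() is used only to keep the example self-contained, and the same rescaling would be applied to predictors before fitting a mixed model:

```r
set.seed(1)
n <- 200

# Simulated predictors on very different scales (hypothetical variables)
d <- data.frame(age   = rnorm(n, mean = 10,  sd = 2),
                score = rnorm(n, mean = 100, sd = 15))
d$y <- 0.5 * d$age + 0.02 * d$score + rnorm(n)

# Centre, then divide by 1 SD (z-score) or by 2 SD
scale_by <- function(x, k) (x - mean(x)) / (k * sd(x))
d$age_z     <- scale_by(d$age,   1)
d$score_z   <- scale_by(d$score, 1)
d$age_2sd   <- scale_by(d$age,   2)
d$score_2sd <- scale_by(d$score, 2)

raw <- coef(lm(y ~ age + score,         data = d))  # per original unit
z1  <- coef(lm(y ~ age_z + score_z,     data = d))  # per 1 SD
z2  <- coef(lm(y ~ age_2sd + score_2sd, data = d))  # per 2 SD

rbind(raw = raw, per_1sd = z1, per_2sd = z2)
```

The rescaled slopes are the raw slopes multiplied by 1 or 2 standard deviations of the predictor, so they are comparable across predictors within this model but still depend on the sample's predictor spread.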
> Issues arise when standardized effect sizes such as R^2, partial
> eta^2, etc. between different models are compared without thinking
> about what differences in these measures can be attributed to (see
> e.g. this question [4] or the Pek & Flora (2018) paper [5] that Henrik
> cites). Note that these are general issues that apply to all
> regression models, not only mixed models.
> [1] https://doi.org/10.1111/2041-210X.12225
> [2] https://doi.org/10.1037/met0000184
> [3] https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#how-do-i-compute-a-coefficient-of-determination-r2-or-an-analogue-for-glmms
> [4] https://stats.stackexchange.com/questions/13314/is-r2-useful-or-dangerous/13317
> [5] https://doi.org/10.1037/met0000126