[R-meta] Chi-square or F-test to test for subgroup heterogeneity

Viechtbauer, Wolfgang (SP) wolfgang.viechtbauer using maastrichtuniversity.nl
Thu Oct 11 20:45:04 CEST 2018

Hi Pier-Alex,

Indeed, warnings 1 and 2 are due to studies with 0 events. Since you set add=0 (with to="none"), the usual +1/2 correction is not applied to the observed number of events when computing the study-specific estimates (i.e., the observed log odds of the individual studies). These observed outcomes are only needed for things like constructing forest plots, not for fitting the model itself (see help(rma.glmm) and the section "Observed Outcomes of the Individual Studies"). So, you could also leave 'add' (and 'to') at their default values; then you won't get these warnings. Or you can just ignore them.
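To see where the first two warnings come from, here is a minimal base-R sketch that computes the observed log odds (measure "PLO") and their sampling variances by hand. The data and the helper plo() are hypothetical, and the zero-cell rule below only approximates what add=1/2, to="only0" does; this is not metafor's escalc() code.

```r
# Hypothetical data: xi events out of ni subjects; study 1 has 0 events
xi <- c(0, 3, 5)
ni <- c(20, 25, 30)

# Hypothetical helper: observed log odds and their large-sample variance
plo <- function(x, n) {
  data.frame(yi = log(x / (n - x)),     # log odds of the event
             vi = 1 / x + 1 / (n - x))  # sampling variance of the log odds
}

# add=0: study 1 gets yi = -Inf and vi = Inf (escalc() recodes these to NA)
raw <- plo(xi, ni)

# Usual correction: add 1/2 to the event and non-event counts,
# but only in studies with a zero cell (approximating to="only0")
zero <- xi == 0 | xi == ni
adj  <- plo(xi + 0.5 * zero, ni + 1 * zero)

print(raw)
print(adj)
```

Only the study with 0 events is affected; the other studies' observed outcomes are identical in both versions.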

As for 3, this is a bit trickier. No, you are not fitting a saturated model. However, inside rma.glmm(), a saturated model is fitted in addition to the model you specified. This is needed to conduct the heterogeneity test, or rather two heterogeneity tests: a Wald-type test and a likelihood ratio test (LRT). In 'standard' random/mixed-effects models (those fitted with rma()), these two tests coincide, but this is no longer the case for rma.glmm() models.
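The idea behind the LRT can be sketched for the simpler case of testing one common proportion against the saturated model: the saturated model gives each study its own proportion x_i/n_i and so reproduces the observed data exactly, and twice the difference in maximized log-likelihoods is referred to a chi-square distribution with k - 1 df. The data are hypothetical and rma.glmm() uses its own internals, so treat this as an analogy. The last line converts the LRT value from the output above (37.9897 on 17 df) into its p-value.

```r
# Hypothetical data: xi events out of ni in k = 5 studies
xi <- c(0, 2, 5, 1, 8)
ni <- c(40, 50, 60, 45, 70)
k  <- length(xi)

# Saturated model: each study has its own proportion (fits the data exactly)
ll_sat <- sum(dbinom(xi, ni, prob = xi / ni, log = TRUE))

# Reduced model: one common proportion (its ML estimate)
p_common  <- sum(xi) / sum(ni)
ll_common <- sum(dbinom(xi, ni, prob = p_common, log = TRUE))

# Likelihood ratio test for heterogeneity with k - 1 degrees of freedom
lrt   <- 2 * (ll_sat - ll_common)
p_lrt <- pchisq(lrt, df = k - 1, lower.tail = FALSE)

# p-value for the LRT reported in the output: LRT(df = 17) = 37.9897
p_reported <- pchisq(37.9897, df = 17, lower.tail = FALSE)
```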

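For the Wald-type test, the Hessian of the saturated model must be inverted. This is numerically tricky and can sometimes fail, which appears to be what happened here. That is why the Wald-type test is reported as NA in your output, while the LRT (which only requires the log-likelihoods of the two models) is still available.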
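A minimal sketch of why a singular Hessian breaks only the Wald-type test: the Wald statistic needs the covariance matrix of the estimates, obtained by inverting the Hessian, whereas the LRT needs only the maximized log-likelihoods. The function wald_test() and the matrices are hypothetical illustrations, not metafor's implementation.

```r
# Hypothetical helper: Wald-type test that the parameters in 'idx' are all 0,
# using the inverse of the Hessian 'H' (of the negative log-likelihood) as the
# covariance matrix of the estimates 'theta'
wald_test <- function(theta, H, idx) {
  V <- tryCatch(solve(H), error = function(e) NULL)
  if (is.null(V)) {
    # Hessian could not be inverted: no Wald statistic (cf. "Wld = NA")
    return(c(stat = NA_real_, pval = NA_real_))
  }
  b    <- theta[idx]
  stat <- drop(t(b) %*% solve(V[idx, idx, drop = FALSE]) %*% b)
  c(stat = stat, pval = pchisq(stat, df = length(idx), lower.tail = FALSE))
}

theta <- c(1, 1)
ok    <- wald_test(theta, H = diag(c(4, 9)), idx = 1:2)                  # invertible
fail  <- wald_test(theta, H = matrix(1, nrow = 2, ncol = 2), idx = 1:2)  # singular
```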

I do not know which version of metafor you are using, but if it is not the 'devel' version, try installing that (https://wviechtb.github.io/metafor/#installation) and rerunning the analysis. It might not make a difference, but there have been minor improvements that could make the computation of the Wald-type heterogeneity test work for your data. Alternatively, you could just go with the LRT.


-----Original Message-----
From: Pier-Alexandre Tardif [mailto:pier-alexandre.tardif.1 using ulaval.ca] 
Sent: Thursday, 11 October, 2018 16:35
To: Ty Beal; Viechtbauer, Wolfgang (SP); r-sig-meta-analysis using r-project.org
Subject: RE: Chi-square or F-test to test for subgroup heterogeneity

I do not mean to hijack this discussion, but I'm also struggling with a similar issue and thought it best to use this topic, since it relates to heterogeneity and degrees of freedom. Following this (http://www.metafor-project.org/doku.php/analyses:stijnen2010), I arrive at the same results as reported, but I obtain the following warning messages:

Code used:
res <- rma.glmm(measure="PLO", xi=ai, ni=n1i, data=dat, add=0, to="none", drop00=FALSE, level=95, digits=4, method="ML")
print(res, digits=4)
predict(res, transf=transf.ilogit, digits=4)

Random-Effects Model (k = 18; tau^2 estimator: ML)
tau^2 (estimated amount of total heterogeneity): 0.8265
tau (square root of estimated tau^2 value):      0.9091
I^2 (total heterogeneity / total variability):   63.14%
H^2 (total variability / sampling variability):  2.71

Tests for Heterogeneity: 
Wld(df = 17) =      NA, p-val = NA
LRT(df = 17) = 37.9897, p-val = 0.0025

Model Results:
estimate      se      zval    pval    ci.lb    ci.ub     
 -4.8121  0.3555  -13.5373  <.0001  -5.5089  -4.1154  ***

Warning messages:
1: In escalc.default(measure = measure, xi = xi, mi = mi, add = add,  :
  Some 'yi' and/or 'vi' values equal to +-Inf. Recoded to NAs.
2: In rma.glmm(measure = "PLO", xi = ai, ni = n1i, data = dat, add = 0,  :
  Some yi/vi values are NA.
3: In rma.glmm(measure = "PLO", xi = ai, ni = n1i, data = dat, add = 0,  :
  Cannot invert Hessian for saturated model.
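The predict(res, transf=transf.ilogit) call back-transforms the pooled log odds to a proportion with the inverse logit, exp(x)/(1 + exp(x)). Base R's plogis() computes the same function, so as a quick hand check the estimate and CI bounds from the Model Results above can be converted directly:

```r
# Pooled log odds and 95% CI bounds copied from the Model Results above
est <- c(estimate = -4.8121, ci.lb = -5.5089, ci.ub = -4.1154)

# Inverse logit; plogis() is base R's version of exp(x) / (1 + exp(x))
prop <- plogis(est)
round(prop, 4)
```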

I guess warnings 1 and 2 can be explained by the presence of studies with zero events, so we can simply ignore them?

I am not sure I understand the third warning. There are 18 studies, but the warning suggests that the model is saturated; 18 degrees of freedom would then be used, but I'm not conducting a meta-regression with additional coefficients or evaluating subgroups, so why is that?

Moreover, here (http://www.metafor-project.org/doku.php/updates), it says that the GLMM function continues to work even when the model is saturated, but that the tests for heterogeneity are then not available. I guess that explains why the Wald test is "NA"? Should I consequently also avoid interpreting the value given by the likelihood ratio test for the presence of residual heterogeneity (the output says this test has 17 df)?

One step further: correct me if my understanding is wrong, but the saturated model simply 'fits' the observed data without an extra degree of freedom to account for the error term, which ignores the variance completely and makes the interpretation (notably the generalisability) especially delicate; how can we then be confident in the estimate (the summary proportion)?



-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Ty Beal
Sent: 11 October, 2018 05:43
To: Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl>; r-sig-meta-analysis using r-project.org
Subject: Re: [R-meta] Chi-square or F-test to test for subgroup heterogeneity

This discussion really helps me think through how to best address heterogeneity. In my current study, I would prefer to be conservative with heterogeneity tests, so that any findings are likely to have practical meaning. I will have much more confidence in findings that are highly significant, regardless of which test I use.

On 10/11/18, 5:08 AM, "Viechtbauer, Wolfgang (SP)" <wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:

    James has already given some further advice, but I just want to add a few more things:

    The issue we are discussing here is not unique to meta-analysis, but applies to mixed-effects models in general. Except for some special cases, test statistics (whether they are for single group contrasts, tests of slopes, or tests of multiple coefficients, such as factors or omnibus tests) do not have known/exact distributions. For that reason, some R packages for mixed-effects models even opted not to compute p-values corresponding to the test statistics altogether (since that requires an assumption about the distribution of the test statistics).

    Asymptotically, the test statistics will be standard normal (or chi^2 when testing multiple coefficients) under the null. Therefore, a lot of software for mixed-effects models defaults to using these distributions to compute p-values. A problem with this is: when is the sample size large enough (and *which* sample size? the number of clusters? the number of observations?) for the use of normal / chi^2 distributions to be justified? That is impossible to say in general.
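To make the "which sample size?" question concrete: the same test statistic gives noticeably different p-values under a t-distribution with few degrees of freedom than under the standard normal, and the two only agree as the dfs grow. For a single coefficient, the chi-square test with 1 df is identical to the two-sided z-test. The value z = 2.1 is hypothetical.

```r
z <- 2.1  # hypothetical test statistic for a single coefficient

# Asymptotic (standard normal) two-sided p-value
p_z <- 2 * pnorm(-abs(z))

# Same statistic referred to t-distributions with increasing df
dfs <- c(5, 20, 100, 1000)
p_t <- 2 * pt(-abs(z), df = dfs)

# For one coefficient, the chi-square test with 1 df equals the z-test
p_chisq <- pchisq(z^2, df = 1, lower.tail = FALSE)

print(data.frame(df = c(dfs, Inf), p = c(p_t, p_z)))
```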

    But I would not say that there is no justifiable approach for testing for overall between-group heterogeneity. The use of normal / chi^2 distributions is OK in many cases. Actually, most meta-analyses involve so much fudging in putting together the data to begin with that some of the debates around the statistical methods could be considered purely academic -- but hey, statisticians have to publish papers too, so we gladly continue to debate whether DL (DerSimonian-Laird), REML, or other heterogeneity estimators are better or worse by running yet more simulation studies ... (I am guilty of that, too).

    Finally, if the significance of a finding depends on whether we use z, t (with dfs=k-p or some other dfs) with or without some further adjustments to the standard errors, or some other method, then I would say that we should be rather cautious about interpreting this finding to begin with.


    -----Original Message-----
    From: Ty Beal [mailto:tbeal using gainhealth.org]
    Sent: Tuesday, 09 October, 2018 19:37
    To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis using r-project.org
    Subject: Re: Chi-square or F-test to test for subgroup heterogeneity

    Hi Wolfgang,

    Thank you for your message. I wish there was a practical way to conduct a well-justified and accepted test for overall heterogeneity between groups.

    The Cochrane handbook also warns against using statistical significance testing for differences between groups. But there is little guidance for meta-analyses of prevalence/means at the population level, which I think may have a different justification for conducting subgroup analyses, since they are not measuring treatment effects.

    In the absence of a justifiable approach for testing for overall between-group heterogeneity, I suppose I could go straight to pairwise testing while adjusting for multiple testing (using Knapp and Hartung, Holm, or Bonferroni).
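A minimal sketch of the pairwise route with Holm and Bonferroni adjustments via base R's p.adjust(). The subgroup estimates and standard errors are hypothetical, and the pairwise z-tests assume independent subgroup estimates (zero covariance); with a single rma.mv() model, the covariances of the coefficients (e.g., from vcov() of the fitted model) would have to be taken into account instead.

```r
# Hypothetical pooled estimates and standard errors for 5 regions
est <- c(A = 0.20, B = 0.35, C = 0.50, D = 0.30, E = 0.45)
se  <- c(A = 0.05, B = 0.06, C = 0.07, D = 0.05, E = 0.08)

# All 10 pairwise differences, assuming independent subgroup estimates
pairs <- combn(names(est), 2)
diffs <- unname(est[pairs[1, ]] - est[pairs[2, ]])
se_d  <- unname(sqrt(se[pairs[1, ]]^2 + se[pairs[2, ]]^2))
p_raw <- 2 * pnorm(-abs(diffs / se_d))

# Unadjusted and multiplicity-adjusted p-values side by side
tab <- data.frame(
  pair       = paste(pairs[1, ], pairs[2, ], sep = " vs "),
  p_raw      = p_raw,
  holm       = p.adjust(p_raw, method = "holm"),
  bonferroni = p.adjust(p_raw, method = "bonferroni")
)
print(tab, digits = 3)
```

Holm is uniformly at least as powerful as Bonferroni while controlling the same familywise error rate, which is why its adjusted p-values are never larger.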

    I also wonder whether calculating I^2 for between-group heterogeneity would be of use, but of course it would be based on the Q-statistic (z- or chi-square tests), which is problematic for this purpose, as you pointed out.
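For reference, the Q-based I^2 (in the Higgins & Thompson form) applied to a between-group test is I^2 = max(0, (Q - df)/Q), so it inherits whatever distributional assumptions the Q-test makes. A sketch with a hypothetical between-group statistic; the helper i2_from_q() is made up for illustration.

```r
# Hypothetical helper: Q-based I^2, the share of Q in excess of its df
i2_from_q <- function(Q, df) max(0, (Q - df) / Q)

# Hypothetical between-group test: Q = 20 on 4 df (5 subgroups)
i2_between <- i2_from_q(Q = 20, df = 4)   # 0.8, i.e. 80%
```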

    I welcome any other suggestions for approaching between-group heterogeneity in the context of a meta-analysis of prevalence/means of dietary intake.


    On 10/9/18, 11:12 AM, "Viechtbauer, Wolfgang (SP)" <wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:

    Hi Ty,

    If we want to be picky, neither test="z" nor test="t" in rma.mv() is really justifiable. Using z- and chi-square tests ignores the uncertainty in the estimated variance components and can lead to inflated Type I error rates (but also overly conservative rates when there is very little or no heterogeneity).

    Using test="t" naively uses t-tests with k-p degrees of freedom and F-tests with m and k-p degrees of freedom (where k is the total number of estimates, p the total number of model coefficients, and m the number of coefficients being tested), but this is really an ad-hoc method. It may indeed provide somewhat better control of the Type I error rate (at least when the rate is inflated to begin with), but again, the use of t- and F-distributions isn't properly motivated and the computation of the dfs is overly simplistic.
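A minimal sketch of how the two choices differ for an omnibus test of m coefficients: with test="z" the Wald statistic Q_M is referred to a chi-square distribution with m df, while with test="t" the statistic Q_M/m is referred to an F-distribution with m and k - p df. All numbers are hypothetical (5 subgroups, so m = 4 contrasts; k = 25 estimates; p = 5 coefficients; Q_M = 10).

```r
QM <- 10   # hypothetical Wald statistic for the omnibus test
m  <- 4    # coefficients tested (5 subgroups -> 4 contrasts)
k  <- 25   # hypothetical number of estimates
p  <- 5    # model coefficients (intercept + 4 contrasts)

# test = "z": chi-square with m df
p_chisq <- pchisq(QM, df = m, lower.tail = FALSE)

# test = "t": F with m and k - p df, applied to QM / m
p_F <- pf(QM / m, df1 = m, df2 = k - p, lower.tail = FALSE)

# As the denominator df grows, the F-test approaches the chi-square test
p_F_large <- pf(QM / m, df1 = m, df2 = 1e7, lower.tail = FALSE)
```

With these numbers the chi-square test is significant at the 5% level while the F-test is not, which is exactly the kind of borderline case where the choice of reference distribution matters.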

    The Knapp & Hartung method that is available for rma.uni() with test="knha" not only uses t- and F-tests, but also adjusts the standard errors in such a way that one actually gets t- and F-distributions under the null (technically, there is some fudging also involved in the K&H method, but numerous simulation studies have shown that this appears to be a non-issue).
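For an intercept-only model, the Knapp & Hartung adjustment can be sketched in a few lines: the usual standard error is multiplied by the square root of a weighted residual mean square, and the test uses a t-distribution with k - 1 df. The data and the tau^2 value are hypothetical (in practice tau^2 comes from the fitted model), and this is the basic untruncated form written out by hand, not metafor's code.

```r
# Hypothetical observed effects, sampling variances, and tau^2 estimate
yi   <- c(0.1, 0.3, 0.5)
vi   <- c(0.04, 0.04, 0.04)
tau2 <- 0.01
k    <- length(yi)

# Random-effects weights and pooled estimate with its usual (z-test) SE
wi <- 1 / (vi + tau2)
b  <- sum(wi * yi) / sum(wi)
se <- sqrt(1 / sum(wi))

# Knapp & Hartung: rescale the SE by the weighted residual mean square
q     <- sum(wi * (yi - b)^2) / (k - 1)
se_kh <- se * sqrt(q)
t_kh  <- b / se_kh
p_kh  <- 2 * pt(-abs(t_kh), df = k - 1)
```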

    Unfortunately, test="knha" is not (currently) available for rma.mv(). A generalization of the K&H method to 'rma.mv' models is possible, but I have not implemented this so far, because further research is needed to determine if this is really useful.

    Another route would be to use t- and F-distributions, but with a Satterthwaite approximation to the dfs. I have examined this for rma.uni() models, but it appears to be overly conservative, especially under low heterogeneity. For moderate to large heterogeneity, it does appear to work, though. Further research is also needed here to determine how well this would work for 'rma.mv' models. Also, working out how to implement this in general for 'rma.mv' models isn't trivial. The same applies to the method by Kenward and Roger.

    Maybe James (Pustejovsky) can also chime in here, since, together with Elizabeth Tipton, he has done some work on this topic when using cluster-robust inference methods.


    -----Original Message-----
    From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Ty Beal
    Sent: Friday, 05 October, 2018 21:03
    To: r-sig-meta-analysis using r-project.org
    Subject: [R-meta] Chi-square or F-test to test for subgroup heterogeneity

    Hi all,

    I estimated mean frequency of consumption as well as prevalence of less-than-daily fruit and vegetable consumption, at-least-daily carbonated beverage consumption, and at-least-weekly fast food consumption among school-going adolescents aged primarily 12-17 years from Africa, Asia, Oceania, and Latin America between 2008 and 2015. Random-effects meta-analysis was used to pool estimates globally and by WHO region, World Bank income group, and food system typology.

    To keep things simple, I will just ask about region. There are 5 regions included in the analysis. I would like to first test whether there is significant heterogeneity across all regions (omnibus test) and, if so, then do pairwise tests between specific regions. I am using rma.mv() with region (5 levels) as the moderator and want to know whether I should use the default "z" statistic, for which the omnibus test is based on a chi-square distribution, or "t", for which it is based on the F-distribution.
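To make the setup concrete: with region as a 5-level factor and an intercept, the model has p = 5 coefficients, and the omnibus test of between-region heterogeneity is the joint test of the 4 contrast coefficients, so m = 4. A small sketch with hypothetical region labels (R1 to R5), using base R's model.matrix() to show the coding that a formula such as mods = ~ region produces.

```r
# Hypothetical labels: 8 estimates from 5 regions (R1 is the reference level)
region <- factor(c("R1", "R2", "R3", "R4", "R5", "R1", "R2", "R3"))

# With an intercept: intercept + 4 dummy-coded contrasts against R1
X <- model.matrix(~ region)
p <- ncol(X)   # 5 model coefficients
m <- p - 1     # 4 coefficients in the between-region omnibus test

# Without an intercept: one coefficient per region mean; a joint test of
# these 5 coefficients asks whether all means are 0, not whether they differ
X0 <- model.matrix(~ region - 1)
```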


    Ty Beal, PhD
    Technical Specialist
    Knowledge Leadership

    GAIN – Global Alliance for Improved Nutrition
    1509 16th Street NW, 7th Floor | Washington, DC 20036
    tbeal using gainhealth.org
    C: +1 (602) 481-5211
    Skype: tyroniousbeal
