[R-meta] Chi-square or F-test to test for subgroup heterogeneity

Thu Oct 11 11:08:52 CEST 2018

James has already given some further advice, but I just want to add a few more things:

The issue we are discussing here is not unique to meta-analysis but applies to mixed-effects models in general. Except for some special cases, test statistics (whether for single group contrasts, tests of slopes, or tests of multiple coefficients, such as factors or omnibus tests) do not have known/exact distributions. For that reason, some R packages for mixed-effects models have even opted not to compute p-values for the test statistics at all (since that requires an assumption about the distribution of the test statistics).

Asymptotically, the test statistics will be standard normal (or chi^2 when testing multiple coefficients) under the null. Therefore, a lot of software for mixed-effects models defaults to using these distributions to compute p-values. The problem is: when is the sample size large enough (and *which* sample size? the number of clusters? the number of observations?) to justify the use of normal / chi^2 distributions? That is impossible to say in general.
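For concreteness, a minimal sketch (in Python, standard library only, so it does not touch metafor or any R package) of how such asymptotic p-values are obtained: a single coefficient is referred to N(0, 1), and an omnibus Wald statistic for m coefficients is referred to chi^2 with m df. The function names are hypothetical, and the chi^2 tail uses the closed form that exists for integer df.

```python
import math


def z_test_p(estimate: float, se: float) -> float:
    """Two-sided p-value treating z = estimate / se as standard normal under H0."""
    z = estimate / se
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi^2(df), integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{j=0}^{df/2 - 1} h^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd df: the df = 1 tail plus terms h^(j - 1/2) * exp(-h) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(h))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp((j - 0.5) * math.log(h) - h - math.lgamma(j + 0.5))
    return total


# Hypothetical omnibus Wald statistic QM = 10 for m = 4 coefficients
print(round(chi2_sf(10.0, 4), 4))  # prints 0.0404
```

Whether chi^2(4) is an adequate reference distribution for such a QM is exactly the open question raised above; the code only shows the mechanics.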

But I would not say that there is no justifiable approach for testing for overall between-group heterogeneity. The use of normal / chi^2 distributions is ok in many cases. Actually, most meta-analyses involve so much fudging in putting together the data to begin with that some of the debates around the statistical methods could be considered purely academic. But hey, statisticians have to publish papers too, so we gladly continue to debate whether DerSimonian-Laird (DL), REML, or other heterogeneity estimators are better or worse by running yet more simulation studies ... (I am guilty of that, too).

Finally, if the significance of a finding depends on whether we use z- or t-tests (with k-p or some other dfs), with or without further adjustments to the standard errors, or some other method, then I would say that we should be rather cautious about interpreting this finding to begin with.

Best,
Wolfgang

-----Original Message-----
From: Ty Beal [mailto:tbeal using gainhealth.org] 
Sent: Tuesday, 09 October, 2018 19:37
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis using r-project.org
Subject: Re: Chi-square or F-test to test for subgroup heterogeneity

Hi Wolfgang,

Thank you for your message. I wish there were a practical way to conduct a well-justified and accepted test for overall heterogeneity between groups.

The Cochrane Handbook also warns about using statistical significance testing for differences between groups. But there is little guidance for meta-analyses of prevalence/means at the population level, which I think may have a different justification for conducting subgroup analyses, since they do not measure treatment effects.

In the absence of a justifiable approach for testing for overall between-group heterogeneity, I suppose I could go straight to pairwise testing while adjusting for multiplicity of testing (using Knapp and Hartung, Holm, or Bonferroni).
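A minimal sketch of the Holm and Bonferroni adjustments for that route, in Python (standard library only). As far as I know, the Holm step-down rule below matches what R's p.adjust(p, method = "holm") does, but the functions here are illustrations, not that implementation. The region labels R1 to R5 and the p-values are made up; with 5 regions there are 5 * 4 / 2 = 10 pairwise contrasts.

```python
from itertools import combinations


def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjustment; returns adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical region labels: 5 levels give 10 pairwise contrasts
regions = ["R1", "R2", "R3", "R4", "R5"]
pairs = list(combinations(regions, 2))
print(len(pairs))  # prints 10

# Made-up unadjusted p-values for four contrasts
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in holm(raw)])        # prints [0.03, 0.06, 0.06, 0.02]
print([round(p, 3) for p in bonferroni(raw)])  # prints [0.04, 0.16, 0.12, 0.02]
```

Holm is never less powerful than Bonferroni and controls the same family-wise error rate, which is why it is usually preferred of the two. Note that both adjust for multiplicity only; they do nothing about whether the unadjusted p-values themselves (from z- or t-tests) are well calibrated.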

I also wonder whether calculating I^2 for between-group heterogeneity would be of use, but of course it would be based on the Q-statistic (i.e., on z- or chi-square tests), which are problematic for this purpose, as you pointed out.
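A minimal sketch of that idea in Python (standard library only): a between-group Q statistic computed from subgroup estimates and their standard errors, in the style of a fixed-effect test for subgroup differences, followed by I^2 = max(0, (Q - df) / Q) with df = G - 1 for G subgroups. The numbers are invented, the function names are hypothetical, and this is not what rma.mv() computes for a model with moderators.

```python
def q_between(estimates, ses):
    """Q = sum w_g (est_g - pooled)^2 with w_g = 1 / se_g^2; df = G - 1."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return q, len(estimates) - 1


def i_squared(q, df):
    """Share of the variability in Q beyond what df alone would lead us to expect."""
    if q <= 0.0:
        return 0.0
    return max(0.0, (q - df) / q)


# Invented subgroup estimates and standard errors for two subgroups
q, df = q_between([0.0, 3.0], [1.0, 1.0])
print(q, df, round(i_squared(q, df), 3))  # prints 4.5 1 0.778
```

Since I^2 is a deterministic function of Q and its df, it inherits whatever reference-distribution issues the Q-test has; it rescales the evidence rather than adding new information.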

I welcome any other suggestions for approaching between-group heterogeneity in the context of a meta-analysis of prevalence/means of dietary intake.

Best,
Ty

On 10/9/18, 11:12 AM, "Viechtbauer, Wolfgang (SP)" <wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:

Hi Ty,

If we want to be picky, neither test="z" nor test="t" in rma.mv() is really justifiable. Using z- and chi-square tests ignores the uncertainty in the estimated variance components and can lead to inflated Type I error rates (but also overly conservative rates when there is very little or no heterogeneity).

Using test="t" naively uses t- and F-tests with k-p degrees of freedom (for the F-test, m and k-p degrees of freedom, where m is the number of coefficients being tested; k is the total number of estimates and p the total number of model coefficients), but this is really an ad-hoc method. It may indeed provide somewhat better control of the Type I error rates (at least when these are inflated to begin with), but again, the use of t- and F-distributions isn't properly motivated and the computation of the dfs is overly simplistic.
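To illustrate how much the choice can matter, a minimal sketch in Python (standard library only) that evaluates one hypothetical omnibus Wald statistic both ways: QM against chi^2 with m df (as with test="z"), and QM/m against F with m and k-p df (the ad-hoc F-test described above). The regularized incomplete beta function uses the standard continued-fraction approach (as in Numerical Recipes); the values k = 25, p = 5, m = 4, and QM = 10 are made up.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi^2(df), integer df >= 1 (closed form)."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp((j - 0.5) * math.log(h) - h - math.lgamma(j + 0.5))
    return total


def _betacf(a, b, x, max_iter=300, eps=3e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction for step m
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges fast, else the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2), via I_{d2/(d2 + d1 f)}(d2/2, d1/2)."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


k, p, m = 25, 5, 4   # hypothetical: 25 estimates, intercept + 4 region dummies
qm = 10.0            # hypothetical omnibus Wald statistic for the 4 dummies
p_chi2 = chi2_sf(qm, m)          # reference distribution: chi^2(m)
p_f = f_sf(qm / m, m, k - p)     # reference distribution: F(m, k - p)
print(round(p_chi2, 4), p_f > 0.05)  # prints 0.0404 True
```

The same statistic is "significant" at 0.05 under chi^2(4) but not under F(4, 20); the gap shrinks as k - p grows, since m times an F(m, k - p) variable approaches chi^2(m).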

The Knapp & Hartung method that is available for rma.uni() with test="knha" not only uses t- and F-tests, but also adjusts the standard errors in such a way that one actually gets t- and F-distributions under the null (technically, some fudging is also involved in the K&H method, but numerous simulation studies have shown that this appears to be a non-issue).
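A minimal sketch of the idea behind the K&H adjustment, written in Python (standard library only) for the simplest case: an intercept-only random-effects model with tau^2 treated as already estimated. The usual variance of the pooled estimate, 1 / sum(w_i) with w_i = 1 / (v_i + tau^2), is multiplied by s^2 = sum(w_i (y_i - mu)^2) / (k - 1), and the test statistic is then referred to a t-distribution with k - 1 df. In a meta-regression, the same kind of factor (with k - p in the denominator) scales the whole variance-covariance matrix of the coefficients. This illustrates the formula only; it is not a re-implementation of rma.uni(test="knha"), and the function name and data are hypothetical.

```python
import math


def knha_intercept_only(y, v, tau2):
    """Pooled estimate with its usual SE and a Knapp & Hartung-style adjusted SE.

    y: observed effect sizes, v: their sampling variances,
    tau2: an already-estimated between-study variance.
    """
    k = len(y)
    if k < 2:
        raise ValueError("need at least two estimates")
    w = [1.0 / (vi + tau2) for vi in v]          # random-effects weights
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    se_z = math.sqrt(1.0 / sw)                   # SE used by the z-test
    # Scaling factor: weighted residual sum of squares divided by k - 1
    s2 = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    se_knha = math.sqrt(s2 / sw)                 # adjusted SE, used with t(k - 1)
    return mu, se_z, se_knha, k - 1


mu, se_z, se_knha, df = knha_intercept_only([0.1, 0.3, 0.5], [0.01, 0.01, 0.01], tau2=0.0)
print(round(mu, 3), round(se_knha / se_z, 3), df)  # prints 0.3 2.0 2
```

Note that s^2 can fall below 1, so the adjusted SE is not always larger than the usual one.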

Unfortunately, test="knha" is not (currently) available for rma.mv(). A generalization of the K&H method to 'rma.mv' models is possible, but I have not implemented this so far, because further research is needed to determine if this is really useful.

Another route would be to use t- and F-distributions, but with a Satterthwaite approximation to the dfs. I have examined this for rma.uni() models, but it appears to be overly conservative, especially under low heterogeneity. For moderate to large heterogeneity, it does appear to work, though. Further research is also needed here to determine how well this would work for 'rma.mv' models. Also, working out how to implement this in general for 'rma.mv' models isn't trivial. The same applies to the method by Kenward and Roger.

Maybe James (Pustejovsky) can also chime in here, since, together with Elizabeth Tipton, he has done some work on this topic when using cluster-robust inference methods.

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Ty Beal
Sent: Friday, 05 October, 2018 21:03
To: r-sig-meta-analysis using r-project.org
Subject: [R-meta] Chi-square or F-test to test for subgroup heterogeneity

Hi all,

I estimated mean frequency of consumption as well as prevalence of less-than-daily fruit and vegetable consumption, at-least-daily carbonated beverage consumption, and at-least-weekly fast food consumption among school-going adolescents aged primarily 12-17 years from Africa, Asia, Oceania, and Latin America between 2008 and 2015. Random-effects meta-analysis was used to pool estimates globally and by WHO region, World Bank income group, and food system typology.

To keep things simple, I will just ask about region. There are 5 regions included in the analysis. I would like to first test whether there is significant heterogeneity across all regions (omnibus test) and, if so, then do pairwise tests between specific regions. I am using rma.mv() with region (5 levels) as the moderator (mods) and want to know whether I should use the default "z" statistic, which for the omnibus test is based on a chi-square distribution, or "t", which for the omnibus test is based on the F-distribution.
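A small sketch (Python, with hypothetical region labels R1 to R5 and a made-up number of estimates) of where the degrees of freedom for this omnibus test come from: with an intercept and treatment (dummy) coding, region contributes 4 coefficients, so the omnibus test has m = 4 df, and with p = 5 coefficients in total the F-test has k - 5 denominator df.

```python
def dummy_code(labels, levels):
    """Design-matrix rows: an intercept plus one 0/1 column per non-reference level."""
    others = levels[1:]  # the first level serves as the reference category
    return [[1.0] + [1.0 if lab == lev else 0.0 for lev in others] for lab in labels]


levels = ["R1", "R2", "R3", "R4", "R5"]  # hypothetical region labels
labels = levels * 3                      # made up: 3 estimates per region, k = 15
X = dummy_code(labels, levels)

k = len(X)       # number of estimates
p = len(X[0])    # intercept + 4 dummies = 5 coefficients
m = p - 1        # coefficients tested by the omnibus test for region
print(k, p, m, k - p)  # prints 15 5 4 10
```

The omnibus test of the 4 dummy coefficients is what asks whether all 5 regional means are equal; which region is the reference does not change that test.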

Best,

Ty Beal, PhD
Technical Specialist
Knowledge Leadership

GAIN – Global Alliance for Improved Nutrition
1509 16th Street NW, 7th Floor | Washington, DC 20036
tbeal using gainhealth.org
C: +1 (602) 481-5211
Skype: tyroniousbeal