[R-meta] multi-level meta-analysis with dependent effect sizes
James Pustejovsky
jepu@to @ending from gm@il@com
Tue Sep 11 16:23:20 CEST 2018
Andrew,
In your study, how does the hierarchical dependence arise? Is it that some
studies report results from multiple (presumably independent) samples?
If so, then I would argue that the best thing to do might be one of the
following:
1. Go full Gleser-Olkin. If multiple-treatments dependence is the *only*
type of sampling dependence in your data, then it would be better just to
work through the calculation of the covariances. This would let you then
treat the sampling variance-covariance matrix as known, and you could avoid
the need for cluster-robust standard errors all together (unless there are
other types of mis-specification to worry about, in which case clustered
SEs could still be useful).
2. Since you have a common outcome scale, you could also consider using the
mean outcome as the ES. If different treatment groups within a study are
independent (i.e., a between-subjects design), then modeling the means
directly would let you avoid dealing with the Gleser-Olkin business.
Generally, which of these (or the other approaches you described) is most
defensible will depend on the goals of your analysis and, in particular,
whether the moderators of interest involve between-study variation,
within-study variation, or both. For moderators that vary between-study, I
have found (based on personal experience) that the approach to
modeling/handling dependence tends not to matter much. But for moderators
where there is within-study variation, it is more salient.
Does sampletype vary within study? And what about the other factors?
James
On Tue, Sep 11, 2018 at 8:30 AM Andrew Guerin <Andrew.Guerin using newcastle.ac.uk>
wrote:
> I am new to meta-analysis, and have benefited greatly from the excellent
> documentation and resources for meta-analysis in R, especially the websites
> for metafor and clubSandwich. I am planning some extensive future work
> using meta-analysis, so thought it would be useful to get some experience
> of the whole process by starting with something relatively simple (haha)
> and trying to work it up for a publication. I would like some advice on
> whether I am following the correct approach.
>
>
> I selected a topic with which I am quite familiar and which has a
> relatively limited literature: I did some thorough literature searching and
> came up with ~70 relevant studies, which I will use for a few models
> looking at different aspects (I'll focus on one here). Once I had coded all
> the data it became clear that the analysis was not going to be as simple as
> I had hoped, since the data have two kinds of dependence that should be
> accounted for:
>
>
> 1. Hierarchical dependence - a lot of the studies contribute multiple
> effect sizes.
>
> 2. Many of the effect sizes share controls - different treatments are
> compared against a common 'untreated' control - Gleser and Olkin's (2009)
> 'multiple treatments' dependence.
>
>
> For this analysis, there are a total of 302 effect sizes, grouped into 244
> 'contrast' groups which share a control, from 43 studies. The 244 contrast
> groups contain 1-8 effect sizes.
>
>
> There are various moderators. The most important is 'sampletype' -
> expected to be significant, as it is expected that treatment will have
> different effects on different sample types. The other factors all relate
> to variations in the experimental treatments used, to keep things general
> I'll just call them "A" to "H".
>
>
> ndat = data frame
>
> md = effect size measure (raw mean difference, since all data are on a
> common scale, with a limited range)
>
> mdv = sampling variation for mean difference, obtained using escalc().
>
> sid = study id (a simple numerical identifier)
>
> contrast = identifier showing which effect size measures / treatments
> share a control (eg. study one contributes 5 effect sizes, with values of
> 'contrast' 1A, 1B, 1B, 1C, 1D, indicating that of the 5 effect sizes in
> study 1, 2 share a control).
>
>
> As I see it I have the following options for how to proceed:
>
>
> A) conduct a completely 'cluster naive' mixed effects meta-analysis
>
>
> nmtA <- rma(md, mdv, mods = ~ factor(sampletype) + factor(A) ... +
> factor(H), data=ndat, method="REML")
>
> anova(nmtA, btt=20:21) ## example hypothesis test for moderator 'D'
>
>
> B) ignore the hierarchical dependence (ie. assume that any samples with
> different values of 'contrast' within the same study are independent), but
> account for the correlated errors within clusters / contrast groups by RVE
> in robumeta, using clubSandwich for the hypothesis tests.
>
>
> nmtB <- robu(md ~ factor(sampletype) + factor(A) ... + factor(H),
> data=ndat, var.eff.size=mdv, rho = 0.8, modelweights="CORR",
> studynum=contrast)
>
> Wald_test(nmtB, constraints = 20:21, vcov="CR2") ## example hypothesis
> test for moderator 'D'
>
>
>
> C) still ignoring the hierarchical dependence, but now explicitly
> specifying the random effects in rma.mv, after imputing a
> variance-covariance matrix. Robust parameter estimates and hypothesis test
> are then carried out using clubSandwich.
>
>
> vcvndat <- impute_covariance_matrix(vi = ndat$mdv, cluster=ndat$contrast,
> r=0.7)
>
> nmtC <- rma.mv(yi=md, V=vcvndat, mods = ~ factor(sampletype) + factor(A)
> ... + factor(H), random = ~1|contrast, data=ndat, method="REML")
>
> nmtC_robust <- coef_test(nmtC, vcov="CR2")
>
> Wald_test(nmtC, constraints = 20:21, vcov="CR2") ## example hypothesis
> test for moderator 'D'
>
>
> D) attempting to account for the hierarchical and multiple-treatments
> dependence via a multi-level model, setting random effect structure as
> random = ~ 1| sid/contrast, and then using clubSandwich for RVE and
> hypothesis tests.
>
>
> vcvndat <- impute_covariance_matrix(vi = ndat$mdv, cluster=ndat$contrast,
> r=0.7)
>
> nmtD <- rma.mv(yi=md, V=vcvndat, mods = ~ factor(sampletype) + factor(A)
> ... + factor(H), random = ~1|sid/contrast, data=ndat, method="REML")
>
> nmtD_robust <- coef_test(nmtD, vcov="CR2", cluster=ndat$contrast)
>
> Wald_test(nmtD, constraints = 20:21, vcov="CR2", cluster=ndat$contrast) ##
> example hypothesis test for moderator 'D'
>
>
> My main question really is which (if any) of these approaches is the best
> way forward? For this particular dataset, the outcomes (at least in terms
> of which moderators are significant/non-significant, and most parameter
> estimates) are pretty similar for all the options (including some other
> options that I tried that I have not mentioned here). All find 'sampletype'
> highly significant, and find similar sets of the other moderators to be
> non-significant (except for the naive analysis, A which finds more
> moderators to be significant). B, C, and D are all very similar. Although D
> is the only one that explicitly accounts for the hierarchical dependence,
> does the fact that the outcomes from B and C are substantially similar mean
> that this could be safely overlooked for this analysis?
>
>
> I have not yet attempted to build my own vcov matrix using the Gleser and
> Olkin (2009) formula. Is this worth doing instead of using
> impute_covariance_matrix? I picked 0.7 as a plausible value for r in
> impute_convariance_matrix(), but I have raw data from one study that might
> allow me to derive a better estimate. I did investigate the effect of
> varying r from 0-1. It does not seem to have much impact on parameter
> estimates, the significance/non-significance of moderators, or the estimate
> of sigma^2.2 in the multi-level model. The intercept moves about a little,
> but the main effect is that when r is less than about 0.5, sigma^2.1 is
> basically 0. Is this indicating a problem?
>
>
> Finally I was wondering whether it is wise to account for multiple
> comparisons when evaluating the hypothesis tests. Since I have several
> factors, I repeat Wald_test() several times. Models nmtB, nmtC, and nmtD
> all have moderators (aside from 'sampletype') that are significant at p <
> 0.05, but with p values only just under 0.05. Adjustment for multiple
> comparisons (eg. using p.adjust(method="fdr")) shifts the outcomes of all
> these tests to p > 0.05.
>
>
> I would be grateful for any advice on the above.
>
>
> Many thanks,
>
> Andrew
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-meta-analysis mailing list
> R-sig-meta-analysis using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>
[[alternative HTML version deleted]]
More information about the R-sig-meta-analysis
mailing list