[R-meta] Multilevel model between-study variance
Frederik Zirn
frederik.zirn using uni-konstanz.de
Mon Jul 15 09:19:21 CEST 2024
Hi James,
I hope it is OK if I ask a further question. You wondered whether some of the reported effect sizes in the dissertation are different from the others. As explained, the dissertation measures several behaviors at the workplace, all indicating transfer success. Each of those behaviors is 1) self-reported vs. a control group and 2) third-party evaluated vs. a control group. On top of that, each is also measured retrospectively, as in 3) "Compared to 3 months ago, I improved my behavior regarding..." (self-reported vs. CG) and 4) "Compared to 3 months ago, my supervisor changed his behavior regarding..." (third-party evaluation vs. CG).
Thus, every identified transfer behavior is measured in 4 different ways. I thought about including only the first two measurements of every behavior and leaving out the retrospective change measurements (those resulted in high effect sizes, especially the self-reported ones). However, I am unsure whether that would be justified, as those measurements also have a control group and therefore meet my inclusion criteria.
Assuming for a moment that it would be justified: leaving 3) and 4) out would reduce the number of effect sizes included from the dissertation from 25 to 13, which changes the between-study heterogeneity (it is no longer zero):
           estim    sqrt  nlvls  fixed       factor
sigma^2.1  0.0133  0.1151     12     no        Study
sigma^2.2  0.0661  0.2570     29     no  Study/ES_ID
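
For context, this is roughly how I produced the two fits (a sketch; the logical column 'retrospective', marking measurement types 3) and 4), is a placeholder for however that is coded in my data):

library(metafor)
library(clubSandwich)

fit_che <- function(dat, rho = 0.59) {
  # constant sampling correlation working model, as before
  V <- impute_covariance_matrix(vi = dat$vi, cluster = dat$Study, r = rho)
  rma.mv(yi = yi, V = V, random = ~ 1 | Study/ES_ID, data = dat)
}

che_full    <- fit_che(all_study_designs_combined)
che_reduced <- fit_che(subset(all_study_designs_combined, !retrospective))
coef(che_full)      # overall effect with all 41 effect sizes
coef(che_reduced)   # overall effect without the retrospective measures
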
What is unclear to me: why does the overall effect of the CHE model decrease only marginally, from 0.1995 to 0.1919, when I exclude the estimates with the largest effect sizes from the dissertation? Would that once more be due to those weird properties of the inverse-variance method?
Even with the proposed reduction, the effect size distribution remains very uneven (more than a third of the effect sizes come from a single study), which is why I am unsure whether it is appropriate to fit a CHE model at all. Do you think this is a valid argument for deciding against the CHE model in my case and opting for a reductionist (aggregated) approach instead? Would it be a good idea to transparently communicate the decision against the CHE model in the article?
All the best
Frederik
On Thursday, July 11, 2024, 13:38 CEST, "Frederik Zirn" <frederik.zirn using uni-konstanz.de> wrote:
> Hi James, Hi Michael,
>
> Once more, a very nice explanation!
>
> Correct. To be more precise, I have one study with 25 effects, two with 3 effects each, and one with 2 effects; the rest each contribute one effect size.
> To give more context, I am conducting a meta-analysis measuring the transfer success (positive change in workplace behavior) after a special type of soft-skill training intervention. Changes in workplace behavior (related to this training topic) can be measured in different ways. The dissertation identified several behaviors that, according to its author, indicate transfer success and measured those after the training. I tend towards aggregating effect sizes for better comparability. In my opinion, the CHE model may lead to confusion due to the uneven distribution of effect sizes. Maybe I'll report the results of both models (the aggregated version, and then the CHE version as a kind of sensitivity analysis).
>
> Wow, that is interesting and good to know! Thanks for clearly explaining why the overall effect can be negative when aggregating.
>
> Also, thank you very much, Michael, for your input and suggestion!
>
> All the best
> Frederik
>
>
> On Wednesday, July 10, 2024, 21:08 CEST, James Pustejovsky <jepusto using gmail.com> wrote:
>
> > Hi Frederik,
> >
> > These are very interesting questions (though also rather vexing!). From
> > what you've described, it sounds like you have 11 studies that contribute
> > just 1 or 2 effect size estimates each (for a total of 16 effects) plus one
> > dissertation with 25 effects.
> >
> > To figure out how to approach modeling this, I think the first and probably
> > most important consideration is about the comparability or similarity of
> > the effects reported in the big dissertation versus the effects reported in
> > the other studies. Is the reason that the dissertation has so many more
> > effects that it reported effects on every sub-scale of three different
> > scales, while all of the other studies only reported effects for the full
> > scales? Or perhaps the dissertation reported effects for several interim
> > assessments plus the post-intervention follow-up assessment, whereas all
> > the other studies only reported effects at the post-intervention follow-up?
> > If there is a big contrast between the type of data included in the
> > dissertation compared to the type of data reported in the other studies,
> > then you might consider _excluding_ some or even many of the effects from
> > the dissertation, to better align the data from the dissertation with the
> > data reported in the other studies. Alternatively, it might make sense to
> > aggregate some or all of the dissertation effect sizes in order to arrive
> > at a composite that is better aligned with the data from the other studies.
> >
> > To your question about taking the simple average versus a weighted average,
> > I think it depends on whether the effects being aggregated really have
> > meaningfully different sampling variances. For example, if the dissertation
> > reports effects for each of several sub-groups of participants, and the
> > subgroups have very different size, then taking the weighted average will
> > give a better approximation to the overall average effect than taking the
> > simple average. On the other hand, maybe the differences in the sampling
> > variances are mostly just noise, so it would be better to not use those
> > differences in variance to get an optimally weighted (inverse-variance)
> > average. Or, as a further possibility, you could take the average of the
> > sampling variances and use this for all of the effects. This smooths over
> > the noise in the sampling variances and also makes the simple average and
> > the weighted average equivalent.
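> >
> > For instance, the smoothing idea could be sketched like this (assuming
> > all_study_designs_combined is an escalc object; names are placeholders):
> >
> > library(metafor)
> > dat <- all_study_designs_combined
> > dat$vi <- ave(dat$vi, dat$Study, FUN = mean)  # average vi within each study
> > # with equal vi within a study, weighted and unweighted composites coincide
> > agg_w  <- aggregate(dat, cluster = Study, rho = 0.59)
> > agg_uw <- aggregate(dat, cluster = Study, rho = 0.59, weighted = FALSE)
> > all.equal(as.numeric(agg_w$yi), as.numeric(agg_uw$yi))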
> >
> > I would guess that the issue you described with getting a negative average
> > effect is probably happening because of differences in the magnitude of the
> > sampling variances. A weird property of inverse-variance weighting in this
> > context is that it can assign negative weight to some effects, especially
> > if those effects have much larger variance than the other effects included
> > in the aggregate. Smoothing out the sampling variances would make this
> > problem go away.
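> >
> > To see how that can happen, here is a toy illustration with made-up numbers
> > (not your data):
> >
> > # two correlated effects, one with a much larger sampling variance
> > vi  <- c(1, 4)
> > rho <- 0.9
> > V   <- rho * sqrt(outer(vi, vi))
> > diag(V) <- vi
> > # inverse-variance (GLS) weights for the composite of the two effects
> > iv <- solve(V, c(1, 1))
> > iv / sum(iv)   # about 1.57 and -0.57: the noisier effect gets negative weight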
> >
> > James
> >
> On Wednesday, July 10, 2024, 19:16 CEST, Michael Dewey <lists using dewey.myzen.co.uk> wrote:
>
> > Dear Frederik
> >
> > I would have thought there is a good case here for reporting both your
> > model 1 and model 3. If they lead to very different results you may have
> > some head scratching to do but if they are not too far apart then
> > perhaps pick model 1 and report model 3 as a sensitivity analysis.
> >
> > Michael
>
> > On Wed, Jul 10, 2024 at 10:45 AM Frederik Zirn <
> > frederik.zirn using uni-konstanz.de> wrote:
> >
> > > Hi James,
> > >
> > > Thank you very much for your clear and helpful answer. I learned a lot!
> > >
> > > I tried changing the value of rho; however, even if I specify rho to be 0,
> > > my between-study variance remains 0.
> > >
> > > Regarding the equivalence of the aggregated approach and the "correlated
> > > effects" model: at first I was puzzled, because I did not get the same
> > > results for the two versions. After further inspection I found out why: the
> > > two are equivalent if one computes weighted averages per study. I used the
> > > package MAd, function "agg" with Borenstein et al. (2021) formulas, which
> > > uses unweighted averages. Using weighted averages as you did (with the
> > > aggregate function), I do in fact get the same results.
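> > >
> > > For reference, the check I ran looked roughly like this (a sketch with my
> > > column names; weighted = FALSE should reproduce the unweighted averages):
> > >
> > > library(MAd)
> > > library(metafor)
> > > # unweighted composites per study (Borenstein et al. formulas)
> > > agg_unw <- agg(id = Study, es = yi, var = vi, method = "BHHR",
> > >                cor = 0.59, data = all_study_designs_combined)
> > > # inverse-variance weighted composites per study
> > > agg_w  <- aggregate(all_study_designs_combined, cluster = Study, rho = 0.59)
> > > # unweighted composites via metafor, matching agg() above
> > > agg_uw <- aggregate(all_study_designs_combined, cluster = Study,
> > >                     rho = 0.59, weighted = FALSE)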
> > >
> > > The last point you made, I guess, hit the nail on the head and explains
> > > why there is no between study variance in my multivariate model. 25 (!!) of
> > > the 41 effect sizes stem from one dissertation alone. Leaving that
> > > dissertation out leads to the following variance components:
> > >            estim    sqrt  nlvls  fixed       factor
> > > sigma^2.1  0.0458  0.2140     11     no        Study
> > > sigma^2.2  0.0017  0.0407     16     no  Study/ES_ID
> > >
> > > Now I am wondering what to do with that dissertation. As I see it, there
> > > are three options:
> > >
> > > 1) Relying on the multilevel model and INCLUDING the dissertation. This
> > > doesn't feel quite right as the results of my analyses (on top of examining
> > > the overall effect, I'd like to report results of 3 moderator analyses
> > > which I pre-specified prior to my search - I know that I have to be careful
> > > with interpretation due to low sample size) are probably influenced by the
> > > fact that there is no between-study variance detected by the model? The
> > > reason for reporting no between study variance in my article would then
> > > simply be the inclusion of that dissertation. The CHE model would be pretty
> > > tenuous because of the distribution of effect sizes.
> > >
> > > 2) Using within-study averaging (which, according to your article is
> > > equivalent to a correlated effects model) and INCLUDING the dissertation.
> > > If I go for that option: would you consider weighted averaging per study
> > > superior to unweighted averaging as proposed by Borenstein et al (2021)?
> > > There is one thing I do not get when using the weighted approach (test_agg
> > > <- aggregate(all_study_designs_combined, cluster = Study, rho = 0.59)). The
> > > pooled effect size for the dissertation is then negative (-0.22, compared to
> > > 0.5 when using the unweighted average). That is kind of strange, as
> > > only 2 effect sizes out of 25 are negative. Those 2 have a relatively small
> > > variance; however, there are many other positive effect sizes with
> > > approximately the same variance. It just doesn't add up for me why the
> > > overall estimate should be negative. Thus, I calculated the average effect
> > > size by hand with inverse variance weighting and got an aggregated effect
> > > size of 0.28. In order to find the reason for the discrepancy with the
> > > outcome in R, I changed the rho value to 0 (test_agg <-
> > > aggregate(all_study_designs_combined, cluster = Study, rho = 0)). Result:
> > > the effect size in R is now also 0.28. Why would a rho value of 0.59 make
> > > the overall effect negative (and higher rho values make it even more
> > > negative)?
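> > >
> > > (For reference, my by-hand check was roughly the following sketch; the
> > > study label is a placeholder:)
> > >
> > > # simple inverse-variance weighted mean of the dissertation's effects,
> > > # ignoring their correlation (equivalent to aggregate() with rho = 0)
> > > diss <- subset(all_study_designs_combined, Study == "Dissertation")
> > > with(diss, sum(yi / vi) / sum(1 / vi))   # about 0.28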
> > >
> > > 3) Relying on the multilevel model WITHOUT the dissertation (even though
> > > it met my inclusion criteria). I guess I would need good justification for
> > > that.
> > >
> > > Thank you very much for your valuable input!
> > >
> > > All the best
> > > Frederik
> > >
> > >
> > > On Wednesday, July 10, 2024, 04:27 CEST, James Pustejovsky <
> > > jepusto using gmail.com> wrote:
> > >
> > > > Hi Frederik,
> > > >
> > > > Your interpretation of the parameter estimates is correct, but the
> > > > estimator of between-study heterogeneity may be quite imprecise if you
> > > have
> > > > only 12 studies. You can get a sense of this by computing profile
> > > > likelihood confidence intervals for the variance components:
> > > > confint(che.model)
> > > > Note that these confidence intervals are based on the assumptions of the
> > > > working model, so they are not robust to mis-specification in the same
> > > > sense that robust CIs for the average effect size are robust.
> > > >
> > > > Both the point estimate of between-study heterogeneity and its confidence
> > > > interval could be pretty sensitive to the assumed value of rho (the
> > > > correlation of the sampling errors). Using a smaller value of rho will
> > > tend
> > > > to produce larger estimates of between-study heterogeneity and somewhat
> > > > smaller estimates of within-study heterogeneity. So it's definitely
> > > helpful
> > > > to use empirical data to inform this assumption (as you've done) and to
> > > > conduct sensitivity analyses (as you've also noted).
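> > > >
> > > > For instance, such a sensitivity analysis could be sketched like this
> > > > (re-using the objects from your code below; the grid of rho values is
> > > > arbitrary):
> > > > sapply(c(0.3, 0.45, 0.59, 0.75, 0.9), function(r) {
> > > >   V <- impute_covariance_matrix(vi = all_study_designs_combined$vi,
> > > >                                 cluster = all_study_designs_combined$Study,
> > > >                                 r = r)
> > > >   fit <- rma.mv(yi = yi, V = V, random = ~ 1 | Study/ES_ID,
> > > >                 data = all_study_designs_combined)
> > > >   fit$sigma2   # between-study and within-study variance components
> > > > })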
> > > >
> > > > To your broader question about whether aggregating is state-of-the-art, I
> > > > have a new paper (with Man Chen) about exactly this question:
> > > >
> > > https://jepusto.com/publications/Equivalences-between-ad-hoc-strategies-and-models/
> > > > We argue that aggregating is neither correct nor incorrect, but is rather
> > > > just a different working model that might or might not be reasonable for
> > > > your data. Specifically, we show that aggregating is exactly equivalent
> > > to
> > > > using a multivariate working model that has a between-study variance
> > > > component but no within-study variance component (which we call the
> > > > "correlated effects" working model), as in the following:
> > > > ce.model <- rma.mv(yi = yi,
> > > > V = V,
> > > > random = ~ 1 | Study,
> > > > data = all_study_designs_combined)
> > > > This way of representing the model is helpful because it puts the CE
> > > model
> > > > on equal footing with the CHE model and allows them to be directly
> > > compared
> > > > via likelihood ratio tests:
> > > > anova(che.model, ce.model)
> > > > Or otherwise compared and discussed as alternatives.
> > > >
> > > > One other note: the sensitivity of the variance component estimates is
> > > also
> > > > affected by the distribution of effect sizes per study. It is therefore
> > > > useful to examine and report the range of effect estimates per study. If
> > > 10
> > > > of your studies each have a single effect and 2 each have many effect
> > > > sizes, then the CHE model will be much more tenuous than if all twelve studies
> > > > have 3 or 4 effect sizes each.
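> > > >
> > > > Something as simple as the following (assuming a Study identifier column)
> > > > shows that distribution:
> > > > table(all_study_designs_combined$Study)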
> > > >
> > > > James
> > > >
> > > > On Mon, Jul 8, 2024 at 5:08 AM Frederik Zirn via R-sig-meta-analysis <
> > > > r-sig-meta-analysis using r-project.org> wrote:
> > > >
> > > > > Dear R-sig-meta-analysis community,
> > > > >
> > > > > I am a PhD student conducting a meta-analysis of 12 studies with 41
> > > effect
> > > > > sizes. There is dependency among my effect sizes as several studies
> > > measure
> > > > > the same outcome at multiple time points or multiple outcomes are
> > > measured
> > > > > in the same study.
> > > > >
> > > > > My first approach was to aggregate effect sizes per study using the
> > > > > agg() function of the MAd package.
> > > > > all_study_designs_combined_agg <- agg(id = Study, es = yi, var = vi,
> > > > > method = "BHHR", cor = 0.59, data=all_study_designs_combined)
> > > > >
> > > > > cor = 0.59 is based on the correlations reported within a study of my
> > > > > meta-analysis. I am conducting sensitivity analyses with other values.
> > > > >
> > > > > 1) However, aggregating within studies is no longer considered
> > > > > state-of-the-art practice, correct? Or is this still a valid approach
> > > to
> > > > > handle dependent effect sizes?
> > > > >
> > > > > Consequently, I aimed to create a multilevel model. Here is the code I
> > > > > used:
> > > > >
> > > > > # Fitting a CHE model with robust variance estimation
> > > > > # constant sampling correlation assumption
> > > > > rho <- 0.59
> > > > >
> > > > > # constant sampling correlation working model
> > > > > V <- with(all_study_designs_combined,
> > > > > impute_covariance_matrix(vi = vi,
> > > > > cluster = Study,
> > > > > r = rho))
> > > > >
> > > > > che.model <- rma.mv(yi = yi,
> > > > > V = V,
> > > > > random = ~ 1 | Study/ES_ID,
> > > > > data = all_study_designs_combined)
> > > > > che.model
> > > > >
> > > > > # robust variance estimation
> > > > > full.model.robust <- robust(che.model,
> > > > > cluster=all_study_designs_combined$Study, clubSandwich = TRUE)
> > > > > summary(full.model.robust)
> > > > >
> > > > > Doing so, I receive the following variance components:
> > > > >            estim    sqrt  nlvls  fixed       factor
> > > > > sigma^2.1  0.0000  0.0000     12     no        Study
> > > > > sigma^2.2  0.1286  0.3587     41     no  Study/ES_ID
> > > > >
> > > > > 2) I have trouble interpreting those findings. Does that indicate that
> > > all
> > > > > of the variance in my model comes from within-study variance, and I
> > > have no
> > > > > between-study variance? This does not seem plausible to me. Am I
> > > overlooking
> > > > > something here? Could that be due to the limited sample size (12
> > > studies)?
> > > > >
> > > > > Thanks in advance,
> > > > > Frederik Zirn
> > > > > PhD student
> > > > > Chair of Corporate Education
> > > > > University of Konstanz
> > > > >
> > > > > _______________________________________________
> > > > > R-sig-meta-analysis mailing list @ R-sig-meta-analysis using r-project.org
> > > > > To manage your subscription to this mailing list, go to:
> > > > > https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
> > > > >
> > >
> > >