[R-meta] Dealing with effect size dependance with a small number of studies

Tue Jan 5 13:27:41 CET 2021

Dear Wolfgang,

message received (both times) :)

I'll respond inline as well.

On Tue, Jan 5, 2021 at 12:16 PM Viechtbauer, Wolfgang (SP)
<wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>
> Ok, I think I am replying to the right post now ...
>
> Responses again below.
>
> Best,
> Wolfgang
>
> >-----Original Message-----
> >From: Danka Puric [mailto:djaguard using gmail.com]
> >Sent: Tuesday, 05 January, 2021 10:36
> >To: Viechtbauer, Wolfgang (SP)
> >Cc: James Pustejovsky; R meta
> >Subject: Re: [R-meta] Dealing with effect size dependance with a small
> >number of studies
> >
> >Dear James, Wolfgang,
> >
> >Thanks a lot for the quick and informative responses!
> >
> >1. I guess I made things unnecessarily complicated :) The thing is, I know
> >that all these models are essentially the same:
> >
> >notationa <- rma.mv(ES_corrected, SV, random = ~ factor(IDeffect) | IDstudy,
> >data=MA_dat_raw)
> >notationb <- rma.mv(ES_corrected, SV, random = ~ 1 | IDstudy/IDeffect,
> >data=MA_dat_raw)
> >notationc <- rma.mv(ES_corrected, SV, random = ~ IDeffect | IDstudy,
> >data=MA_dat_raw)
> >
> >but I read in the Konstantopoulos (2011) example that they only deal with
> >the dependence arising from effect sizes coming from the same studies, but
> >NOT with dependence arising from multiple ES coming from the same group of
> >participants. I then erroneously concluded that in order to deal with this
> >type of dependence I would need to use struct = "UN", but I understand now
> >that's not the case.
>
> Yes, the 'struct' part is a different issue.
>
> If you directly want to account for dependency due to multiple effect size estimates coming from the same group of subjects, you would need to calculate the covariance between the sampling errors and include this in the 'V' matrix (the second argument in rma.mv(), to which you are passing 'SV'). In addition, we then also want to use a model like the one above to account for possible dependency in the underlying true effects. That is in fact how things are done in the Berkey example (and since 'outcome' is meaningful there, we can use struct="UN" to have an estimate of tau^2 for the two different outcomes).

Thanks for the additional clarification! We will try to create "V"
using the impute_covariance_matrix function James suggested and then
use it instead of SV in our syntax:
model <- rma.mv(ES_corrected, V, random =  ~ 1 | IDstudy / IDsubsample
/ IDeffect, data=MA_dat_raw)

>
> >Also, indeed, IDeffect does not refer to the type of outcome in a study.
> >Actually, we do have an outcome variable DV which could be used instead of
> >IDeffect, but sometimes it has the same value for several ESs in the same
> >group of participants, so it didn't seem appropriate to use it in this case.
> >I did realize the model with IDeffect was not structured like Berkey at al.
> >but thought it would be a better option, as IDeffect variables have unique
> >values across IDstudy.
> >
> >As for sigma1.1 and sigma2.1. it's quite possible I just got something mixed
> >up when I compared different notations (I may have plotted the wrong model),
> >but anyway, this model is definitely wrong for the data, so I'll just leave
> >it at that.
>
> Agreed.
>
> >2. So, just to be sure I got this right, the following model
> >
> >model <- rma.mv(ES_corrected, SV, random =  ~ 1 | IDstudy / IDsubsample/
> >IDeffect, data=MA_dat_raw)
> >
> >in combination with clubSandwich robust estimates will yield adequate effect
> >size estimates for the situation where the same group of participants
> >provided more than one ES? That's actually the model I fit first, but then
> >thought wasn't appropriate after all.
>
> Yes, this looks like a sensible approach. In principle, since you mentioned it above, you could even consider:
>
> random =  ~ 1 | IDstudy / IDsubsample / IDOutcome / IDeffect
>
> where IDOutcome is the id variable for the the different outcomes. This model is in principle possible, since you mentioned that sometimes, within a particular subsample, there are multiple effects for the same outcome (if this were not the case, then IDOutcome and IDeffect would not be uniquely identifiable). However, this may be pushing things a bit with k=69 estimates. In essence, this is a five-level model, so two levels more than the 'three-level model' described by Konstantopoulos (2011):
>
> http://www.metafor-project.org/doku.php/analyses:konstantopoulos2011
>
> (in multilevel model parlance, the standard random-effects model would be considered a two-level model, so the number of levels is 1 + the number of hierarchical levels you are adding via 'random').

Yes, adding an IDoutcome variable would actually be a great way to
approach the issue of same/different outcomes coming from the same
subsample. But I also agree that such a model might be too complex for
the data and I would expect both IDoutcome and IDeffect to have
variances very close to zero. Still, we can give it a try and see if
it makes sense.

>
> >I will also now look into inputting the covariance matrices and see if it's
> >possible to implement with the data we have. Thanks for suggesting this,
> >James.
>
> In this case, covariances in the sampling errors will occur only within subsamples. So, the 'V' matrix will be block-diagonal with blocks corresponding to the subsamples. For example, suppose we have
>
> IDstudy IDsubsample IDOutcome IDeffect
> 1       1           outcomeA  1
> 1       1           outcomeA  2
> 1       1           outcomeB  1
> 1       1           outcomeB  2
> 1       2           outcomeA  1
> 1       2           outcomeA  2
> 1       2           outcomeB  1
> 1       2           outcomeB  2
>
> so a study with two groups and in each group both outcomes (A and B) were measured in two different ways (e.g., using two different scales), leading to two different effects. Then the V matrix for this study would be an 8x8 matrix that is composed of two 4x4 blocks.

Again, thank you for clarifying this. I assumed this was how it
worked, I'm just still not sure how to create V from the data we have,
but I'll get onto that straight away!

Thanks so much!
Danka

>
> >3. Great, it does make the most sense to include all three levels of random
> >effects, I'm glad the small amount of variance is not an issue.
> >
> >4. @James, unfortunately, our moderator is categorical and I'm not sure if
> >it theoretically makes sense to center it in any way... And by "taking care"
> >of this problem, I mostly meant that we're not making a huge mistake for not
> >explicitly modeling this if we use robust estimates. So, I think we'll
> >probably just leave it as it is.
> >
> >Thanks again, this is really immensely helpful.
> >
> >Best,
> >Danka
> >
> >On Mon, Jan 4, 2021 at 11:35 PM Viechtbauer, Wolfgang (SP)
> ><wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> >A few additional comments from my side below as well.
> >
> >Best,
> >Wolfgang
> >
> >>-----Original Message-----
> >>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-
> >project.org]
> >>On Behalf Of James Pustejovsky
> >>Sent: Monday, 04 January, 2021 22:25
> >>To: Danka Puric
> >>Cc: R meta
> >>Subject: Re: [R-meta] Dealing with effect size dependance with a small
> >>number of studies
> >>
> >>Hi Danka,
> >>
> >>Responses inline below.
> >>
> >>Kind Regards,
> >>James
> >>
> >>On Mon, Jan 4, 2021 at 5:41 AM Danka Puric <djaguard using gmail.com> wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> Apologies for the long post and lots of questions.
> >>>
> >>> We are doing a meta-analysis where a single study sometimes included more
> >>> than one subsample and also the same subsample (same group of
> >participants)
> >>> sometimes yielded more than one effect size.
> >>>
> >>> 1. Following the Berkey et al. (1998) example in metafor, we tried
> >fitting
> >>> the following “basic” model:
> >>> nested_UN <-rma.mv(ES_corrected, SV, random = ~ IDeffect | IDstudy,
> >struct
> >>> = "UN", data=MA_dat_raw)
> >>> where individual effect sizes are nested within studies. This model,
> >>> however, produces profile likelihood plots which have flat parts (both
> >for
> >>> sigma2.1 and sigma2.2), which (if I’m not mistaken) indicates model
> >>> overparametrization. We believe this is most likely due to a small number
> >>> of effect sizes (k = 69, from 53 subsamples, from 20 studies).
> >
> >This model should not have any variance components called "sigma2.1" or
> >"sigma2.2". When using the "random = ~ IDeffect | IDstudy" notation, you
> >should get "tau2" and "rho" values.
> >
> >However, this model doesn't make much sense. I assume that different values
> >of "IDeffect" are just used to differentiate multiple effects within the
> >same study, but the levels are not meaningful in themselves (as opposed to
> >the Berkey example, where the two levels of the 'inner' factor differentiate
> >the two different outcomes). It would make more sense to use 'random = ~
> >IDeffect | IDstudy, struct = "CS"' which is in essence the same as 'random =
> >~ 1 | IDstudy / IDeffect'. See:
> >
> >http://www.metafor-project.org/doku.php/analyses:konstantopoulos2011
> >
> >>> We tried a similar model with random = ~ IDeffect | IDsubsample, but this
> >>> model did not even converge (I assume because the number of effect sizes
> >>> per subsample is even smaller than the number of ES per study).
> >
> >This is probably again related to using struct="UN", which is (probably) not
> >appropriate here.
> >
> >>> Are we correct in concluding that a multi-level model can not be properly
> >>> fit with the data that we have and an alternative approach (RVE or effect
> >>> size aggregation) is better suited to the data?
> >>
> >>You have the wrong syntax here. If you want to specify a multi-level
> >>meta-analysis model in which effecstares nested within studies, use the "/"
> >>character to indicate nesting:
> >>  nested_UN <-rma.mv(ES_corrected, SV, random = ~ IDstudy / IDeffect,
> >>data=MA_dat_raw)
> >>Or if you want to include sub-samples as an intermediate level:
> >>  nested_UN <-rma.mv(ES_corrected, SV, random = ~ IDstudy / IDsubsample /
> >>IDeffect, data=MA_dat_raw)
> >
> >It should be: "random = ~ 1 | IDstudy / IDeffect" or "random = ~ 1 | IDstudy
> >/ IDsubsample / IDeffect".
> >
> >>Both of these will give estimates of average effect size and variance
> >>component estimates. However, the corresponding standard errors of the
> >>average effect sizes are based on the assumption that the entire model is
> >>correctly specified. RVE relaxes that assumption. Thus, the decision to use
> >>RVE or not should be based on a judgement about the plausibility of the
> >>model's assumptions (rather than on whether you can get a model to
> >>converge).
> >>
> >>> 2. If we want to use RVE, would the following model which includes random
> >>> effects at all three levels (effect size, subsample, study) be
> >appropriate
> >>> in combination with clubSandwich package robust coefficient estimates?
> >>> model <-rma.mv(ES_corrected, SV, random =  ~ 1 | IDstudy / IDsubsample/
> >>> IDeffect, data=MA_dat_raw)
> >>> coef_test(model, vcov = "CR2")
> >>> Or should something else be done in order to adequately address the issue
> >>> of effect size dependence?
> >>
> >>This seems fine. One step better would be to consider whether the effect
> >>size estimates within a given sub-sample have correlated sampling errors.
> >>This would be the case, for instance, if the effect sizes are for different
> >>outcome measures (or measures of the same outcome at different points in
> >>time), assessed on the same sub-sample of individual participants. Details
> >>on how to do this can be found here:
> >>https://www.jepusto.com/imputing-covariance-matrices-for-multi-variate-
> >meta-
> >>analysis/
> >>
> >>> 3. The variances for this model are:
> >>>
> >>> Variance Components:
> >>>             estim    sqrt  nlvls  fixed                        factor
> >>>
> >>> sigma^2.1  0.0589  0.2427     20     no                       IDstudy
> >>> sigma^2.2  0.0250  0.1583     53     no           IDstudy/IDsubsample
> >>> sigma^2.3  0.0014  0.0373     69     no  IDstudy/IDsubsample/IDeffect
> >>>
> >>> In other words, there is very little variance at the level of IDeffect,
> >>> after Study and Subsample have been taken into account. The profile
> >>> likelihood plot for sigma^2.3 does, however, appear to peak at the
> >>> corresponding value when “zoomed in” (with xlim=c(0,0.01)).
> >>> Should we consider this a satisfactory model, or is the variance at the
> >>> level of IDeffect too small to be meaningful? Presumably, this has to do
> >>> with the fact that the majority of subsamples (43 out of 53) only
> >>> contribute to the MA with one effect size, for 8 subsamples there are 2
> >ES
> >>> per subsample, and in two instances 5 ESs per subsample.
> >>>
> >>> Would an acceptable alternative model be:
> >>> nested <- rma.mv(ES_corrected, SV, random = ~ 1 | IDstudy/IDeffect,
> >>> data=MA_dat_raw)
> >>> Here, we’ve excluded random effects at the subsample level, because it
> >made
> >>> more sense to include random effects at the level of individual effect
> >>> sizes and the two variables have a substantial overlap. The variances for
> >>> this model seem adequate (and their profile plots look fine, too).
> >>>
> >>> Variance Components:
> >>>             estim    sqrt  nlvls  fixed            factor
> >>>
> >>> sigma^2.1  0.0678  0.2604     20     no           IDstudy
> >>> sigma^2.2  0.0150  0.1223     69     no  IDstudy/IDeffect
> >>>
> >>The nice thing about RVE is that the standard errors for the average effect
> >>are calculated in a way that does not require the correct specification of
> >>the random effects structure. As a result, you should get very similar
> >>standard errors regardless of whether you include random effects for all
> >>three levels or whether you exclude a level. However, the variance
> >>component estimates are still based on an assumption that the model is
> >>correctly specified. I think it would therefore be preferable to use the
> >>model that captures the theoretically relevant levels of variation, so in
> >>this case, all three levels.
> >
> >Agree. I would go with the IDstudy/IDsubsample/IDeffect model.
> >
> >>> 4. Finally, we are also interested in examining the effects of a
> >moderator
> >>> variable which defines different outcomes. So, in cases when one
> >subsample
> >>> produces more than one effect size – sometimes these effect sizes belong
> >to
> >>> the same level of the moderator variable (same outcome under different
> >>> circumstances) and sometimes they belong to different levels of the
> >>> moderator (different outcomes). Theoretically, we would expect “same-
> >level”
> >>> ESs to be more correlated than “different-level” ones, but with the small
> >>> number of subsamples that report more than one ES this seems impossible
> >to
> >>> model. Does the use of clubSandwich robust coefficient already take care
> >of
> >>> this?
> >>
> >>It depends on what you mean by "take care of" this issue. RVE does not
> >>really solve the problem of how to model within- versus between-sample
> >>variation in a predictor, but it does mean that you can be less worried
> >>about getting the variance structure exactly correct. To address the issue
> >>you raise, one thing you could do is include a version of the moderator
> >>that is centered within each study, in addition to the study-level mean of
> >>the moderator. This would let you parse out "same-level" versus
> >>"different-level" variation in the moderator. However, with so few studies
> >>that have more than one level of the moderator, the within-study version of
> >>the predictor will have very little variation and so it will come with a
> >>large standard error.