[R-meta] Variance-covariance matrix when studies share multiple treatment x control comparisons

Ju Lee juhyung2 at stanford.edu
Thu Sep 26 19:44:36 CEST 2019


Dear Wolfgang,

As I mentioned in my previous e-mail, I am having some doubts about the way I have specified the random effects in my models, and realize that I should seek your advice on this topic as well.

To explain my data structure and workflow first, I am analyzing the effect of 4 different types of habitat change on marine food-web interactions. I would first run separate random-effects models with each of these 4 habitat-change datasets and also with all data combined. I would then run mixed-effects models with the same moderators for each of the 4 datasets as well as for all studies pooled. I am running the analysis with the pooled data here because we want to examine the overall patterns encompassing the different types of change.

1. My first question is: if I am pooling these 4 datasets (which do show variable effect patterns) and running random- or mixed-effects models with the pooled data (full.es), should I account for merging these four habitat-change groups in the full models? For example, by creating a random-effects variable from these four habitat-change categories (say "Habitat.change.type")? Or do you think this is unnecessary, and it is still logical to run the full models without accounting for merging these datasets?

So my current code for the full model combining all 4 datasets is

full.es <- rma.mv(hedged, VCV, method="REML", random = ~ 1 | Study/ID, data=MHF)

Can I then change this to

full.es <- rma.mv(hedged, VCV, method="REML", random = list(~ 1 | Study/ID, ~ 1 | Habitat.change.type), data=MHF)?
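I was also thinking I could check whether the extra variance component is supported with a likelihood ratio test between the two nested specifications (just a sketch; since both models have identical fixed effects, a REML-based comparison of the variance components should be permissible):

full.es1 <- rma.mv(hedged, VCV, method="REML", random = ~ 1 | Study/ID, data=MHF)
full.es2 <- rma.mv(hedged, VCV, method="REML", random = list(~ 1 | Study/ID, ~ 1 | Habitat.change.type), data=MHF)
anova(full.es1, full.es2)  # LRT for the added Habitat.change.type variance component

Would that be a sensible check?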


2. My second question follows directly from your previous comments. I have been following https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2018-April/000778.html where you advised using an inner/outer random-effects structure when there is a big gap between the moderator model and the subgroup model (i.e., the moderator model with the intercept removed). I have had the same issue as in that post, and I was able to fix it by adding struct="DIAG" and specifying the moderator as the inner random effect.

So my initial code looked like:

> chs.mod <- rma.mv(hedged, VCV, mods = ~ factor(Consumer.habitat.specialization), method="REML", random = ~ Consumer.habitat.specialization | Study, data=MHF, struct="DIAG", subset=(!is.na(Consumer.habitat.specialization)))
> summary(chs.mod)

Multivariate Meta-Analysis Model (k = 778; method: REML)

    logLik    Deviance         AIC         BIC        AICc
-1533.1822   3066.3645   3074.3645   3092.9811   3074.4163

Variance Components:

outer factor: Study                           (nlvls = 166)
inner factor: Consumer.habitat.specialization (nlvls = 2)

          estim    sqrt  k.lvl  fixed       level
tau^2.1  0.8936  0.9453    615     no  Generalist
tau^2.2  0.9615  0.9806    163     no  Specialist

Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001

Test of Moderators (coefficient(s) 2):
QM(df = 1) = 10.7143, p-val = 0.0011

Model Results:

                                                    estimate      se     zval    pval    ci.lb    ci.ub
intrcpt                                               0.7276  0.0871   8.3549  <.0001   0.5569   0.8982  ***
factor(Consumer.habitat.specialization)Specialist    -0.5991  0.1830  -3.2733  0.0011  -0.9578  -0.2404   **

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



But when I add the effect-size ID to the model as you've suggested, the moderator effects change completely. This has happened to every single mixed-effects model that generated a highly significant QM when only the inner (moderator) and outer (Study) random effects were specified.


> chs.ID.mod <- rma.mv(hedged, VCV, mods = ~ factor(Consumer.habitat.specialization), method="REML", random = ~ Consumer.habitat.specialization | Study/Id, data=MHF, struct="DIAG", subset=(!is.na(Consumer.habitat.specialization)))
> summary(chs.ID.mod)

Multivariate Meta-Analysis Model (k = 778; method: REML)

    logLik    Deviance         AIC         BIC        AICc
-1280.5447   2561.0894   2571.0894   2594.3601   2571.1673

Variance Components:

            estim    sqrt  nlvls  fixed                                   factor
sigma^2.1  0.5299  0.7279      2     no          Consumer.habitat.specialization
sigma^2.2  0.8052  0.8973    183     no    Consumer.habitat.specialization/Study
sigma^2.3  0.4899  0.6999    778     no  Consumer.habitat.specialization/Study/Id

Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001

Test of Moderators (coefficient(s) 2):
QM(df = 1) = 0.5276, p-val = 0.4676

Model Results:

                                                    estimate      se     zval    pval    ci.lb   ci.ub
intrcpt                                               0.8221  0.7336   1.1206  0.2625  -0.6158  2.2599
factor(Consumer.habitat.specialization)Specialist    -0.7598  1.0460  -0.7263  0.4676  -2.8099  1.2904

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


But if I remove this inner term (the moderator) from the random effects, the results are back to being highly significant. So adding this inner term seems to really change our results.


> chs.ID2.mod<-rma.mv(hedged,VCV, mods=~ factor(Consumer.habitat.specialization), method="REML", random = ~ 1 | Study/Id, data=MHF, struct="DIAG", subset=(!is.na(Consumer.habitat.specialization)))
> summary(chs.ID2.mod)
Multivariate Meta-Analysis Model (k = 778; method: REML)
    logLik    Deviance         AIC         BIC        AICc
-1283.6031   2567.2063   2575.2063   2593.8229   2575.2582
Variance Components:

            estim    sqrt  nlvls  fixed    factor
sigma^2.1  0.7648  0.8745    166     no     Study
sigma^2.2  0.5117  0.7153    778     no  Study/Id
Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001
Test of Moderators (coefficient(s) 2):
QM(df = 1) = 39.0995, p-val < .0001
Model Results:
                                                   estimate      se     zval    pval    ci.lb    ci.ub
intrcpt                                              0.8429  0.0868   9.7165  <.0001   0.6729   1.0130  ***
factor(Consumer.habitat.specialization)Specialist   -0.8873  0.1419  -6.2530  <.0001  -1.1654  -0.6092  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



The same thing happens for my subgroup models. Adding ID to the current inner/outer model completely nullifies what were previously very strong effects. However, if I remove the inner term and just use "random = ~ 1 | Study/Id", the results become somewhat more similar to before, although the magnitude of the effect sizes actually increased.


# Original: random = ~ Consumer.habitat.specialization | Study
> chs.es<-rma.mv(hedged,VCV, mods=~ factor(Consumer.habitat.specialization)-1, method="REML", random = ~ Consumer.habitat.specialization | Study, data=MHF, struct="DIAG", subset=(!is.na(Consumer.habitat.specialization)))
> summary(chs.es)
Multivariate Meta-Analysis Model (k = 778; method: REML)
    logLik    Deviance         AIC         BIC        AICc
-1533.1822   3066.3645   3074.3645   3092.9811   3074.4163
Variance Components:
outer factor: Study                           (nlvls = 166)
inner factor: Consumer.habitat.specialization (nlvls = 2)

            estim    sqrt  k.lvl  fixed       level
tau^2.1    0.8936  0.9453    615     no  Generalist
tau^2.2    0.9615  0.9806    163     no  Specialist
Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001
Test of Moderators (coefficient(s) 1:2):
QM(df = 2) = 70.4413, p-val < .0001
Model Results:
                                                   estimate      se    zval    pval    ci.lb   ci.ub
factor(Consumer.habitat.specialization)Generalist    0.7276  0.0871  8.3549  <.0001   0.5569  0.8982  ***
factor(Consumer.habitat.specialization)Specialist    0.1285  0.1610  0.7981  0.4248  -0.1870  0.4440
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


# Add ID to original: random = ~ Consumer.habitat.specialization | Study/Id
> chs.ID.es<-rma.mv(hedged,VCV, mods=~ factor(Consumer.habitat.specialization)-1, method="REML", random = ~ Consumer.habitat.specialization | Study/Id, data=MHF, struct="DIAG", subset=(!is.na(Consumer.habitat.specialization)))
> summary(chs.ID.es)
Multivariate Meta-Analysis Model (k = 778; method: REML)
    logLik    Deviance         AIC         BIC        AICc
-1280.5447   2561.0894   2571.0894   2594.3601   2571.1673
Variance Components:
            estim    sqrt  nlvls  fixed                                    factor
sigma^2.1  0.5299  0.7279      2     no           Consumer.habitat.specialization
sigma^2.2  0.8052  0.8973    183     no     Consumer.habitat.specialization/Study
sigma^2.3  0.4899  0.6999    778     no  Consumer.habitat.specialization/Study/Id
Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001
Test of Moderators (coefficient(s) 1:2):
QM(df = 2) = 1.2627, p-val = 0.5319
Model Results:
                                                   estimate      se    zval    pval    ci.lb   ci.ub
factor(Consumer.habitat.specialization)Generalist    0.8221  0.7336  1.1206  0.2625  -0.6158  2.2599
factor(Consumer.habitat.specialization)Specialist    0.0623  0.7456  0.0835  0.9334  -1.3991  1.5237
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


# Add ID but remove inner logic and variance structure specification: random = ~ 1 | Study/ID
> chs.ID2.es<-rma.mv(hedged,VCV, mods=~ factor(Consumer.habitat.specialization)-1, method="REML", random = ~ 1 | Study/Id, data=MHF, subset=(!is.na(Consumer.habitat.specialization)))
> summary(chs.ID2.es)
Multivariate Meta-Analysis Model (k = 778; method: REML)
    logLik    Deviance         AIC         BIC        AICc
-1283.6031   2567.2063   2575.2063   2593.8229   2575.2582
Variance Components:
            estim    sqrt  nlvls  fixed    factor
sigma^2.1  0.7648  0.8745    166     no     Study
sigma^2.2  0.5117  0.7153    778     no  Study/Id
Test for Residual Heterogeneity:
QE(df = 776) = 3497.6681, p-val < .0001
Test of Moderators (coefficient(s) 1:2):
QM(df = 2) = 103.0703, p-val < .0001
Model Results:
                                                   estimate      se     zval    pval    ci.lb   ci.ub
factor(Consumer.habitat.specialization)Generalist    0.8429  0.0868   9.7165  <.0001   0.6729  1.0130  ***
factor(Consumer.habitat.specialization)Specialist   -0.0444  0.1370  -0.3237  0.7461  -0.3129  0.2242
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Adding ID to my RANDOM-effects models without the inner/outer structure does not cause any problems, and actually makes the independently calculated effect sizes even more significant than before.
I am now wondering what is causing these conflicting results and how I can address them?

Third, do you have an alternative suggestion for calculating the mean effect size and 95% CIs for each subgroup, other than the intercept-removal approach I used? My issue has been that unless I specify the variance components, some categorical effects (especially ones with smaller sample sizes) change so drastically (both in sign and magnitude) that they contradict the mean effects generated by the independent random-effects models. For comparative purposes in figures, should I just calculate effect sizes independently, or do you think there is a way to avoid this gap between the subgroup and moderator models, especially considering that adding the inner/outer variance components, Study, and ID altogether seems to cause issues here? A sketch of what I currently have in mind is below.
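One idea I have been considering is to fit a single intercept-removed model with the full Study/Id structure and then obtain the subgroup means and 95% CIs from it (just a sketch):

chs.sub <- rma.mv(hedged, VCV, mods = ~ factor(Consumer.habitat.specialization) - 1, method="REML", random = ~ 1 | Study/Id, data=MHF, subset=(!is.na(Consumer.habitat.specialization)))
# rows of the identity matrix pick out the Generalist and Specialist coefficients
# in turn, so this returns the predicted mean effect and 95% CI for each subgroup
predict(chs.sub, newmods = diag(2))

Would that be an acceptable way to generate the values for the figures?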


I apologize for this extensive letter, but I was quite surprised by these recent developments and had to inquire further.

Thank you very much Wolfgang and I sincerely hope to hear back from you.

Best wishes,
JU



________________________________
From: Ju Lee <juhyung2 at stanford.edu>
Sent: Thursday, September 26, 2019 9:05 AM
To: James Pustejovsky <jepusto at gmail.com>; Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl>
Cc: r-sig-meta-analysis at r-project.org
Subject: Re: Variance-covariance matrix when studies share multiple treatment x control comparisons

Dear Wolfgang, James

Thank you both for your considerate suggestions.

First of all, I would like to clarify that I will be sending another thread related to Wolfgang's comment about adding study ID to the random factors, as it has caused some major issues with my current analysis and I would really like a second opinion on this matter (in my very next e-mail).

Related to James's suggestion, I will follow up on your newly published paper and apply this to my code. Since I am using a variance-covariance matrix instead of the usual variances (to account for shared control/treatment groups) and trying to incorporate this into the modified Egger's test, I am wondering if this means I should be creating a diagonal matrix composed of sqrt(1 / n1 + 1 / n2) for all inter-dependent effect sizes?

Best regards,
JU
________________________________
From: James Pustejovsky <jepusto at gmail.com>
Sent: Thursday, September 26, 2019 8:26 AM
To: Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl>
Cc: Ju Lee <juhyung2 at stanford.edu>; r-sig-meta-analysis at r-project.org
Subject: Re: Variance-covariance matrix when studies share multiple treatment x control comparisons

Ju,

Following up on Wolfgang's comment: yes, adding a measure of precision as a predictor in the multi-level/multi-variate meta-regression model should work. Dr. Belen Fernandez-Castilla has a recent paper that reports a simulation study evaluating this approach. See

Fernández-Castilla, B., Declercq, L., Jamshidi, L., Beretvas, S. N., Onghena, P., & Van den Noortgate, W. (2019). Detecting selection bias in meta-analyses with multiple outcomes: A simulation study. The Journal of Experimental Education, 1-20.

However, for standardized mean differences based on simple between-group comparisons, it is better to use sqrt(1 / n1 + 1 / n2) as the measure of precision, rather than using the usual SE of d. The reason is that the SE of d is naturally correlated with d even in the absence of selective reporting, and so the type I error rate of Egger's regression test is artificially inflated if the SE is used as the predictor. Using the modified predictor as given above fixes this issue and yields a correctly calibrated test. For all the gory details, see Pustejovsky & Rodgers (2019; https://doi.org/10.1002/jrsm.1332).
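In metafor terms, the modified test might look something like the following (a sketch only; I am assuming 'n1i' and 'n2i' are the columns holding the two group sample sizes, and reusing the model structure from your earlier e-mails):

# modified precision measure that is not mechanically correlated with d
MHF$prec <- sqrt(1/MHF$n1i + 1/MHF$n2i)
# Egger-type meta-regression using the full V matrix and random effects;
# the test of the 'prec' coefficient is the test for small-study effects
egger.mv <- rma.mv(hedged, VCV, mods = ~ prec, random = ~ 1 | Study/ID, data=MHF)
summary(egger.mv)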

It's also possible to combine all of the above with robust variance estimation, or to use a simplified model plus robust variance estimation to account for dependency between effect sizes from the same study. Melissa Rodgers and I have a working paper showing that this approach works well for meta-analyses that include studies with multiple correlated outcomes. We will be posting a pre-print of the paper soon, and I can share it on the listserv when it's available.

James

On Thu, Sep 26, 2019 at 3:12 AM Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
Hi Ju,

Glad to hear that you are making progress. Construction of the V matrix can be a rather tedious process and often requires quite a bit of manual work.

I have little interest in generalizing fsn() to cases where V is not diagonal, because fsn() is more of historical interest, not something I would generally use in applied work.

However, the 'Egger regression' test can be easily generalized to rma.mv() models. Simply include a measure of the precision (e.g., the standard error) of the estimates in your model as a predictor/moderator and then you have essentially a multilevel/multivariate version thereof (you would then look at the test of the coefficient for the measure of precision, not the intercept).

I also recently heard a talk by Melissa Rodgers and James Pustejovsky (who is a frequent contributor to this mailing list) on some work in this area. Maybe he can chime in here.

Best,
Wolfgang

-----Original Message-----
From: Ju Lee [mailto:juhyung2 at stanford.edu]
Sent: Thursday, 26 September, 2019 8:13
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis at r-project.org
Subject: Re: Variance-covariance matrix when studies share multiple treatment x control comparisons

Dear Wolfgang,

I deeply appreciate your time looking into this issue, and this has been immensely helpful.
I was able to incorporate all possible inter-dependence among effect sizes by adding different layers of non-independence to our dataframe.

I manually calculated Hedges' d based on Hedges and Olkin (1985), and it generates exactly the same values as Hedges' g from the escalc() "SMD" measure. So hopefully I am doing everything right using the equation we've discussed earlier.
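For reference, the escalc() call I checked my manual calculations against was of this form (a sketch with placeholder column names for the group means, SDs, and sample sizes):

library(metafor)
# measure="SMD" gives the bias-corrected standardized mean difference (Hedges' g)
dat <- escalc(measure="SMD", m1i=mean1, sd1i=sd1, n1i=n1,
              m2i=mean2, sd2i=sd2, n2i=n2, data=dat)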

I have also been wondering if it is possible to account for the variance-covariance structure that I've constructed when running publication bias analyses, for example, when using the fsn() function or the modified Egger's regression test (looking at the intercept term of a residual ~ precision meta-regression using rma.mv()). I have had no luck so far finding information on this, and I would appreciate any suggestions you may have.

Thank you for all of your valuable help!
Best regards,
JU



