[R-meta] Choice of 'struct' in rma.mv inner| outer model

Fri Apr 24 11:47:01 CEST 2020

Dear Divya,

Just a note at the beginning:

It's great that you provide data and code to run your example. But here is a small suggestion for making it even easier for those who would like to replicate it. As it stands, I had to download the csv file, save it to a directory, write the 'dt <- read.csv("Example.csv")' line myself to import the data into R, load metafor, and only then could I run 'res <- rma.mv(...)', which you had wrapped in ``, so I also had to remove those tick marks after copy-pasting your code. It would be much easier for me (and for others) if your post contained code that can be copy-pasted directly into R to replicate your example. One way of doing this is with the dput() function: if you have a data frame in R, dput(dt) prints R code that recreates the object. Then provide all of the code, including the loading of packages, and don't put code between ``. Here is what this would look like for your case:

##############################################

library(metafor)

dt <- structure(list(Dataset = structure(c(1L, 3L, 3L, 4L, 4L, 5L, 
6L, 6L, 7L, 7L, 8L, 9L, 10L, 7L, 2L), .Label = c("Dataset1", 
"Dataset10", "Dataset2", "Dataset3", "Dataset4", "Dataset5", 
"Dataset6", "Dataset7", "Dataset8", "Dataset9"), class = "factor"), 
    Cohort = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L, 
    4L, 5L, 1L, 6L, 4L, 1L), .Label = c("Cohort1", "Cohort2", 
    "Cohort3", "Cohort4", "Cohort5", "Cohort6"), class = "factor"), 
    Effect_size = c(-0.245876357813553, -0.0360171660890709, 
    -0.16077898453692, -0.00771854549268499, -0.00771854549268499, 
    0.607691179338256, 0.00870359628248281, 0, 0.966934742778138, 
    2.25128958030552, 0.00346142286436699, -0.0304355733688647, 
    -0.00129982930640761, 3.99446794900229, 1.67893037734415), 
    Standard_error = c(0.776725358171479, 0.209012654351505, 
    0.859693331055691, 0.0964710555089026, 0.0964710555089026, 
    0.912217685998336, 0.121011687812713, 5.43403934888512, 0.845820951946978, 
    1.16744454770699, 0.0706378320889943, 0.173382641268852, 
    0.0998545272374674, 1.34914215714371, 0.484285174395154), 
    Pvalue = c(0.384012823927408, 0.35983846771426, 0.366304900424426, 
    0.159474757999654, 0.159474757999654, 0.395421030680653, 
    0.331243284921557, 1, 0.26068122210546, 0.0627221584358117, 
    0.581936871705656, 0.240184581770889, 0.802662665505698, 
    0.005425952642806, 0.000972309152346685), Padjust = c(0.612842383391001, 
    0.91559247103096, NA, 0.901326220358046, 0.901326220358046, 
    0.679170329181513, 0.72631204711653, NA, 0.752376063104347, 
    0.579876736229857, 0.978894224573193, 0.980881278252737, 
    0.994197508120265, 0.357110887120397, 0.00275612262889154
    )), class = "data.frame", row.names = c(NA, -15L))
dt

res <- rma.mv(Effect_size, (Standard_error)^2, data = dt,
              random = ~ Dataset | Cohort, method="REML",
              control = list(optimizer="optim", optmethod="Nelder-Mead"))
res

##############################################

Now I, or anybody else, can copy-paste these exact lines directly from the post into R and run the example. At least for me, this is much more convenient.

Now to your actual question:

Terms like '~ inner | outer' are used in rma.mv() to add correlated random effects to the model. In particular, random effects for rows with different levels of 'outer' are assumed to be independent, but if two rows share the same level of 'outer', then their random effects are allowed to be correlated; the 'struct' argument determines the form of the variance-covariance matrix of these correlated effects. For example, if multiple outcomes were measured within a study, and hence multiple effects are calculated based on these outcomes, then I would want to account for the dependency between those effects (i.e., 'outer' would then be an id variable to indicate the study and 'inner' an id variable to indicate the different effects within the studies).

As a side note: Strictly speaking, I should do two things in this scenario: compute the covariance between the sampling errors of the two effects, and model the correlation between the underlying true effects with such a random effect. An example that illustrates the former (for standardized mean differences) is provided here:

http://www.metafor-project.org/doku.php/analyses:gleser2009#multiple-endpoint_studies

Note that the equation/code being used there is specific to SMDs and would need to be changed for other effect size measures.

An example where both of these aspects are taken into consideration is provided here:

http://www.metafor-project.org/doku.php/analyses:berkey1998

Since two different effects are being measured in this example - and we might not want to assume that the amount of heterogeneity is the same for both outcomes - struct="UN" is being used here.
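
As a rough sketch of what that looks like in code: the lines below assume the dat.berkey1998 dataset that ships with metafor (with columns yi, v1i, v2i, outcome, and trial) and follow the approach on the linked page as best I can reproduce it here; the page itself is the authoritative version. The comparison against struct="CS" is only added for illustration.

```r
library(metafor)

dat <- dat.berkey1998

# block-diagonal var-cov matrix of the sampling errors (one 2x2 block per trial)
V <- bldiag(lapply(split(dat[, c("v1i", "v2i")], dat$trial), as.matrix))

# struct="UN": a separate tau^2 for each outcome plus a correlation
# between the true effects of the two outcomes within a trial
res.un <- rma.mv(yi, V, mods = ~ outcome - 1, random = ~ outcome | trial,
                 struct = "UN", data = dat, method = "ML")
res.un

# struct="CS": a single tau^2 shared by both outcomes
res.cs <- rma.mv(yi, V, mods = ~ outcome - 1, random = ~ outcome | trial,
                 struct = "CS", data = dat, method = "ML")

# likelihood ratio test of the two variance structures
anova(res.un, res.cs)
```

If allowing the amount of heterogeneity to differ across outcomes does not improve the fit noticeably, the simpler "CS" structure may be sufficient, but with so few trials such a test has little power.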

If multiple effects within a study are based on different subjects (e.g., one effect is computed for male and another effect for female subjects), then there is no covariance between the sampling errors, but one would still want to account for possible dependency in the underlying true effects with a random effect. Examples of this kind of application are provided here:

http://www.metafor-project.org/doku.php/analyses:konstantopoulos2011#multivariate_parameterization_of_the_model

and

http://www.metafor-project.org/doku.php/analyses:vanhouwelingen2002#bivariate_approach
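
For the first of these, a minimal sketch could look as follows, assuming the dat.konstantopoulos2011 dataset that ships with metafor (with columns yi, vi, study, and district). No sampling covariances are needed here, so only the sampling variances are passed, and since no 'struct' is specified, the default (struct="CS") is used.

```r
library(metafor)

dat <- dat.konstantopoulos2011

# studies ('inner') nested within districts ('outer'); the true effects of
# studies within the same district are allowed to be correlated
res.mv <- rma.mv(yi, vi, random = ~ factor(study) | district, data = dat)
res.mv

# the number of studies (inner levels) differs across districts (outer levels)
table(dat$district)
```

The output of table() shows that the districts contribute different numbers of studies, which is not a problem for fitting the model.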

There are other cases where '~ inner | outer' terms might be relevant, for example when the same effect was computed at multiple timepoints. Here are examples for this:

https://wviechtb.github.io/metafor/reference/dat.fine1993.html
https://wviechtb.github.io/metafor/reference/dat.ishak2007.html

Here, the random effects within a study are assumed to follow an autoregressive correlation structure, hence the use of different 'struct' options.
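
A simplified sketch of such a model, assuming the dat.ishak2007 dataset that ships with metafor (with columns study, y1i to y4i, and v1i to v4i). Note the simplifications: the sampling errors are treated as independent here (only the sampling variances are passed), and struct="AR" is used, which may differ from what the linked help pages do; those pages should be consulted for a full analysis.

```r
library(metafor)

dat <- dat.ishak2007

# reshape from wide (one row per study) to long (one row per study and timepoint)
dat.long <- reshape(dat, direction = "long", idvar = "study",
                    v.names = c("yi", "vi"),
                    varying = list(c("y1i", "y2i", "y3i", "y4i"),
                                   c("v1i", "v2i", "v3i", "v4i")))
dat.long <- dat.long[order(dat.long$study, dat.long$time), ]

# drop timepoints at which a study did not report an effect
dat.long <- dat.long[!is.na(dat.long$yi), ]

# AR(1) structure for the true effects across timepoints within studies
res.ar <- rma.mv(yi, vi, mods = ~ factor(time) - 1, random = ~ time | study,
                 struct = "AR", data = dat.long)
res.ar
```

With struct="AR", the correlation between the true effects at two timepoints within a study decreases as the timepoints get further apart, in contrast to "CS", where it is the same for all pairs.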

If there is some kind of spatial configuration for the effects and this is deemed relevant, then one could also consider adding correlated random effects to the model where the correlation reflects spatial correlation. I don't have an example for this, but see help(rma.mv) and search for "spatial correlation structures".

In the end, the use of correlated random effects in the context of a meta-analysis is not fundamentally different from how correlated random effects are used for analyzing primary data. The main difference is that the dependent variable is an effect size measure, while in primary data we have the raw measurements of some dependent variable.

As for your example: I don't know what "Cohort" and "Dataset" represent, so I cannot comment on that. With respect to the number of levels: In principle, it does not matter if the number of 'inner' levels differs across 'outer' levels (that's also the case for the Konstantopoulos (2011) example).

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org]
>On Behalf Of Divya Ravichandar
>Sent: Wednesday, 22 April, 2020 18:00
>To: r-sig-meta-analysis using r-project.org
>Subject: [R-meta] Choice of 'struct' in rma.mv inner| outer model
>
>ATTACHMENT(S) REMOVED: Example.csv
>
>Hi all
>
>I would like to learn what motivates the choice of  the variance structure
>(struct parameter) in an inner|outer model set up in rma.mv. An example of
>my test data is attached and I run rma.mv on my test data (dt) as below. In
>my example below, some outer levels have only 1 inner level while other have
>multiple inner levels. Any input on what should be considered when choosing
>the model structure would be very helpful.
>
>`res <- rma.mv(Effect_size, (Standard_error)^2, data = dt,
>              random = ~ Dataset | Cohort, method="REML",
>              control = list(optimizer="optim", optmethod="Nelder-Mead"))`
>
>--
>Divya Ravichandar
>Scientist
>Second Genome