[R-meta] How to conduct a meta-analysis on multiple-treatment studies with a repeated measure designs?
Gram, Gil (IITA)
G@Gr@m @ending from cgi@r@org
Fri Jun 1 15:09:37 CEST 2018
Hi Wolfgang,
Thank you for your reply.
Even after reading Gleser & Olkin (2009), Berkey et al (1998), and Koenraad’s example, I’m still confused and think my case is quite complex. I also think there might be format requirements of the data that I haven’t found out about?
My example data set gives you my dataset format, which is basically long-format for all variables.
dat1 = data.table(site = c('Chinyika', 'Chinyika', 'Chinyika', 'Chinyika', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2',
'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2'),
idGroup = c(34,34,34,34,270,270,270,270,270,270,270,270,270,271,271,271,271,271,271,271,271,271),
time = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
treatment = c('Control', 'MR', 'OR', 'ORMR', 'Control', 'MR', 'MR', 'OR', 'OR', 'ORMR', 'ORMR', 'ORMR', 'ORMR', 'Control', 'MR', 'MR', 'OR', 'OR', 'ORMR', 'ORMR', 'ORMR', 'ORMR'),
orgRate = c(0,0,NA,NA,0,0,0,65,130,65,130,65,130,0,0,0,65,130,65,130,65,130),
minRate = c(0,NA,0,NA,0,40,80,0,0,40,40,80,80,0,40,80,0,0,40,40,80,80),
yield = c(0.75,3.41,1.66,3.61,0.85,1.1,1.5,1.6,2.25,1.8,2.1,2.3,2.5,0.6,1.25,1.7,0.95,1.25,1.4,1.7,1.9,2),
sd = c(0.51,2.04,0.98,2.25,0.15,0.15,0.15,0.18,0.10,0.10,0.13,0.18,0.30,0.05,0.10,0.10,0.10,0.08,0.10,0.05,0.10,0.10),
nRep = c(6,6,6,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3))
First thing is the calculation of sampling variances vi. I think if I want to work with raw means of each group (=yi), on the esclalc documentation page I look in the Outcome Measures for Individual Groups - Measures for Quantitative Variables section. Here, one needs to specify mi, sdi, and ni, which makes me think a long-format works right? So escalc(measure="MN", mi=yield, sdi=sd, ni=nRep, data=dat1). How does the escalc calculate those vi (based on sdi and ni), if it’s not just sdi^2?
According to Gleser & Olkin, for Multiple-Treatment Studies, with a Quantitative Response Variable, the vi for standardized mean differences is calculated as such:
dat$Ni <- unlist(lapply(split(dat, dat$study), function(x) rep(sum(x$n1i) + x$n2i[1], each=nrow(x))))
dat$vi <- with(dat, 1/n1i + 1/n2i + yi^2/(2*Ni))
But here the m1i and m2i are the treatments in columns (wide-format). Should I use this structure for my dataset as well? Something like dat2 = dcast(dat1, site+idGroup+time+orgRate+minRate ~ treatment, value.var = c('yield', 'sd', 'nRep'))
How do these functions apply to my example dataset, and how do I calculate the vcov-matrix?
Second, I tried out your recommended syntax and the model did converge (after >4days running), I used the raw means for yi and SE values (I know it’s wrong, but wanted to try it out). Is it problematic the rho values show significant correlations between the treatments?
> rma.mv(yield, SEfromVARCOM, mods = ~ treatment, random = list(~ treatment | site, ~ treatment | site.time), struct="UN", data=dtPro)
Multivariate Meta-Analysis Model (k = 5226; method: REML)
Variance Components:
outer factor: site (nlvls = 76)
inner factor: treatment (nlvls = 4)
estim sqrt k.lvl fixed level
tau^2.1 0.9095 0.9537 523 no Control
tau^2.2 2.0533 1.4329 663 no MR
tau^2.3 1.9773 1.4062 1900 no OR
tau^2.4 2.1060 1.4512 2140 no ORMR
rho.Cntr rho.MR rho.OR rho.ORMR Cntr MR OR ORMR
Control 1 0.8852 0.8738 0.8914 - no no no
MR 0.8852 1 0.8734 0.9297 74 - no no
OR 0.8738 0.8734 1 0.9334 74 76 - no
ORMR 0.8914 0.9297 0.9334 1 74 76 76 -
outer factor: site.time (nlvls = 350)
inner factor: treatment (nlvls = 4)
estim sqrt k.lvl fixed level
gamma^2.1 1.1188 1.0577 523 no Control
gamma^2.2 1.7876 1.3370 663 no MR
gamma^2.3 1.4307 1.1961 1900 no OR
gamma^2.4 2.0906 1.4459 2140 no ORMR
phi.Cntr phi.MR phi.OR phi.ORMR Cntr MR OR ORMR
Control 1 0.8505 0.8989 0.7706 - no no no
MR 0.8505 1 0.8255 0.9474 346 - no no
OR 0.8989 0.8255 1 0.8670 344 348 - no
ORMR 0.7706 0.9474 0.8670 1 344 348 348 -
Test for Residual Heterogeneity:
QE(df = 5222) = 114363.8935, p-val < .0001
Test of Moderators (coefficients 2:4):
QM(df = 3) = 237.1088, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
intrcpt 1.7147 0.1390 12.3324 <.0001 1.4422 1.9872 ***
treatmentMR 1.4102 0.1101 12.8074 <.0001 1.1944 1.6260 ***
treatmentOR 0.9323 0.1050 8.8751 <.0001 0.7264 1.1382 ***
treatmentORMR 1.7358 0.1153 15.0537 <.0001 1.5098 1.9618 ***
Thank you Wolfgang and co!
Gil
On 24May, 18, at 22:19, Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer using maastrichtuniversity.nl<mailto:wolfgang.viechtbauer using maastrichtuniversity.nl>> wrote:
Dear Gram,
The calc.v() function below doesn't make sense here. It is for standardized mean differences and it only applies when multiple SMD values were computed by contrasting various treatments against a common control group. You are working with means and are not computing contrasts.
Also, rma.mv() does not handle things like 'random = ~ treatment | site/time'. Please install the 'devel' version of the metafor package where attempts to use such syntax are properly flagged. If you want to fit this, then you have to do:
dat1$site.time <- interaction(dat1$site, dat1$time)
rma.mv(..., random = list(~ treatment | site, ~ treatment | site.time), struct="UN", ...)
Not sure whether such a model would converge. You are trying to estimate twice a 4x4 var-cov matrix, so that's 20 parameters in total. Unless you have tons of data (and a lot of patience), this isn't going to work.
Yes, the tau^2 values tell you about yield variability.
As for weights: By default, the model above will have a weight *matrix* that is the inverse of the model-implied marginal var-cov matrix of the observed outcomes. Unless you know what this means, I would suggest not to start trying to use custom weights (or a custom weight matrix). If you specify a model that is a somewhat sensible approximation to the underlying data generating mechanism, then there is also no need to mess with the weights in the first place (and if the model is not a sensible approximation, then one should probably find a better model).
Finally, yes, one might need to construct a V matrix here that properly reflects the covariances among the sampling errors. However, I have no idea how that would be done in this case. Plots of land are something rather different than people and I have very little understanding of how these types of studies are run, so I cannot really say how covariances among sampling errors could arise and how one would compute them.
Instead, you could also browse through the archives and look for discussions around the cluster-robust inference approach. With that approach, you could get appropriate estimates of the standard errors of the fixed effects, although then I would be very cautious about interpreting the tau^2 values.
Best,
Wolfgang
-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Gram, Gil (IITA)
Sent: Thursday, 24 May, 2018 10:53
To: r-sig-meta-analysis using r-project.org<mailto:r-sig-meta-analysis using r-project.org>
Subject: Re: [R-meta] How to conduct a meta-analysis on multiple-treatment studies with a repeated measure designs?
Dear all,
I found the discussion of Koenraad, Gabriele, and Wolfgang very interesting because it applies to my case as well.
However, it didnt answer all of my questions. Im also trying to perform a multivariate mixed effect model with repeated measures over time with rma.mv. Hopefully you can shed some light on it...
Here is a simplified exerpt of my data:
dat1 = data.table(site = c('Chinyika', 'Chinyika', 'Chinyika', 'Chinyika', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2',
'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2', 'Dodoma2'),
time = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2),
treatment = c('Control', 'MR', 'OR', 'ORMR', 'Control', 'MR', 'MR', 'OR', 'OR', 'ORMR', 'ORMR', 'ORMR', 'ORMR', 'Control', 'MR', 'MR', 'OR', 'OR', 'ORMR', 'ORMR', 'ORMR', 'ORMR'),
idGroup = c(34,34,34,34,270,270,270,270,270,270,270,270,270,271,271,271,271,271,271,271,271,271),
yield = c(0.75,3.41,1.66,3.61,0.85,1.1,1.5,1.6,2.25,1.8,2.1,2.3,2.5,0.6,1.25,1.7,0.95,1.25,1.4,1.7,1.9,2),
sd = c(0.51,2.04,0.98,2.25,0.15,0.15,0.15,0.18,0.10,0.10,0.13,0.18,0.30,0.05,0.10,0.10,0.10,0.08,0.10,0.05,0.10,0.10),
nRep = c(6,6,6,6,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3),
orgRate = c(0,0,NA,NA,0,0,0,65,130,65,130,65,130,0,0,0,65,130,65,130,65,130),
minRate = c(0,NA,0,NA,0,40,80,0,0,40,40,80,80,0,40,80,0,0,40,40,80,80))
I have studies with one or more sites, over 1 or more years, and with essentially 4 fertiliser treatments. But in practice more treatments are possible if
different fertiliser rates are used (as shown in the example data set, by the organic Rate and mineral Rate columns).
According to https://rdrr.io/cran/metafor/man/escalc.html (Measures for Quantitative Variables) I use the escalc to calculate the sampling variances (vi)
and use measure = 'MN' because I want the raw mean of each group to be the observed outcome for the meta-analysis
dat = escalc(measure="MN", mi=yield, sdi=sd, ni=nRep, data=dat1)
Then following the example of Gleser & Olkin (2009) and Gabriele/Wolfgang, I should compute a var-covar matrix in order to account for the multiple (correlated) treatments.
How should I adapt the function below for a varying number of treatments (minimum 4, maximum 9 for dat1)? is that even possible?
calc.v <- function(x) {
v <- matrix(1/x$n2i[1] + outer(x$yi, x$yi, "*")/(2*x$Ni[1]), nrow=nrow(x), ncol=nrow(x))
diag(v) <- x$vi
v
}
V <- bldiag(lapply(split(dat, dat$idGroup), calc.v))
V
Thus, the model I think I need is the following:
rma.mv(yi, vi, mods = ~ treatment, random = ~ treatment | site/time, struct="UN", data=dat)
but with vi that should be replaced with V for multiple (correlated) treatments
struct = "UN" because random effects are expected to have different variances for each outcome and might be correlated.
random = ~ treatment | site/time, because the effect of treatment is likely to be depeninding on site and will differ over time (random slope),
and to account for correlations within sites, but also whithin time within sites.
However, I understood from Wolfgang's reply that adding random effects wont be sufficient, but that I should account for covariances
in the sampling errors due to the repeated measurements as wel. But how to do that? integrate a new V matrix?
Weighting: so far we have used sample errors in our model, that I think rma.mv is using as weights (?), but how can we tell the model to also
account for number of years or study errors in the weighting?
lastly, the output interpretation: I'm interested in 2 things, the (fixed) effect of the treatments, and the variance for each treatment.
When I run the model above without time in the random structure, I get a variance for each treatment:
Multivariate Meta-Analysis Model (k = 22; method: REML)
Variance Components:
outer factor: site (nlvls = 2)
inner factor: treatment (nlvls = 4)
estim sqrt k.lvl fixed level
tau^2.1 0.0070 0.0840 3 no Control
tau^2.2 1.7843 1.3358 5 no MR
tau^2.3 0.0183 0.1352 5 no OR
tau^2.4 1.4943 1.2224 9 no ORMR
rho.Cntr rho.MR rho.OR rho.ORMR Cntr MR OR ORMR
Control 1 1.0000 1.0000 1.0000 - no no no
MR 1.0000 1 1.0000 1.0000 2 - no no
OR 1.0000 1.0000 1 1.0000 2 2 - no
ORMR 1.0000 1.0000 1.0000 1 2 2 2 -
Test for Residual Heterogeneity:
QE(df = 18) = 482.3498, p-val < .0001
Test of Moderators (coefficient(s) 2:4):
QM(df = 3) = 532.2245, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
intrcpt 0.6816 0.0677 10.0681 <.0001 0.5489 0.8143 ***
treatmentMR 1.6370 0.9284 1.7633 0.0778 -0.1825 3.4566 .
treatmentOR 0.8680 0.0551 15.7403 <.0001 0.7599 0.9761 ***
treatmentORMR 1.9295 0.8442 2.2855 0.0223 0.2748 3.5841 *
I think this can be interpreted as the treatment Control has the lowest yield variability (tau^2.1), and OR the highest (tau^2.2).
Meaning that the yield under the MR treatment is (not significantly, p 0.05) higher, but less reliabel than the control (?)
when I run the model with time in the random structure, I don't get a variance for each treatment. So how do I assess then yield variability?
Multivariate Meta-Analysis Model (k = 22; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0763 0.2763 4 no treatment
sigma^2.2 0.0000 0.0001 8 no treatment/site
sigma^2.3 0.1288 0.3589 12 no treatment/site/time
Test for Residual Heterogeneity:
QE(df = 18) = 482.3498, p-val < .0001
Test of Moderators (coefficient(s) 2:4):
QM(df = 3) = 6.4643, p-val = 0.0911
Model Results:
estimate se zval pval ci.lb ci.ub
intrcpt 0.7297 0.3525 2.0700 0.0385 0.0388 1.4206 *
treatmentMR 0.8082 0.5113 1.5805 0.1140 -0.1940 1.8104
treatmentOR 0.8916 0.5039 1.7694 0.0768 -0.0960 1.8792 .
treatmentORMR 1.2450 0.5113 2.4348 0.0149 0.2428 2.2472 *
Any help is greatly appreciated! thanks…
[[alternative HTML version deleted]]
More information about the R-sig-meta-analysis
mailing list