[R-meta] Multilevel with dataset where lines are treatment means
Martineau, Roger
Roger.Martineau at AGR.GC.CA
Wed Sep 27 19:17:49 CEST 2017
Dear Wolfgang,
Thanks for your reply. I agree with you regarding question #2 but I would like to add more information to the simulated data set regarding the reply to question #1.
I change the name of some variables:
dat <- data.frame(
publication=c(rep(c("A","B","C"),each=4),
rep(c("D","E"),each=8)),
experiment=c(rep(c("A1","A2","B1","B2"),each=2),
rep(c("C1","D1","D2","E1","E2"),each=4)),
milk=c(10,18,5,25,13,48,7,25,6,13,20,35,65,45,30,
15,55,45,30,10,45,40,35,30,35,26,18,8),
SEM=c(rep(c(5,6,7,3),each=2),
rep(c(7,6,5,6,5),each=4)),
intake=c(5,10,25,45,8,30,5,35,5,10,15,20,10,20,25,
30,20,30,40,50,10,15,20,25,5,6,9,12),
type.of.studies=c(rep("casein studies",times=12),
rep("feeding trials",times=16))
)
dat$ID <- 1:nrow(dat) # adding row numbers
dat <- dat[,c(7,6,1:5)] # ordering columns
dat
Data were aggregated from 2 independent sets of studies (i.e. type of studies = casein studies or feeding trials). Each line (n = 28) includes a treatment mean (milk) and its SEM. Treatment means for milk yield were collected from 5 publications (i.e., A, B, C, D, E), each publication reporting 1 or more experiments (e.g., A1 and A2). Each experiment includes at least 2 treatment means.
I used a multilevel model because experiments are nested within publications. For this meta-analysis, the effect sizes are raw means and vi = SEM^2. The moderator is "intake". This simulated data set contains only 2 or 3 publications for each type of studies but there is >30 publications for each type of studies in the real data set.
The question of interest is: Does the relationship between milk yield and intake differ between casein studies and feeding trials ?
1st question: Is it possible to impute a block-diagonal covariance matrix with the "impute_covariance_matrix" function (in a data set with raw means) ???
2nd question: Using 'random = ~1|publication/experiment/line' would be OK if "line" contained an effect size, e.g. mean difference. However, I am not sure it is OK if "line" contains a raw mean ???
3rd question: Using 'random = ~factor(type.of.studies )|publication/experiment' in the model would allow the amount of residual heterogeneity to be different in each subset of studies (i.e., casein studies or feeding trials). Is it OK ???
4th question: After centering "intake", I would test the next model and use the robust function because I didn't impute a block-diagonal covariance matrix. Is it OK ???
dat <- within(dat, {
cintake <- intake - mean(intake,na.rm = TRUE)
}
)
res <- rma.mv(milk ~ cintake*factor(type.of.studies), SEM^2,
random = ~ factor(type.of.studies)|publication/experiment, data=dat, digits=4)
summary(res)
robust(res,cluster=dat$publication)
Thanks in advance,
Roger ☺
-----Message d'origine-----
De : Viechtbauer Wolfgang (SP) [mailto:wolfgang.viechtbauer at maastrichtuniversity.nl]
Envoyé : 23 septembre 2017 17:42
À : Martineau, Roger; r-sig-meta-analysis at r-project.org
Objet : RE: Multilevel with dataset where lines are treatment means
Regarding question 1:
Generally speaking, you want to use 'random = ~1|experiment/study/line'. The standard random effects model includes random effects for each estimate/row. Multilevel models are an extension of this, where we add random effects for higher level clustering levels. Those are *added* to the model -- they don't replace the random effects per estimate/row. See also:
http://www.metafor-project.org/doku.php/analyses:konstantopoulos2011#a_common_mistake_in_the_three-level_model
Using 'random = ~1|experiment/study' would assume that true outcomes are homogeneous within each level of 'study within experiment'. That may not be the case, so I would always use (or at least start with) 'random = ~1|experiment/study/line'.
Regarding question 2:
I don't think there is any general consensus on how to deal with outliers/influential estimates/clusters, especially in more complex models. But I think most people would agree that we should not just delete them (unless they are clearly data entry/coding errors).
And yes, you can use 'cluster=dat$study' (or any other higher level clustering variables, such as dat$experiment) in rstudent() and cooks.distance() for 'rma.mv' objects. Again, there isn't one best strategy here. A particular estimate could be an outlier/influential, an entire study could be, or even an whole experiment. You might as well examine all of those possibilities.
Best,
Wolfgang
-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Martineau, Roger
Sent: Thursday, 14 September, 2017 23:08
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Multilevel with dataset where lines are treatment means
Hi all,
I have a large data set to analyze and I want to get my model (and modus operandi) right before publication. Therefore I will provide a fictive example much smaller (n=28) but having a similar structure (i.e., experiment/study).
Thanks in advance,
Roger ☺
The fictive data are:
dat <- data.frame(
line=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28),
experiment=c("A","A","A","A","B","B","B","B","C","C","C","C",
"D","D","D","D","D","D","D","D","E","E","E","E","E","E","E","E"),
study=c("A1","A1","A2","A2","B1","B1","B2","B2","C1","C1","C1","C1",
"D1","D1","D1","D1","D2","D2","D2","D2","E1","E1","E1","E1","E2","E2","E2","E2"),
outcome=c(10,18,5,25,13,48,7,25,6,13,20,35,65,45,30,15,55,45,30,10,45,40,35,30,35,26,18,8),
SEM=c(5,5,6,6,7,7,3,3,7,7,7,7,6,6,6,6,5,5,5,5,6,6,6,6,5,5,5,5),
moderator1=c(5,10,25,45,8,30,5,35,5,10,15,20,10,20,25,30,20,30,40,50,10,15,20,25,5,6,9,12),
moderator2=c("cat1","cat1","cat1","cat1","cat1","cat1","cat1","cat1",
"cat1","cat1","cat1","cat1","cat2","cat2","cat2","cat2",
"cat2","cat2","cat2","cat2","cat2","cat2","cat2","cat2",
"cat2","cat2","cat2","cat2")) dat
str(dat)
In this data set, "experiment" represents different publications (n=5) and one can fin 1 or 2 studies in each publication. Each study reports at least 2 treatment means (i.e., outcome) and up to 4 treatment means per study (see E2). In animal science, milk yield is an example of such an oucome. Indeed SEM of outcome is identical per study.
I included a continuous (moderator1) and a categorical moderator (moderator2). I deliberately set data such as there is an interaction (the outcome increase with increasing values of moderator1 for cat1 and vice-versa for cat2).
I have a couple of questions on the proper way to analyse such a data set reporting treatment means (and associated SEM).
**** Question 1 concerns the correct random argument structure to use:
# centering moderator:
dat <- within(dat, {
cmoderator1 <- moderator1 - mean(moderator1,na.rm = TRUE)
})
# model with 2 options for random argument:
res <- rma.mv(outcome,SEM^2, data=dat,
mods = ~
cmoderator1*factor(moderator2),
random = ~1|experiment/study, # option 1
#random = ~1|experiment/study/line, # option 2
method = "REML")
summary(res)
Results with option 1 and 2 are close but which one is correct ? and why ?
Based on previous posts, I assume that SE and P-values are correctly estimated using the robust function and the upper level cluster (i.e. experiment):
robust(res, cluster=dat$experiment)
In my large data set, I look for outliers and influential cases with studentitized residuals and Cook's distance values.
**** My second question is: is it OK to identify (and remove from the data set) outliers and influential cases on a study basis (i.e., using the cluster argument in the functions) ? or is it better to identify idividual treatment means (by removing the cluster argument) ?
#### .. Student residuals ####
tmp.stud.res <- rstudent(res,
cluster=dat$study, # is this OK ?
progbar=TRUE)
order(-tmp.stud.res$cluster$X2)
tmp.stud.res$cluster[c(xxx)]
#### .. Cook's distance ####
par(mfrow=c(1,1))
tmp.cook <- cooks.distance(res,
cluster=dat$study, # is this OK ?
progbar=TRUE) plot(tmp.cook, type="o", pch=19) which(tmp.cook > xxx)
Indeed, I do the other functions:
### Likelihood profile plots ###
par(mfrow=c(2,1))
profile.rma.mv(res)
### Normal Q-Q Plot
par(mfrow=c(1,1))
qqnorm(rstandard(res)$z, pch=19)
qqline(rstandard(res)$z)
### Funnel plots (publication bias) ###
par(mfrow=c(2,2))
funnel(res, main="Standard Error")
funnel(res, yaxis="vi", main="Sampling Variance") funnel(res, yaxis="seinv", main="Inverse Standard Error") funnel(res, yaxis="vinv", main="Inverse Sampling Variance")
This is the graph which shows the interaction:
#### .. Graph ####
(res <- rma.mv(outcome, SEM^2, data=dat,
mods = ~
factor(moderator2)-1 +
factor(moderator2):cmoderator1,
random = ~1|experiment/study,
method = "REML"))
tmp.coeff_cat <- round(coef(summary(res)), 4) tmp <- ddply(dat, .(moderator2), summarise,
min = min(cmoderator1, na.rm = TRUE),
max = max(cmoderator1, na.rm = TRUE))
f1 <- function(x) { # function for tmp.casein studies
ifelse(x<tmp[1,3]& x>tmp[1,2],
tmp.coeff_cat[1,1] +
(tmp.coeff_cat[3,1]) *x,NA)}
f2 <- function(x) { # function for tmp.dietary studies
ifelse(x<tmp[2,3]& x>tmp[2,2],
tmp.coeff_cat[2,1] +
(tmp.coeff_cat[4,1]) *x,NA)}
tmp.plot <- ggplot(dat, aes(cmoderator1, outcome), na.rm = TRUE) +
geom_line(aes(group = study, colour = moderator2, linetype = moderator2), size = 1) +
geom_point(aes(group = study, colour = moderator2), size = 2)
tmp.plot + labs(
x = "Moderator1 (centered values)")+
scale_linetype_manual(
name="Category:",
values=c("solid","solid"),
breaks=c("cat1","cat2"),
labels=c("cat1","cat2")) +
scale_colour_manual(
name="Category:",
values=c("brown","royalblue"),
breaks=c("cat1","cat2"),
labels=c("cat1","cat2")) +
stat_function(fun=f1, colour = "red", size = 2, linetype="solid") +
stat_function(fun=f2, colour = "blue", size = 2, linetype="solid") +
ggtitle("Test")+
theme(
plot.title = element_text(hjust = 0.5, family="serif", size = 25),
axis.title=element_text(family= "serif", colour="black", size=18),
axis.line=element_line(color="black"),
axis.ticks = element_blank(),
axis.text=element_text(family= "serif", colour="black", size=15) ,
legend.title=element_text(family="serif", size=15),
legend.text=element_text(family="serif", size=15),
legend.position="bottom")
More information about the R-sig-meta-analysis
mailing list