[R-meta] Multilevel with dataset where lines are treatment means
Viechtbauer Wolfgang (SP)
wolfgang.viechtbauer at maastrichtuniversity.nl
Sat Sep 23 23:42:26 CEST 2017
Regarding question 1:
Generally speaking, you want to use 'random = ~1|experiment/study/line'. The standard random effects model includes random effects for each estimate/row. Multilevel models are an extension of this, where we add random effects for higher level clustering levels. Those are *added* to the model -- they don't replace the random effects per estimate/row. See also:
http://www.metafor-project.org/doku.php/analyses:konstantopoulos2011#a_common_mistake_in_the_three-level_model
Using 'random = ~1|experiment/study' would assume that true outcomes are homogeneous within each level of 'study within experiment'. That may not be the case, so I would always use (or at least start with) 'random = ~1|experiment/study/line'.
Regarding question 2:
I don't think there is any general consensus on how to deal with outliers/influential estimates/clusters, especially in more complex models. But I think most people would agree that we should not just delete them (unless they are clearly data entry/coding errors).
And yes, you can use 'cluster=dat$study' (or any other higher level clustering variables, such as dat$experiment) in rstudent() and cooks.distance() for 'rma.mv' objects. Again, there isn't one best strategy here. A particular estimate could be an outlier/influential, an entire study could be, or even an whole experiment. You might as well examine all of those possibilities.
Best,
Wolfgang
-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Martineau, Roger
Sent: Thursday, 14 September, 2017 23:08
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Multilevel with dataset where lines are treatment means
Hi all,
I have a large data set to analyze and I want to get my model (and modus operandi) right before publication. Therefore I will provide a fictive example much smaller (n=28) but having a similar structure (i.e., experiment/study).
Thanks in advance,
Roger ☺
The fictive data are:
dat <- data.frame(
line=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28),
experiment=c("A","A","A","A","B","B","B","B","C","C","C","C",
"D","D","D","D","D","D","D","D","E","E","E","E","E","E","E","E"),
study=c("A1","A1","A2","A2","B1","B1","B2","B2","C1","C1","C1","C1",
"D1","D1","D1","D1","D2","D2","D2","D2","E1","E1","E1","E1","E2","E2","E2","E2"),
outcome=c(10,18,5,25,13,48,7,25,6,13,20,35,65,45,30,15,55,45,30,10,45,40,35,30,35,26,18,8),
SEM=c(5,5,6,6,7,7,3,3,7,7,7,7,6,6,6,6,5,5,5,5,6,6,6,6,5,5,5,5),
moderator1=c(5,10,25,45,8,30,5,35,5,10,15,20,10,20,25,30,20,30,40,50,10,15,20,25,5,6,9,12),
moderator2=c("cat1","cat1","cat1","cat1","cat1","cat1","cat1","cat1",
"cat1","cat1","cat1","cat1","cat2","cat2","cat2","cat2",
"cat2","cat2","cat2","cat2","cat2","cat2","cat2","cat2",
"cat2","cat2","cat2","cat2"))
dat
str(dat)
In this data set, "experiment" represents different publications (n=5) and one can fin 1 or 2 studies in each publication. Each study reports at least 2 treatment means (i.e., outcome) and up to 4 treatment means per study (see E2). In animal science, milk yield is an example of such an oucome. Indeed SEM of outcome is identical per study.
I included a continuous (moderator1) and a categorical moderator (moderator2). I deliberately set data such as there is an interaction (the outcome increase with increasing values of moderator1 for cat1 and vice-versa for cat2).
I have a couple of questions on the proper way to analyse such a data set reporting treatment means (and associated SEM).
**** Question 1 concerns the correct random argument structure to use:
# centering moderator:
dat <- within(dat, {
cmoderator1 <- moderator1 - mean(moderator1,na.rm = TRUE)
})
# model with 2 options for random argument:
res <- rma.mv(outcome,SEM^2, data=dat,
mods = ~
cmoderator1*factor(moderator2),
random = ~1|experiment/study, # option 1
#random = ~1|experiment/study/line, # option 2
method = "REML")
summary(res)
Results with option 1 and 2 are close but which one is correct ? and why ?
Based on previous posts, I assume that SE and P-values are correctly estimated using the robust function and the upper level cluster (i.e. experiment):
robust(res, cluster=dat$experiment)
In my large data set, I look for outliers and influential cases with studentitized residuals and Cook's distance values.
**** My second question is: is it OK to identify (and remove from the data set) outliers and influential cases on a study basis (i.e., using the cluster argument in the functions) ? or is it better to identify idividual treatment means (by removing the cluster argument) ?
#### .. Student residuals ####
tmp.stud.res <- rstudent(res,
cluster=dat$study, # is this OK ?
progbar=TRUE)
order(-tmp.stud.res$cluster$X2)
tmp.stud.res$cluster[c(xxx)]
#### .. Cook's distance ####
par(mfrow=c(1,1))
tmp.cook <- cooks.distance(res,
cluster=dat$study, # is this OK ?
progbar=TRUE)
plot(tmp.cook, type="o", pch=19)
which(tmp.cook > xxx)
Indeed, I do the other functions:
### Likelihood profile plots ###
par(mfrow=c(2,1))
profile.rma.mv(res)
### Normal Q-Q Plot
par(mfrow=c(1,1))
qqnorm(rstandard(res)$z, pch=19)
qqline(rstandard(res)$z)
### Funnel plots (publication bias) ###
par(mfrow=c(2,2))
funnel(res, main="Standard Error")
funnel(res, yaxis="vi", main="Sampling Variance")
funnel(res, yaxis="seinv", main="Inverse Standard Error")
funnel(res, yaxis="vinv", main="Inverse Sampling Variance")
This is the graph which shows the interaction:
#### .. Graph ####
(res <- rma.mv(outcome, SEM^2, data=dat,
mods = ~
factor(moderator2)-1 +
factor(moderator2):cmoderator1,
random = ~1|experiment/study,
method = "REML"))
tmp.coeff_cat <- round(coef(summary(res)), 4)
tmp <- ddply(dat, .(moderator2), summarise,
min = min(cmoderator1, na.rm = TRUE),
max = max(cmoderator1, na.rm = TRUE))
f1 <- function(x) { # function for tmp.casein studies
ifelse(x<tmp[1,3]& x>tmp[1,2],
tmp.coeff_cat[1,1] +
(tmp.coeff_cat[3,1]) *x,NA)}
f2 <- function(x) { # function for tmp.dietary studies
ifelse(x<tmp[2,3]& x>tmp[2,2],
tmp.coeff_cat[2,1] +
(tmp.coeff_cat[4,1]) *x,NA)}
tmp.plot <- ggplot(dat, aes(cmoderator1, outcome), na.rm = TRUE) +
geom_line(aes(group = study, colour = moderator2, linetype = moderator2), size = 1) +
geom_point(aes(group = study, colour = moderator2), size = 2)
tmp.plot + labs(
x = "Moderator1 (centered values)")+
scale_linetype_manual(
name="Category:",
values=c("solid","solid"),
breaks=c("cat1","cat2"),
labels=c("cat1","cat2")) +
scale_colour_manual(
name="Category:",
values=c("brown","royalblue"),
breaks=c("cat1","cat2"),
labels=c("cat1","cat2")) +
stat_function(fun=f1, colour = "red", size = 2, linetype="solid") +
stat_function(fun=f2, colour = "blue", size = 2, linetype="solid") +
ggtitle("Test")+
theme(
plot.title = element_text(hjust = 0.5, family="serif", size = 25),
axis.title=element_text(family= "serif", colour="black", size=18),
axis.line=element_line(color="black"),
axis.ticks = element_blank(),
axis.text=element_text(family= "serif", colour="black", size=15) ,
legend.title=element_text(family="serif", size=15),
legend.text=element_text(family="serif", size=15),
legend.position="bottom")
More information about the R-sig-meta-analysis
mailing list