[R-meta] Multilevel with dataset where lines are treatment means

Martineau, Roger Roger.Martineau at AGR.GC.CA
Wed Sep 27 19:17:49 CEST 2017

Dear Wolfgang,

Thanks for your reply. I agree with you regarding question #2 but I would like to add more information to the simulated data set regarding the reply to question #1.

I change the name of some variables:

dat <- data.frame(
  type.of.studies=c(rep("casein studies",times=12),
         rep("feeding trials",times=16))
dat$ID <- 1:nrow(dat)             # adding row numbers
dat <- dat[,c(7,6,1:5)]           # ordering columns

Data were aggregated from 2 independent sets of studies (i.e. type of studies = casein studies or feeding trials). Each line (n = 28) includes a treatment mean (milk) and its SEM. Treatment means for milk yield were collected from 5 publications (i.e., A, B, C, D, E), each publication reporting 1 or more experiments (e.g., A1 and A2). Each experiment includes at least 2 treatment means.

I used a multilevel model because experiments are nested within publications. For this meta-analysis, the effect sizes are raw means and vi = SEM^2. The moderator is "intake". This simulated data set contains only 2 or 3 publications for each type of studies but there is >30 publications for each type of studies in the real data set.

The question of interest is: Does the relationship between milk yield and intake differ between casein studies and feeding trials ?

1st question: Is it possible to impute a block-diagonal covariance matrix with the "impute_covariance_matrix" function (in a data set with raw means) ???

2nd question: Using 'random = ~1|publication/experiment/line' would be OK if "line" contained an effect size, e.g. mean difference. However, I am not sure it is OK if "line" contains a raw mean ???
3rd question: Using 'random = ~factor(type.of.studies )|publication/experiment' in the model would allow the amount of residual heterogeneity to be different in each subset of studies (i.e., casein studies or feeding trials). Is it OK ???

4th question: After centering "intake", I would test the next model and use the robust function because I didn't impute a block-diagonal covariance matrix. Is it OK ???

dat <- within(dat, {
  cintake   <- intake - mean(intake,na.rm = TRUE)

res <- rma.mv(milk ~ cintake*factor(type.of.studies), SEM^2, 
              random = ~ factor(type.of.studies)|publication/experiment, data=dat, digits=4)

Thanks in advance,

Roger ☺

-----Message d'origine-----
De : Viechtbauer Wolfgang (SP) [mailto:wolfgang.viechtbauer at maastrichtuniversity.nl] 
Envoyé : 23 septembre 2017 17:42
À : Martineau, Roger; r-sig-meta-analysis at r-project.org
Objet : RE: Multilevel with dataset where lines are treatment means

Regarding question 1:

Generally speaking, you want to use 'random = ~1|experiment/study/line'. The standard random effects model includes random effects for each estimate/row. Multilevel models are an extension of this, where we add random effects for higher level clustering levels. Those are *added* to the model -- they don't replace the random effects per estimate/row. See also:


Using 'random = ~1|experiment/study' would assume that true outcomes are homogeneous within each level of 'study within experiment'. That may not be the case, so I would always use (or at least start with) 'random = ~1|experiment/study/line'.

Regarding question 2:

I don't think there is any general consensus on how to deal with outliers/influential estimates/clusters, especially in more complex models. But I think most people would agree that we should not just delete them (unless they are clearly data entry/coding errors). 

And yes, you can use 'cluster=dat$study' (or any other higher level clustering variables, such as dat$experiment) in rstudent() and cooks.distance() for 'rma.mv' objects. Again, there isn't one best strategy here. A particular estimate could be an outlier/influential, an entire study could be, or even an whole experiment. You might as well examine all of those possibilities.


-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of Martineau, Roger
Sent: Thursday, 14 September, 2017 23:08
To: r-sig-meta-analysis at r-project.org
Subject: [R-meta] Multilevel with dataset where lines are treatment means

Hi all,

I have a large data set to analyze and I want to get my model (and modus operandi) right before publication. Therefore I will provide a fictive example much smaller (n=28) but having a similar structure (i.e., experiment/study).

Thanks in advance,

Roger ☺

The fictive data are:

dat <- data.frame(
                       "cat2","cat2","cat2","cat2")) dat

In this data set, "experiment" represents different publications (n=5) and one can fin 1 or 2 studies in each publication. Each study reports at least 2 treatment means (i.e., outcome) and up to 4 treatment means per study (see E2). In animal science, milk yield is an example of such an oucome. Indeed SEM of outcome is identical per study.

I included a continuous (moderator1) and a categorical moderator (moderator2). I deliberately set data such as there is an interaction (the outcome increase with increasing values of moderator1 for cat1 and vice-versa for cat2).

I have a couple of questions on the proper way to analyse such a data set reporting treatment means (and associated SEM).

**** Question 1 concerns the correct random argument structure to use:

# centering moderator:
dat <- within(dat, {
  cmoderator1   <- moderator1 - mean(moderator1,na.rm = TRUE)

# model with 2 options for random argument:
res <- rma.mv(outcome,SEM^2, data=dat,
              mods = ~
              random = ~1|experiment/study, # option 1
              #random = ~1|experiment/study/line, # option 2
              method = "REML")

Results with option 1 and 2 are close but which one is correct ? and why ?

Based on previous posts, I assume that SE and P-values are correctly estimated using the robust function and the upper level cluster (i.e. experiment):

robust(res, cluster=dat$experiment)

In my large data set, I look for outliers and influential cases with studentitized residuals and Cook's distance values.

**** My second question is: is it OK to identify (and remove from the data set) outliers and influential cases on a study basis (i.e., using the cluster argument in the functions) ? or is it better to identify idividual treatment means (by removing the cluster argument) ?

#### .. Student residuals ####
tmp.stud.res <- rstudent(res,
                         cluster=dat$study, # is this OK ?

#### .. Cook's distance ####
tmp.cook <- cooks.distance(res,
                           cluster=dat$study,  # is this OK ?
                           progbar=TRUE) plot(tmp.cook, type="o", pch=19) which(tmp.cook > xxx)

Indeed, I do the other functions:

### Likelihood profile plots ###

### Normal Q-Q Plot
qqnorm(rstandard(res)$z, pch=19)

### Funnel plots (publication bias) ###
funnel(res, main="Standard Error")
funnel(res, yaxis="vi", main="Sampling Variance") funnel(res, yaxis="seinv", main="Inverse Standard Error") funnel(res, yaxis="vinv", main="Inverse Sampling Variance")

This is the graph which shows the interaction:

#### .. Graph ####
(res <- rma.mv(outcome, SEM^2, data=dat,
                  mods = ~
                    factor(moderator2)-1 +
                  random = ~1|experiment/study,
                  method = "REML"))

tmp.coeff_cat <- round(coef(summary(res)), 4) tmp <- ddply(dat, .(moderator2), summarise,
             min = min(cmoderator1, na.rm = TRUE),
             max = max(cmoderator1, na.rm = TRUE))

f1 <- function(x)  { #  function for tmp.casein studies
  ifelse(x<tmp[1,3]& x>tmp[1,2],
         tmp.coeff_cat[1,1] +
           (tmp.coeff_cat[3,1]) *x,NA)}
f2 <- function(x)  { #  function for tmp.dietary studies
  ifelse(x<tmp[2,3]& x>tmp[2,2],
         tmp.coeff_cat[2,1] +
           (tmp.coeff_cat[4,1]) *x,NA)}

tmp.plot       <- ggplot(dat, aes(cmoderator1, outcome), na.rm = TRUE) +
  geom_line(aes(group = study, colour = moderator2, linetype = moderator2), size = 1) +
  geom_point(aes(group = study, colour = moderator2), size = 2)

tmp.plot + labs(
  x = "Moderator1 (centered values)")+
    labels=c("cat1","cat2")) +
    labels=c("cat1","cat2")) +
  stat_function(fun=f1,  colour = "red", size = 2, linetype="solid") +
  stat_function(fun=f2,  colour = "blue", size = 2, linetype="solid") +
    plot.title = element_text(hjust = 0.5, family="serif", size = 25),
    axis.title=element_text(family= "serif", colour="black", size=18),
    axis.ticks = element_blank(),
    axis.text=element_text(family= "serif", colour="black", size=15) ,
    legend.title=element_text(family="serif", size=15),
    legend.text=element_text(family="serif", size=15),

More information about the R-sig-meta-analysis mailing list