[R-meta] possible miscalculation of Cook’s distances

Fri Feb 4 07:19:29 CET 2022

Dear Reza,

Thank you very much for your reply. I do understand that not all outliers
have to be influential and that not all influential cases have to be
outliers. But thank you for mentioning this and for the code provided. I
did notice you set the reestimate = FALSE. This raises two questions,
actually. It would be great if you help with them.

1) Is there any explanation why cooks.distance.rma.mv(res.ml) and
cooks.distance.rma.mv(res.ml , reestimate = FALSE) give completely
different results about the experiments 1 and 2 (and very similar for the
remaining experiments)?

According to cooks.distance.rma.mv(res.ml), the Study 1, Experiment 2 is
influential, but
according to  cooks.distance.rma.mv(res.ml , reestimate = FALSE), Study 1,
Experiment 1 is influential

#the code illustrating my question

dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
                  yi=c( 68, 18, 31,20,10,26),
                  vi=c(60,32, 15, 19, 41, 82))

res.ml <- rma.mv(yi, vi,
                 random = ~ 1 | study/experiment,
                 data=dat,
                 slab = paste("Study ", study,", ", "Experiment ",
experiment, sep = ""))

cook_with_reest <- cooks.distance.rma.mv(res.ml)
cook_without_reest <- cooks.distance.rma.mv(res.ml, reestimate = FALSE)

plot(1:length(cook_with_reest), cook_with_reest, ylim=c(0,2), type="o",
pch=19, las=1, xaxt="n",
     xlab="Study and Experiment ID", ylab="Cook's Distance")
points(cook_without_reest, type="o", pch=19, col = "red")
axis(1, 1:length(cook_with_reest), labels=names(cook_with_reest))
abline(h=4/length(cook_with_reest), lty="dotted", lwd=2)
legend("top", pch=19, col=c("black","red"), lty="solid",
       legend=c("reestimate = TRUE","reestimate = FALSE"), bty="n")

2) And could you please explain what is the rationale to set reestimate =
FALSE? According to the metafor documentation:
"Doing so only yields an approximation to the Cook’s distances that ignores
the influence of the ith case on the variance/correlation components"

Sincerely,
Antonina

On Fri, Feb 4, 2022 at 3:29 AM Reza Norouzian <rnorouzian using gmail.com> wrote:

> Dear Antonina,
>
> An effect estimate whose standardized deleted residual falls beyond
> +/-1.96 doesn't necessarily need to have a cook's distance
> (influence), and/or a hat value (if additional moderators are used)
> that are likewise extreme.
>
> As a general proposition, it may be methodologically more reasonable
> to simultaneously consult all these indices to flag an extreme effect
> estimate.
>
> That said, in your case, it does seem that the estimate from study 1,
> experiment 1 has a large standardized deleted residual as well as a
> large Cook's distance relative to that of other effect estimates. So,
> I don't think there is any bug in any of the metafor functions you
> used.
>
> Below is a bit of code to catch that outlying effect estimate.
>
> Kind regards,
> Reza
>
> dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
>                   yi=c( 68, 18, 31,20,10,26),
>                   vi=c(60,32, 15, 19, 41, 82))
>
> res.ml <- rma.mv(yi, vi,
>                  random = ~ 1 | study/experiment,
>                  data=dat,
>                  slab = paste("Study ", study,", ", "Experiment ",
>                               experiment, sep = ""))
>
> dat <- transform(dat, obs = res.ml$slab)
>
> outlier_limit <- qnorm(.975)
>
> cook <- cooks.distance.rma.mv(res.ml,
>                               reestimate = FALSE)
>
> st_del_res_z <- rstudent.rma.mv(res.ml,
>                                 reestimate = FALSE)$z
>
> cook_limit <- max(mean(range(cook)), boxplot.stats(cook, coef =
> 1.5)$stats[5])
>
> i <- abs(st_del_res_z) > outlier_limit
> k <- cook > cook_limit
> oo <- which(i & k)
>
> LL <- names(oo)
>
> removed <- subset(dat, obs %in% LL)
> new_dat <- subset(dat, !obs %in% LL)
>
>
>
>
>
>
>
> On Thu, Feb 3, 2022 at 3:36 PM Antonina Dolgorukova
> <an.dolgorukova using gmail.com> wrote:
> >
> > Dear Dr. Viechtbauer and all,
> >
> > I have a multilevel data structure (experiments nested within studies)
> are
> > use rma.mv to calculate an overall effect estimate. The next step
> requires
> > sensitivity analysis on the experiment-level data. For detecting outliers
> > I've used standardized (deleted) residuals and for detecting influential
> > experiments I've used Cook’s distance. However, the last test provides
> > contradictory results.
> >
> > The forest plot indicates that the 1st experiment may be an outlier, and
> > standardized (deleted) residuals confirm this. But according to Cook’s
> > distance plot, the 2nd experiment is influential. It seems that there may
> > be a miscalculation of Cook’s distance since I can easily reproduce this
> > issue (in one study one of the experiments have to provide a much larger
> ES
> > than the other) also if use the model with random = ~ 1 | experiment, the
> > 1st experiment is influential, not the second.
> >
> > Could you please clarify is this a bug or a feature of the
> cooks.distance()
> > function? Maybe it does not work properly with rma.mv objects?
> >
> >   ## Reproducible Example
> >
> > # data frame
> >   dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
> >                     yi=c( 68, 18, 31,20,10,26),
> >                     vi=c(60,32, 15, 19, 41, 82))
> >
> > # multilevel model
> > res.ml <- rma.mv(yi, vi,
> >                  random = ~ 1 | study/experiment,
> >                  data=dat,
> >                  slab = paste("Study ", study,", ", "Experiment ",
> > experiment, sep = ""))
> >
> > # forest plot examination  indicates that the 1st experiment may be an
> > outlier
> > forest(res.ml,
> >        header = "Study and Experiment ID")
> >
> > # standardized (deleted) residuals confirm this
> > rst <- rstudent(res.ml)
> > plot(NA, NA, xlim=c(1, res.ml$k), ylim=c(-3,5),
> >      xlab="Study and Experiment ID", ylab="Standardized (Deleted)
> Residual",
> >      xaxt="n", las=1)
> > axis(side=1, at=1:res.ml$k, labels=rst$slab)
> > abline(h=c(-1.96,1.96), lty="dotted")
> > abline(h=0)
> > points(1:res.ml$k, rst$z, type="o", pch=19)
> >
> > # according to Cook’s distance plot, the 2nd experiment is influential
> > cooksd <- cooks.distance(res.ml)
> > plot(1:length(cooksd), cooksd, ylim=c(0,2), type="o", pch=19, las=1,
> > xaxt="n",
> >      xlab="Experiment ID", ylab="Cook's Distance")
> > axis(1, 1:length(cooksd), labels=names(cooksd))
> > abline(h=4/length(cooksd), lty="dotted", lwd=2)
> >
> > sessionInfo()
> > R version 4.1.2 (2021-11-01)
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
> > Running under: Windows 10 x64 (build 19042)
> > [1] metafor_3.1-43 metadat_1.0-0  Matrix_1.3-4
> >
> > --
> > Antonina Dolgorukova, M.D.
> > Pavlov First Saint Petersburg State Medical University
> > Lev Tolstoy str., 6-8, 197022
> > Saint Petersburg, Russia
> >
> >         [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-meta-analysis mailing list
> > R-sig-meta-analysis using r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>

-- 
Antonina Dolgorukova, M.D.
Pavlov First Saint Petersburg State Medical University
Lev Tolstoy str., 6-8, 197022
Saint Petersburg, Russia

	[[alternative HTML version deleted]]