[R-meta] possible miscalculation of Cook’s distances

Reza Norouzian rnorouz|@n @end|ng |rom gm@||@com
Fri Feb 4 00:29:36 CET 2022


Dear Antonina,

An effect estimate whose standardized deleted residual falls beyond
+/-1.96 doesn't necessarily need to have a cook's distance
(influence), and/or a hat value (if additional moderators are used)
that are likewise extreme.

As a general proposition, it may be methodologically more reasonable
to simultaneously consult all these indices to flag an extreme effect
estimate.

That said, in your case, it does seem that the estimate from study 1,
experiment 1 has a large standardized deleted residual as well as a
large Cook's distance relative to that of other effect estimates. So,
I don't think there is any bug in any of the metafor functions you
used.

Below is a bit of code to catch that outlying effect estimate.

Kind regards,
Reza

dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
                  yi=c( 68, 18, 31,20,10,26),
                  vi=c(60,32, 15, 19, 41, 82))

res.ml <- rma.mv(yi, vi,
                 random = ~ 1 | study/experiment,
                 data=dat,
                 slab = paste("Study ", study,", ", "Experiment ",
                              experiment, sep = ""))

dat <- transform(dat, obs = res.ml$slab)

outlier_limit <- qnorm(.975)

cook <- cooks.distance.rma.mv(res.ml,
                              reestimate = FALSE)

st_del_res_z <- rstudent.rma.mv(res.ml,
                                reestimate = FALSE)$z

cook_limit <- max(mean(range(cook)), boxplot.stats(cook, coef = 1.5)$stats[5])

i <- abs(st_del_res_z) > outlier_limit
k <- cook > cook_limit
oo <- which(i & k)

LL <- names(oo)

removed <- subset(dat, obs %in% LL)
new_dat <- subset(dat, !obs %in% LL)







On Thu, Feb 3, 2022 at 3:36 PM Antonina Dolgorukova
<an.dolgorukova using gmail.com> wrote:
>
> Dear Dr. Viechtbauer and all,
>
> I have a multilevel data structure (experiments nested within studies) are
> use rma.mv to calculate an overall effect estimate. The next step requires
> sensitivity analysis on the experiment-level data. For detecting outliers
> I've used standardized (deleted) residuals and for detecting influential
> experiments I've used Cook’s distance. However, the last test provides
> contradictory results.
>
> The forest plot indicates that the 1st experiment may be an outlier, and
> standardized (deleted) residuals confirm this. But according to Cook’s
> distance plot, the 2nd experiment is influential. It seems that there may
> be a miscalculation of Cook’s distance since I can easily reproduce this
> issue (in one study one of the experiments have to provide a much larger ES
> than the other) also if use the model with random = ~ 1 | experiment, the
> 1st experiment is influential, not the second.
>
> Could you please clarify is this a bug or a feature of the cooks.distance()
> function? Maybe it does not work properly with rma.mv objects?
>
>   ## Reproducible Example
>
> # data frame
>   dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
>                     yi=c( 68, 18, 31,20,10,26),
>                     vi=c(60,32, 15, 19, 41, 82))
>
> # multilevel model
> res.ml <- rma.mv(yi, vi,
>                  random = ~ 1 | study/experiment,
>                  data=dat,
>                  slab = paste("Study ", study,", ", "Experiment ",
> experiment, sep = ""))
>
> # forest plot examination  indicates that the 1st experiment may be an
> outlier
> forest(res.ml,
>        header = "Study and Experiment ID")
>
> # standardized (deleted) residuals confirm this
> rst <- rstudent(res.ml)
> plot(NA, NA, xlim=c(1, res.ml$k), ylim=c(-3,5),
>      xlab="Study and Experiment ID", ylab="Standardized (Deleted) Residual",
>      xaxt="n", las=1)
> axis(side=1, at=1:res.ml$k, labels=rst$slab)
> abline(h=c(-1.96,1.96), lty="dotted")
> abline(h=0)
> points(1:res.ml$k, rst$z, type="o", pch=19)
>
> # according to Cook’s distance plot, the 2nd experiment is influential
> cooksd <- cooks.distance(res.ml)
> plot(1:length(cooksd), cooksd, ylim=c(0,2), type="o", pch=19, las=1,
> xaxt="n",
>      xlab="Experiment ID", ylab="Cook's Distance")
> axis(1, 1:length(cooksd), labels=names(cooksd))
> abline(h=4/length(cooksd), lty="dotted", lwd=2)
>
> sessionInfo()
> R version 4.1.2 (2021-11-01)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 19042)
> [1] metafor_3.1-43 metadat_1.0-0  Matrix_1.3-4
>
> --
> Antonina Dolgorukova, M.D.
> Pavlov First Saint Petersburg State Medical University
> Lev Tolstoy str., 6-8, 197022
> Saint Petersburg, Russia
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-meta-analysis mailing list
> R-sig-meta-analysis using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis



More information about the R-sig-meta-analysis mailing list