[R-meta] possible miscalculation of Cook’s distances

Fri Feb 4 08:01:21 CET 2022

Dear Antonina,

There was no rationale, I just wanted to indicate that there is no bug
in the function (and save myself a tiny bit of time). The default
reestimate = TRUE is certainly to be preferred given that your model
does include random-effects and, as the documentation correctly
mentions, the influence of each effect estimate on the estimates of
heterogeneity and correlation can only be examined, if you set
reestimate = TRUE.

For instance, if you remove the random-effects from your initial model
(res.ml), then, you'll see that the use of reestimate = TRUE or FALSE
has no effect. In both cases, only the effect estimate associated with
study 1, experiment 1 is influential. Thus, this shows that the use of
reestimate = FALSE in your initial model essentially ignored the
influence of each effect estimate on the estimates of heterogeneity.

In models with complex random-effects structure esp. fit to large
datasets, setting reestimate = TRUE would take a good chunk of time.
In such cases, some (myself included) may be tempted to set reestimate
= FALSE hoping that the approximation will be close enough.

Kind regards,
Reza

res.ml2 <- update.rma(res.ml, random = NULL)

cook_with_reest2 <- cooks.distance.rma.mv(res.ml2)
cook_without_reest2 <- cooks.distance.rma.mv(res.ml2, reestimate = FALSE)

plot(cook_with_reest2,type="b")
plot(cook_without_reest2,type="b")

On Fri, Feb 4, 2022 at 12:19 AM Antonina Dolgorukova
<an.dolgorukova using gmail.com> wrote:
>
> Dear Reza,
>
> Thank you very much for your reply. I do understand that not all outliers have to be influential and that not all influential cases have to be outliers. But thank you for mentioning this and for the code provided. I did notice you set the reestimate = FALSE. This raises two questions, actually. It would be great if you help with them.
>
> 1) Is there any explanation why cooks.distance.rma.mv(res.ml) and cooks.distance.rma.mv(res.ml , reestimate = FALSE) give completely different results about the experiments 1 and 2 (and very similar for the remaining experiments)?
>
> According to cooks.distance.rma.mv(res.ml), the Study 1, Experiment 2 is influential, but
> according to  cooks.distance.rma.mv(res.ml , reestimate = FALSE), Study 1, Experiment 1 is influential
>
>
> #the code illustrating my question
>
> dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
>                   yi=c( 68, 18, 31,20,10,26),
>                   vi=c(60,32, 15, 19, 41, 82))
>
> res.ml <- rma.mv(yi, vi,
>                  random = ~ 1 | study/experiment,
>                  data=dat,
>                  slab = paste("Study ", study,", ", "Experiment ", experiment, sep = ""))
>
> cook_with_reest <- cooks.distance.rma.mv(res.ml)
> cook_without_reest <- cooks.distance.rma.mv(res.ml, reestimate = FALSE)
>
> plot(1:length(cook_with_reest), cook_with_reest, ylim=c(0,2), type="o", pch=19, las=1, xaxt="n",
>      xlab="Study and Experiment ID", ylab="Cook's Distance")
> points(cook_without_reest, type="o", pch=19, col = "red")
> axis(1, 1:length(cook_with_reest), labels=names(cook_with_reest))
> abline(h=4/length(cook_with_reest), lty="dotted", lwd=2)
> legend("top", pch=19, col=c("black","red"), lty="solid",
>        legend=c("reestimate = TRUE","reestimate = FALSE"), bty="n")
>
>
>
> 2) And could you please explain what is the rationale to set reestimate = FALSE? According to the metafor documentation:
> "Doing so only yields an approximation to the Cook’s distances that ignores the influence of the ith case on the variance/correlation components"
>
>
> Sincerely,
> Antonina
>
> On Fri, Feb 4, 2022 at 3:29 AM Reza Norouzian <rnorouzian using gmail.com> wrote:
>>
>> Dear Antonina,
>>
>> An effect estimate whose standardized deleted residual falls beyond
>> +/-1.96 doesn't necessarily need to have a cook's distance
>> (influence), and/or a hat value (if additional moderators are used)
>> that are likewise extreme.
>>
>> As a general proposition, it may be methodologically more reasonable
>> to simultaneously consult all these indices to flag an extreme effect
>> estimate.
>>
>> That said, in your case, it does seem that the estimate from study 1,
>> experiment 1 has a large standardized deleted residual as well as a
>> large Cook's distance relative to that of other effect estimates. So,
>> I don't think there is any bug in any of the metafor functions you
>> used.
>>
>> Below is a bit of code to catch that outlying effect estimate.
>>
>> Kind regards,
>> Reza
>>
>> dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
>>                   yi=c( 68, 18, 31,20,10,26),
>>                   vi=c(60,32, 15, 19, 41, 82))
>>
>> res.ml <- rma.mv(yi, vi,
>>                  random = ~ 1 | study/experiment,
>>                  data=dat,
>>                  slab = paste("Study ", study,", ", "Experiment ",
>>                               experiment, sep = ""))
>>
>> dat <- transform(dat, obs = res.ml$slab)
>>
>> outlier_limit <- qnorm(.975)
>>
>> cook <- cooks.distance.rma.mv(res.ml,
>>                               reestimate = FALSE)
>>
>> st_del_res_z <- rstudent.rma.mv(res.ml,
>>                                 reestimate = FALSE)$z
>>
>> cook_limit <- max(mean(range(cook)), boxplot.stats(cook, coef = 1.5)$stats[5])
>>
>> i <- abs(st_del_res_z) > outlier_limit
>> k <- cook > cook_limit
>> oo <- which(i & k)
>>
>> LL <- names(oo)
>>
>> removed <- subset(dat, obs %in% LL)
>> new_dat <- subset(dat, !obs %in% LL)
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Feb 3, 2022 at 3:36 PM Antonina Dolgorukova
>> <an.dolgorukova using gmail.com> wrote:
>> >
>> > Dear Dr. Viechtbauer and all,
>> >
>> > I have a multilevel data structure (experiments nested within studies) are
>> > use rma.mv to calculate an overall effect estimate. The next step requires
>> > sensitivity analysis on the experiment-level data. For detecting outliers
>> > I've used standardized (deleted) residuals and for detecting influential
>> > experiments I've used Cook’s distance. However, the last test provides
>> > contradictory results.
>> >
>> > The forest plot indicates that the 1st experiment may be an outlier, and
>> > standardized (deleted) residuals confirm this. But according to Cook’s
>> > distance plot, the 2nd experiment is influential. It seems that there may
>> > be a miscalculation of Cook’s distance since I can easily reproduce this
>> > issue (in one study one of the experiments have to provide a much larger ES
>> > than the other) also if use the model with random = ~ 1 | experiment, the
>> > 1st experiment is influential, not the second.
>> >
>> > Could you please clarify is this a bug or a feature of the cooks.distance()
>> > function? Maybe it does not work properly with rma.mv objects?
>> >
>> >   ## Reproducible Example
>> >
>> > # data frame
>> >   dat <- data.frame(study=c(1,1,2,3,3,3), experiment=c(1:6),
>> >                     yi=c( 68, 18, 31,20,10,26),
>> >                     vi=c(60,32, 15, 19, 41, 82))
>> >
>> > # multilevel model
>> > res.ml <- rma.mv(yi, vi,
>> >                  random = ~ 1 | study/experiment,
>> >                  data=dat,
>> >                  slab = paste("Study ", study,", ", "Experiment ",
>> > experiment, sep = ""))
>> >
>> > # forest plot examination  indicates that the 1st experiment may be an
>> > outlier
>> > forest(res.ml,
>> >        header = "Study and Experiment ID")
>> >
>> > # standardized (deleted) residuals confirm this
>> > rst <- rstudent(res.ml)
>> > plot(NA, NA, xlim=c(1, res.ml$k), ylim=c(-3,5),
>> >      xlab="Study and Experiment ID", ylab="Standardized (Deleted) Residual",
>> >      xaxt="n", las=1)
>> > axis(side=1, at=1:res.ml$k, labels=rst$slab)
>> > abline(h=c(-1.96,1.96), lty="dotted")
>> > abline(h=0)
>> > points(1:res.ml$k, rst$z, type="o", pch=19)
>> >
>> > # according to Cook’s distance plot, the 2nd experiment is influential
>> > cooksd <- cooks.distance(res.ml)
>> > plot(1:length(cooksd), cooksd, ylim=c(0,2), type="o", pch=19, las=1,
>> > xaxt="n",
>> >      xlab="Experiment ID", ylab="Cook's Distance")
>> > axis(1, 1:length(cooksd), labels=names(cooksd))
>> > abline(h=4/length(cooksd), lty="dotted", lwd=2)
>> >
>> > sessionInfo()
>> > R version 4.1.2 (2021-11-01)
>> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>> > Running under: Windows 10 x64 (build 19042)
>> > [1] metafor_3.1-43 metadat_1.0-0  Matrix_1.3-4
>> >
>> > --
>> > Antonina Dolgorukova, M.D.
>> > Pavlov First Saint Petersburg State Medical University
>> > Lev Tolstoy str., 6-8, 197022
>> > Saint Petersburg, Russia
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > _______________________________________________
>> > R-sig-meta-analysis mailing list
>> > R-sig-meta-analysis using r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>
>
>
> --
> Antonina Dolgorukova, M.D.
> Pavlov First Saint Petersburg State Medical University
> Lev Tolstoy str., 6-8, 197022
> Saint Petersburg, Russia