[R-meta] Influential case diagnostics in a multivariate multilevel meta-analysis in metafor

Viechtbauer, Wolfgang (SP) wolfgang.viechtbauer at maastrichtuniversity.nl
Wed Jan 16 15:02:53 CET 2019


Dear Yogev,

Since you use 'cluster=StudyID', cooks.distance() has to do 311 model fits, one with each study deleted in turn. But you also use 'reestimate=FALSE', which should speed things up a lot, and 'sparse=TRUE' makes sense here as well, since the marginal var-cov structure is probably quite sparse. So, for the most part, you are already using the features that should help to speed things up.

But a few things:

1) You used 'cluster = StudyID', but unless you used attach(Data) or have 'StudyID' as a separate object in your workspace, this should not work. It should be 'cluster = Data$StudyID'.

2) If you use 'parallel="snow"', then no progress bar will be shown, so I wonder how you got the '6%' then. Or did you run this once without 'parallel="snow"'?

3) If you use 'parallel="snow"', then this won't give you any speed increase unless you actually make use of multiple cores. You can do this with the 'ncpus' argument. But first check how many cores you actually have available with parallel::detectCores() Note that this also counts 'logical' cores. If you are on MacOS or Windows, then detectCores(logical=FALSE) is a better indicator of how many cores to specify under 'ncpus'.

Best,
Wolfgang

>-----Original Message-----
>From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces at r-project.org]
>On Behalf Of Yogev Kivity
>Sent: Tuesday, 15 January, 2019 21:20
>To: r-sig-meta-analysis at r-project.org
>Subject: [R-meta] Influential case diagnostics in a multivariate
>multilevel meta-analysis in metafor
>
>Hi all,
>
>I am fitting a multivariate multilevel meta-analysis in metafor and having
>trouble computing outlier and influential case diagnostics (i.e., Cook's
>distances per
>https://wviechtb.github.io/metafor/reference/influence.rma.mv.html).
>
>This is a large dataset of 3360 Pearson's correlations (converted to
>Fisher's z) nested within 600 subsamples that are nested within 311
>studies. Below is the code I used for the model and for computing Cook's
>distances. The problem is that it takes a lot of time to run (I ran it
>overnight and it only reached 6%). I am assuming this is related to the
>size of the dataset and to the complex model structure, but I am not sure
>how to go about speeding up the processing. I should note that I am
>computing the distances based on the simplest possible model (i.e., no
>moderators and without considering dependencies among effect sizes within
>clusters).
>
>I was hoping someone could help with some suggestions of how best to move
>forward.
>
>Thanks,
>Yogev
>
>NoMods <- rma.mv(yi, vi, random = ~ 1 | StudyID/GroupID/EffectSizeID,
>                 data = Data, sparse = TRUE)
>summary(NoMods)
>NoModsCooksDistance <- cooks.distance(NoMods, progbar = TRUE,
>                                      cluster = StudyID,
>                                      reestimate = FALSE, parallel = "snow")
>NoModsCooksDistance
>plot(NoModsCooksDistance, type = "o", pch = 19)
>
>--
>
>Yogev Kivity, Ph.D.
>Postdoctoral Fellow
>Department of Psychology
>The Pennsylvania State University
>Bruce V. Moore Building
>University Park, PA 16802
>Office Phone: (814) 867-2330

