[R-meta] cooks.distance.rma.mv is slow on complex models

Wed Aug 16 18:51:35 CEST 2017

Good to hear that.

By the way, there is now also rstudent() for 'rma.mv' objects (also with a 'cluster' argument, option for parallel processing, and 'reestimate' argument). Also, rstandard() now has a cluster argument. When using the cluster argument with rstandard() and rstudent(), the functions also compute cluster-level multivariate (internally or externally) standardized residuals. So, all of the tools are there for proper outlier diagnostics in 'rma.mv' models (i.e., one can check for outlying estimates and clusters).
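A minimal sketch of the cluster-level residual diagnostics described above. It assumes the dat.konstantopoulos2011 example data that ships with metafor (not the data from this thread), and the object names are only illustrative:

```r
library(metafor)

# Illustrative example data from metafor (assumption: not the data discussed here)
dat <- dat.konstantopoulos2011

# Three-level model: schools (estimates) nested within districts
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data = dat)

# Internally standardized residuals, plus cluster-level statistics per district
rs <- rstandard(res, cluster = dat$district)

# Externally standardized residuals: each district is left out in turn
rt <- rstudent(res, cluster = dat$district)

rt$obs      # residuals for the individual estimates
rt$cluster  # multivariate standardized residuals, one row per district
```

With the cluster argument, both functions return a list with an 'obs' element for the individual estimates and a 'cluster' element for the clusters.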

For checking for influential estimates/clusters, there is cooks.distance() and I also just added dfbetas() for 'rma.mv' models (but I usually find Cook's distances sufficient). I will add an influence.rma.mv() function soon that will also provide covariance ratios (those can be a useful addition to Cook's distances).

Best,
Wolfgang

-----Original Message-----
From: Martineau, Roger [mailto:Roger.Martineau at AGR.GC.CA] 
Sent: Wednesday, August 16, 2017 18:38
To: Viechtbauer Wolfgang (SP)
Cc: r-sig-meta-analysis at r-project.org
Subject: TR: cooks.distance.rma.mv is slow on complex models

Dear Wolfgang,

Thanks a lot for the follow-up and the improvement to cooks.distance.rma.mv().

I just re-computed the Cook's distance values on the 4-level model, and the computation took 1 min 11 sec instead of the 14 min 5 sec it took previously. I can certainly live with that. It is a great improvement, and I especially like the option of computing Cook's distance values for entire groups/clusters of estimates.

Adding the argument progbar = TRUE to the function works well.

Thanks again,

Roger ☺

-----Original Message-----
From: Viechtbauer Wolfgang (SP) [mailto:wolfgang.viechtbauer at maastrichtuniversity.nl] 
Sent: July 31, 2017 16:56
To: r-sig-meta-analysis at r-project.org
Cc: Martineau, Roger
Subject: RE: cooks.distance.rma.mv is slow on complex models

Just a follow-up on this:

If you install the devel version of metafor (http://www.metafor-project.org/doku.php/installation#development_version), you will find a lot of improvements to cooks.distance.rma.mv(). In particular, it:

1) now can do parallel processing,
2) generally should run (quite a bit?) faster (starting values for the repeated model fits are set to the parameter estimates from the 'full' model using all data -- which are likely to be much better starting values than the default ones),
3) offers the possibility to compute approximate Cook's distance values where the variance/correlation components are not re-estimated for each model fit (reestimate=FALSE); doing so only yields an approximation to the Cook's distances that ignores the influence on the variance/correlation components, but is considerably faster (and often yields similar results), and
4) has a 'cluster' argument that allows computing Cook's distances not just for individual estimates, but for entire groups/clusters of estimates.
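As a rough sketch of how the options listed above might be combined, again assuming the dat.konstantopoulos2011 example data from metafor rather than the data from this thread (object names are illustrative):

```r
library(metafor)

# Illustrative example data from metafor (assumption: not the data discussed here)
dat <- dat.konstantopoulos2011
res <- rma.mv(yi, vi, random = ~ 1 | district/school, data = dat)

# Exact Cook's distances: the model is fully refitted with each estimate removed
cd_exact <- cooks.distance(res)

# Approximation: variance components are held at their full-model values,
# so only the influence on the fixed effects is captured (faster)
cd_approx <- cooks.distance(res, reestimate = FALSE)

# One Cook's distance per district instead of per estimate
cd_district <- cooks.distance(res, cluster = dat$district)

# Parallel processing; uncomment on a machine with at least two cores
# cd_par <- cooks.distance(res, parallel = "snow", ncpus = 2)
```

Comparing cd_exact with cd_approx on a given model is one way to judge whether the faster approximation is good enough for that analysis.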

In Roger's analysis: Fit model 'tmp.casdiet' with 'random = ~1|laboratory/experiment/study' and then use:

cooks.distance(tmp.casdiet) ### default is Cook's distance for all estimates
cooks.distance(tmp.casdiet, cluster=tmp.dat.MTPY.new$laboratory)
cooks.distance(tmp.casdiet, cluster=tmp.dat.MTPY.new$experiment)

Not entirely sure about the last one -- it depends on how you have coded 'experiment'; if it is not unique across the levels of 'laboratory', you would want to use:

cooks.distance(tmp.casdiet, cluster=interaction(tmp.dat.MTPY.new$laboratory, tmp.dat.MTPY.new$experiment))
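To see why the coding matters, here is a small base-R sketch with made-up identifiers (the vectors 'lab' and 'experiment' are hypothetical, not taken from Roger's data), in which the experiment numbers restart within each laboratory:

```r
# Hypothetical coding: experiment ids restart at 1 within each laboratory
lab        <- c("A", "A", "A", "B", "B", "B")
experiment <- c(1, 1, 2, 1, 2, 2)

# Used on its own, 'experiment' has only two distinct values, so experiment 1
# of lab A and experiment 1 of lab B would be treated as the same cluster
n_naive <- length(unique(experiment))

# interaction() builds one label per laboratory/experiment combination;
# drop = TRUE (optional) discards combinations that never occur in the data
cluster_id <- interaction(lab, experiment, drop = TRUE)
n_nested <- nlevels(cluster_id)

table(cluster_id)
```

If the experiment identifiers are already unique across laboratories, both approaches define the same clusters and the simpler call is sufficient.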

Best,
Wolfgang