[R-meta] Performance of metafor::vcalc() vs clubSandwich::impute_covariance_matrix()
Tamar Novetsky
t@m@r @end|ng |rom growprogre@@@@|
Tue Aug 6 17:18:58 CEST 2024
Thanks so much, James! Unfortunately, vcalc(sparse = TRUE) didn't improve
performance enough for my use case: in the example below, vcalc() with its
default arguments takes ~100x longer than impute_covariance_matrix(), while
vcalc(sparse = TRUE) still takes ~60x longer.
I couldn't reproduce the doubled (2x) values using non-proprietary data, so
there might just be something weird going on with my dataset!
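For anyone wanting to check which output is on the expected scale, a small
hand-rolled reference may help. The sketch below builds a single cluster's
covariance block under a constant sampling correlation, assuming (as I
understand both functions to do when given only a single rho / r) that the
covariance between two estimates is rho * sqrt(vi_i * vi_j). build_block() and
the vi values are hypothetical and for illustration only; this is not either
package's implementation.

```r
# Hypothetical helper (not from metafor or clubSandwich): covariance block
# for one cluster, assuming a constant sampling correlation rho.
build_block <- function(vi, rho) {
  s <- sqrt(vi)              # sampling standard errors
  V <- rho * outer(s, s)     # off-diagonals: rho * sqrt(vi_i * vi_j)
  diag(V) <- vi              # diagonal: the sampling variances themselves
  V
}

# Made-up sampling variances for a cluster with three effect sizes
vi <- c(0.04, 0.09, 0.16)
V <- build_block(vi, rho = 0.6)
V
```

Comparing a block like this against the matching block from each function
should show which of the two outputs, if either, is off by a factor of two.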
Reproducible example (adapted from metafor's examples in the vcalc function
documentation):
```
library(tidyverse)
library(metafor)
library(clubSandwich)
library(microbenchmark)

set.seed(42)

# example data from metafor
dat <- dat.assink2016

# augment data so it has >1500 rows
new_rows <-
  tibble(
    study = 18:167,
    n_esid = sample(x = 1:max(dat$esid), size = 150, replace = TRUE)
  ) %>%
  uncount(n_esid) %>%
  group_by(study) %>%
  mutate(esid = row_number()) %>%
  ungroup() %>%
  mutate(
    id = row_number() + 100,
    yi = rnorm(nrow(.), mean(dat$yi), sd(dat$yi)),
    vi = rnorm(nrow(.), mean(dat$vi), sd(dat$vi)),
    vi = if_else(vi < 0, -1*vi, vi), # make sure vi is always positive
    pubstatus = sample(x = dat$pubstatus, size = nrow(.), replace = TRUE),
    year = sample(x = dat$year, size = nrow(.), replace = TRUE),
    deltype = sample(x = dat$deltype, size = nrow(.), replace = TRUE)
  )

dat_big <- bind_rows(dat, new_rows)

# benchmark performance with full matrix (this takes a minute to run)
res <- microbenchmark(
  "metafor" = vcalc(vi, cluster = study, obs = esid, data = dat_big, rho = 0.6),
  "clubSandwich" = impute_covariance_matrix(
    vi = dat_big$vi, cluster = dat_big$study, r = 0.6, return_list = FALSE
  ),
  times = 10
)
summary(res)

# benchmark performance with sparse matrix (also takes a minute to run)
res_sparse <- microbenchmark(
  "metafor" = vcalc(vi, cluster = study, obs = esid, data = dat_big, rho = 0.6,
                    sparse = TRUE),
  "clubSandwich" = impute_covariance_matrix(
    vi = dat_big$vi, cluster = dat_big$study, r = 0.6, return_list = FALSE
  ),
  times = 10
)
summary(res_sparse)
```
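To put the matrix-size point from James's reply (quoted below) in concrete
terms, here is a minimal base-R sketch that counts the entries in a
block-diagonal layout, sum(kj^2), against a full dense layout, sum(kj)^2. The
cluster sizes kj are made up for illustration (150 studies with 1 to 10 effect
sizes each); they are not taken from any real dataset.

```r
# Hypothetical cluster sizes: J = 150 studies, each with 1 to 10 effect sizes
set.seed(42)
kj <- sample(1:10, size = 150, replace = TRUE)

# Entries in a block-diagonal layout: one kj x kj block per study
n_block <- sum(kj^2)

# Entries in a full (dense) matrix of dimension sum(kj) x sum(kj)
n_dense <- sum(kj)^2

# Side-by-side counts and how many times larger the dense layout is
c(block_diagonal = n_block, dense = n_dense, ratio = n_dense / n_block)
```

The ratio grows as the number of studies increases while the per-study counts
stay small, which is the situation James describes.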
Thanks again,
*Tamar Novetsky* *(she/her)*
Data Scientist I
Eastern Time Zone
On Tue, Aug 6, 2024 at 10:20 AM James Pustejovsky <jepusto using gmail.com> wrote:
> Hi Tamar,
>
> The difference in compute time is because of a difference in how the
> default output of these functions is structured.
> clubSandwich::impute_covariance_matrix() returns a block-diagonal matrix by
> default. metafor::vcalc() returns a full (dense) matrix by default. Say
> that you have J studies and study j has kj effect sizes. The block-diagonal
> matrix has sum(kj^2) entries, whereas the full matrix has sum(kj)^2
> entries. If J is large and the kjs are mostly small, this can make for a
> really big difference in object size. However, setting the option
> vcalc(sparse = TRUE) will return a block-diagonal matrix and should lead to
> performance comparable to impute_covariance_matrix().
>
> Regarding your second question, I'm not sure what might be going on. Could
> you provide a reproducible example?
>
> James
>
> On Tue, Aug 6, 2024 at 8:20 AM Tamar Novetsky via R-sig-meta-analysis <
> r-sig-meta-analysis using r-project.org> wrote:
>
>> Hello,
>>
>> I am working on a script to run multiple meta-regressions on different
>> subsets of the same dataset, and have been
>> using clubSandwich::impute_covariance_matrix() to generate the
>> variance-covariance matrix necessary as an input to metafor::rma.mv().
>> However, I recently learned that impute_covariance_matrix() has been
>> superseded by metafor::vcalc(), so I have been working to replace my usage
>> of the former function with the latter. In that process, I discovered that
>> vcalc() seems to be much slower than impute_covariance_matrix() - about
>> 150x slower in one use case that I benchmarked using the microbenchmark
>> package. Since I will be running this many times in a loop, performance
>> matters quite a lot to me in this context.
>>
>> Can anyone help me understand why vcalc() would be so much slower? Is it
>> possible that I'm using it incorrectly?
>>
>> Secondly/possibly relatedly, I found that the results from vcalc() are
>> always either exactly the same or exactly double the results from
>> impute_covariance_matrix(). Does anyone have a sense of why that would be?
>> Could that be related to the performance differences?
>>
>> Thanks so much for your help,
>>
>>
>> *Tamar Novetsky* *(she/her)*
>> Data Scientist I
>> Eastern Time Zone
>>
>> _______________________________________________
>> R-sig-meta-analysis mailing list @ R-sig-meta-analysis using r-project.org
>> To manage your subscription to this mailing list, go to:
>> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>>
>