[R-meta] test different levels of a random effect

Fri Nov 15 05:15:47 CET 2024

Dear community,

My question might look weird. Let me explain my question using a concert example.

Dat.moura2021 in metadata package contains 457 studies on assortative mating in 345 species.
library(metafor)
dat <- dat.moura2021$dat

### calculate r-to-z transformed correlations and corresponding sampling variances
dat <- escalc(measure="ZCOR", ri=ri, ni=ni, data=dat)
### assume that the effect sizes within studies are correlated with rho=0.6
V <- vcalc(vi, cluster=study.id, obs=effect.size.id, data=dat, rho=0.6)
### fit a model with study identity, observation, and species as random effects
res <- rma.mv(yi, V, random = list(~ 1 | study.id, ~ 1 | effect.size.id, ~ 1 | species.id), data=dat, sparse = T)

In this specific dataset, I want to know whether (1) an estimate associated with a certain species is significantly different from 0 (in the sense of P value), and (2) whether any two of such species-specific estimates of interest are different.

1. A natural solution that occurs to me is to add the species variable (in our case, species.id) as a predictor rather than a random effect, and we can do subsequent test of factors and linear combinations of parameters.
### add species.id as a predictor
res2 <- rma.mv(yi, V, mods = ~ species.id - 1, random = list(~ 1 | study.id, ~ 1 | effect.size.id), data=dat, sparse = T)

This has a potential issue that model res2 is likely to be overparameterized. Of course, we can check the likelihood profile of each parameter estimate to confirm whether each estimate is identifiable. But this takes a long time to do so. Importantly, species.id should be a random effect rather than a fixed effect.

2. A second approach is similar to the first one but has a slightly different assumption. A normal regression model assumes homogeneous variance (tau^2) across different levels of a predictor. But species.id has 345 levels. It is not very reasonable to assume all 345 species share a common heterogeneity. To relax this assumption, we can use a subgroup random effects model, which was proposed in Pustejovsky, J. E., & Tipton, E. (2022)

### fit a subgroup random effects model
V_subgroup <- vcalc(vi, cluster=study.id, subgroup=species.id, data=dat, rho=0.6)
res3 <- rma.mv(yi, V_subgroup, mods = ~ species.id - 1, random = list(~ species.id | study.id), struct = "DIAG", data=dat, sparse = T)
### add effect size id as an additional random effect
res4 <- rma.mv(yi, V_subgroup, mods = ~ species.id - 1, random = list(~ species.id | study.id, ~ 1 | effect.size.id), struct = "DIAG", data=dat, sparse = T)

I do not know which one (res3 vs. res4) is better

3. We can sum up the intercept (or fixed effect) and random effect estimates of species.id of the model object res to get the species-specific estimates. Then, we use the Wald-type method to test the species-specific estimates and difference between them.

### get species's random effect estimate
res_ranef <- ranef(res)
### combine with the fixed effect estimate
est <- data.frame(species = rownames(res_ranef$species.id),
                  beta = res$beta[1] + res_ranef$species.id$intrcpt,
                  se = sqrt(res$se[1]^2 + res_ranef$species.id$se^2))

But this approach will ignore the covariance between different species.

I would be grateful if anyone can provide insights into (1) which approaches above is preferrable, and (2) if none of them is valid,  can you help find a solution?

Best,
Yefeng

	[[alternative HTML version deleted]]