[R-meta] variance explained by fixed & random effects

Wed Apr 3 11:53:26 CEST 2019

In principle, yes.

But I highly question the sensibility of any model that does not include estimate-level random effects. In the model you posted, random effects are added for each level of ablat, but this assumes that the true effects are homogeneous among trials that share the same level of ablat. That is a big assumption and one that I don't think is (a priori) justified.

I frequently see the same issue when people are fitting multilevel models. Instead of *adding* random effects for some higher level clustering variable to a model that already includes estimate level random effects, only cluster level random effects are included in the model. For a discussion of this, see:

http://www.metafor-project.org/doku.php/analyses:konstantopoulos2011#a_common_mistake_in_the_three-level_model
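
Written out, the difference between the two specifications looks roughly as follows. This is only a sketch in my own notation (i indexes clusters, j indexes estimates within clusters, v_ij is the known sampling variance; with moderators, mu is replaced by the linear predictor):

```latex
\begin{align*}
\text{cluster level only:} \quad
  & y_{ij} = \mu + u_i + \varepsilon_{ij},
  && u_i \sim N(0, \sigma^2_1), \quad \varepsilon_{ij} \sim N(0, v_{ij}) \\
\text{cluster + estimate level:} \quad
  & y_{ij} = \mu + u_i + w_{ij} + \varepsilon_{ij},
  && w_{ij} \sim N(0, \sigma^2_2)
\end{align*}
```

In the first model, all estimates within the same cluster share a single true effect, mu + u_i. The second model lets the true effects differ within a cluster through w_ij.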

In your example, I therefore would say that one should fit:

res2 <- rma.mv(yi, vi, mods = ~ alloc, random = ~1|ablat/trial, data=dat)

In this case, the random effects for ablat (which has 9 levels) are difficult to distinguish from the trial (i.e., estimate) level random effects (13 different levels). In fact, it turns out in this case that the trial level variance component is estimated to be 0, so in the end, the fit of the model is the same. But in general, I think one should at least start with a model that includes estimate level random effects.
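
To check this yourself, a minimal sketch would be to fit both models side by side and compare the variance components and log-likelihoods (the object names res2a and res2b are just placeholders I am using here):

```r
library(metafor)

dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

# model with random effects for ablat only (as in your question)
res2a <- rma.mv(yi, vi, mods = ~ alloc, random = ~ 1 | ablat, data=dat)

# model that also includes trial (i.e., estimate) level random effects
res2b <- rma.mv(yi, vi, mods = ~ alloc, random = ~ 1 | ablat/trial, data=dat)

# sigma2 holds one variance component per random-effects term
res2a$sigma2   # ablat
res2b$sigma2   # ablat, then trial within ablat

# restricted log-likelihoods of the two fits
logLik(res2a)
logLik(res2b)
```

With rma.mv() the variance components are stored in sigma2 (one value per term), so any R^2-type calculation would need to be based on those.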

Best,
Wolfgang

-----Original Message-----
From: Theresa Stratmann [mailto:theresa.stratmann using senckenberg.de] 
Sent: Tuesday, 02 April, 2019 9:38
To: Viechtbauer, Wolfgang (SP); r-sig-meta-analysis using r-project.org
Subject: RE: [R-meta] variance explained by fixed & random effects

Actually, sorry, one more question ... 

Let's say we were to use: 

library(metafor)

dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

res2 <- rma.mv(yi, vi, mods = ~ alloc, random = ~1|ablat, data=dat)

Do we then use sigma2 instead of tau2? 

Theresa

> On March 29, 2019 at 12:10 PM "Viechtbauer, Wolfgang (SP)" <wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> 
> 
> (I get so excited when I talk about this stuff that I (again) hit 'Send' too soon! Sorry -- continued properly below.)
> 
> Yes, this is a good reference.
> 
> I will try to address a few of your questions.
> 
> 1) What does it mean if the pseudo R^2 statistic is negative?
> 
> R^2 type measures in meta-analysis are typically computed as proportional reductions in variance components (that reflect heterogeneity) when predictors/moderators are added to the model. However, for the models we typically use for a meta-analysis, there is no guarantee that the variance components always decrease (or remain unchanged) when we add predictors. So, it can happen that a variance component increases, in which case the R^2 value is technically negative. An example:
> 
> library(metafor)
> 
> dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
> 
> res0 <- rma(yi, vi, data=dat)
> res1 <- rma(yi, vi, mods = ~ alloc, data=dat)
> res1
> 
> # proportional 'reduction' in tau^2 (how R^2 is computed here)
> (res0$tau2 - res1$tau2) / res0$tau2
> 
> If you look at 'res1', R^2 is given as 0%, since negative R^2 values are simply set to 0.
> 
> So what does this mean? I would say it simply means that there is no reduction in the estimate of tau^2, so 'alloc' does not account for any heterogeneity.
> 
> But also keep in mind that the value of R^2 is an estimate that can be quite inaccurate esp. if the number of studies is small (https://onlinelibrary.wiley.com/doi/abs/10.1111/bmsp.12002). One may need maybe 30 or even more studies to get a fairly decent estimate (but don't quote me on that number).
> 
> Another important point here is that we compute R^2 based on how much *heterogeneity* is accounted for, not how much of the *total variance* is accounted for. So, for example:
> 
> res1 <- rma(yi, vi, mods = ~ ablat, data=dat)
> res1
> 
> This suggests that ~76% of the heterogeneity is accounted for by 'ablat' (but again, this estimate could be way off with k=13). But the total variance in this example is tau^2 plus the amount of sampling variability in the estimates (the 'vi' values). Since each study has a different amount of sampling variability, it isn't super clear how we then should think of the total variance, but one commonly used approach is to compute a 'typical' sampling variance and use that to define the total variance. For example, Higgins and Thompson (2002) suggest computing the 'typical' sampling variance with:
> 
> k <- res1$k
> wi <- 1/dat$vi
> vt <- (k-1) / (sum(wi) - sum(wi^2)/sum(wi)) 
> vt
> 
> Others (https://academic.oup.com/aje/article/150/2/206/55325) have suggested using the harmonic mean, which would be:
> 
> 1/mean(wi)
> 
> Regardless, we can then use tau^2 + vt as the total variance. Then how much of the *total variance* does 'ablat' account for?
> 
> ((res0$tau2 + vt) - (res1$tau2 + vt)) / (res0$tau2 + vt)
> 
> That yields about 70%. Why don't we do that? For one, we like larger R^2 values :) But study-level moderators cannot account for sampling variability, so one could also argue it is a bit unfair to see how much of the total variability is accounted for.
> 
> 2) How can we think of variance accounted for by random effects?
> 
> Another issue in this discussion is the question of how to think of variance accounted for by random effects. It makes no sense to ask how much heterogeneity is accounted for by tau^2 here, since that is always 100%. It does make sense to ask how much of the total variability is due to the random effects. In the context of model 'res0', that would be:
> 
> res0$tau2 / (res0$tau2 + vt)
> 
> If you look at res0, you will see that this is I^2.
> 
> For model 'res1', things are a bit more tricky. One approach would be as follows. We again define res0$tau2 + vt as the total variability (note I am using res0$tau2 here, so the amount of heterogeneity before we include moderators, plus sampling variability). Then:
> 
> res1$tau2 / (res0$tau2 + vt)
> 
> indicates how much of this total variability is due to the random effects, that is, residual heterogeneity (about 22%). And:
> 
> vt / (res0$tau2 + vt)
> 
> indicates how much of this total variability is due to sampling variability (about 8%). And by subtraction, we then have:
> 
> 1 - res1$tau2 / (res0$tau2 + vt) - vt / (res0$tau2 + vt)
> 
> which indicates how much of the total variability is due to 'ablat' (about 70%).
> 
> As far as I know, this is *not* something that has been done or suggested before, but I think it makes sense and is defensible.
> 
> I hope this helps.
> 
> Best,
> Wolfgang
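
The three-way split described under point 2 of the quoted message can be wrapped into a small helper. This is only a sketch: var_shares() is a hypothetical function (not part of metafor), the numbers fed to it below are made up for illustration, and the 'typical' sampling variance uses the Higgins and Thompson (2002) formula from the quoted message.

```r
# Hypothetical helper (not part of metafor): split the total variability,
# defined as tau^2 from the model without moderators plus a 'typical'
# sampling variance, into three shares.
var_shares <- function(tau2_null, tau2_mods, vi) {
  wi <- 1 / vi
  k  <- length(vi)
  # 'typical' sampling variance as in Higgins and Thompson (2002)
  vt <- (k - 1) / (sum(wi) - sum(wi^2) / sum(wi))
  total <- tau2_null + vt
  c(moderators = (tau2_null - tau2_mods) / total,  # due to the moderator(s)
    residual   = tau2_mods / total,                # residual heterogeneity
    sampling   = vt / total)                       # sampling variability
}

# made-up inputs, purely for illustration
shares <- var_shares(tau2_null = 0.30, tau2_mods = 0.10,
                     vi = c(0.02, 0.05, 0.10, 0.04))
round(shares, 3)
sum(shares)  # the three shares add up to 1 by construction

# with the models from the quoted message one would call:
# var_shares(res0$tau2, res1$tau2, dat$vi)
```

Note that the 'moderators' share becomes negative whenever tau^2 increases after adding moderators, just like the pseudo R^2 discussed under point 1.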