# [R-meta] Benefits to metafor when missing vi estimates?

Viechtbauer Wolfgang (SP) wolfgang.viechtbauer at maastrichtuniversity.nl
Fri Nov 10 10:14:40 CET 2017

```
See below for my responses.

Best,
Wolfgang

>-----Original Message-----
>From: bronwenstanford at gmail.com [mailto:bronwenstanford at gmail.com] On
>Behalf Of Bronwen Stanford
>Sent: Friday, 10 November, 2017 1:42
>To: Viechtbauer Wolfgang (SP)
>Cc: r-sig-meta-analysis at r-project.org
>Subject: Re: [R-meta] Benefits to metafor when missing vi estimates?
>
>Thank you very much for this.
>
>I'd like to make sure I'm interpreting the code you provided for
>estimating variance within metafor correctly. My understanding is that
>the tau2 term is setting the variance associated with each individual row
>("number") equal to zero, and applying one level of variance to those
>studies with variance known, and a different level to studies with
>variance unknown (so viknown =1 and viknown=0 get different estimates).
>This allows the model to apply a larger tau2 value to the studies without
>variance if needed, to compensate for the fact that their vi values are
>set at 0. Is that right?

Correct, the value of tau^2 is constrained to 0 for studies where we do know 'vi' and it is estimated for studies where we do not know 'vi' (where instead we set 'vi' to 0). Marginally, we therefore use 'vi + 0' as the sampling variance for studies where we know 'vi' and '0 + tau^2' for studies where we do not know 'vi'.

>Would I be correct in thinking that for those points without variance
>this model behaves very similarly to the nlme model, and the main benefit
>is that I can use the provided variance for those 15% of points with
>variance included?

Correct. And since you only know 'vi' for 15% of studies, the difference should be rather small.

>Another possibility that has been suggested to me is using the data
>points with known variance to estimate one I2 for the entire dataset, and
>then using this to calculate vi. This would result in one vi value for
>the entire dataset, which seems like it has similar problems as setting
>all vi=1. Do you see benefits to an approach like this?

I don't see how I^2 has any relevance for estimating the sampling variances. But this aside, yes, one could use the known 'vi' values and compute some kind of average to plug in for the unknown 'vi' values, but this makes strong assumptions. I see no benefits to doing that.

>Thank you so much
>Bronwen
>
>Bronwen Stanford
>Ph.D. Candidate
>Environmental Studies Department
>University of California, Santa Cruz
>
>On Tue, Oct 31, 2017 at 3:54 AM, Viechtbauer Wolfgang (SP)
><wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
>Dear Bronwen,
>
>Simply setting vi=1 for studies where the sampling variance is unknown is
>not appropriate.
>
>Instead, you might want to use a model as suggested by James (in the post
>you linked to). In your case, you would have to assume homoscedasticity
>of the error/sampling variances (instead of assuming that they are
>inversely proportional to the sample sizes or number of replicates). This
>can then be followed up by using cluster-robust inference methods, which
>should also account (at least asymptotically) for the fact that the
>sampling variances are actually heteroscedastic.
>
>One could also use a model that sets the sampling variances to the known
>values for those studies where the information required to compute 'vi'
>is available and estimates 'vi' (under the homoscedasticity assumption)
>for the remaining studies. With a bit of trickery, this can actually be
>done with metafor. Here is an example:
>
>library(metafor)
>
>dat <- get(data(dat.konstantopoulos2011))
>
>### fit multilevel model
>res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
>res
>
>### pretend that 'vi' is only known for a subset of the studies
>### and set 'vi' to 0 for studies where 'vi' is unknown
>set.seed(1235)
>dat$viknown <- 0
>dat$viknown[sample(1:nrow(dat), 10)] <- 1
>dat$vi[dat$viknown == 0] <- 0
>
>### fit model that estimates the sampling variance for studies where 'vi'
>### is unknown (assuming that the sampling variance is homoscedastic for
>### those studies)
>### tau2=c(NA,0): estimate tau^2 where viknown == 0, fix it at 0 where
>### viknown == 1
>res <- rma.mv(yi, vi,
>              random = list(~ 1 | district/school, ~ factor(viknown) | study),
>              struct="DIAG", tau2=c(NA,0), data=dat)
>res
>
>You would want to follow this up with cluster-robust inference methods
>again, since we know that 'vi' is not homoscedastic in studies where it
>was unknown. So:
>
>robust(res, cluster=dat$district)
>
>Or more refined:
>
>library(clubSandwich)
>coef_test(res, vcov="CR2")
>
>That seems like quite a bit of work though instead of just:
>
>library(nlme)
>res <- lme(yi ~ 1, random = ~ 1 | district, data=dat)
>coef_test(res, vcov="CR2")
>
>Best,
>Wolfgang
>
>--
>Wolfgang Viechtbauer, Ph.D., Statistician | Department of Psychiatry and
>Neuropsychology | Maastricht University | P.O. Box 616 (VIJV1) | 6200 MD
>Maastricht, The Netherlands | +31 (43) 388-4170 | http://www.wvbauer.com
>
>-----Original Message-----
>From: R-sig-meta-analysis
>[mailto:r-sig-meta-analysis-bounces at r-project.org] On Behalf Of
>Bronwen Stanford
>Sent: Monday, 30 October, 2017 19:17
>To: r-sig-meta-analysis at r-project.org
>Subject: [R-meta] Benefits to metafor when missing vi estimates?
>
>I am conducting a meta-analysis on a dataset that contains sample size
>and error estimates for only 15% of the data points. I'm constructing a
>mixed-effects (multi-level) model using rma.mv, and the model includes
>one random effect (representing study) and multiple fixed effects, both
>continuous and categorical. I have been advised to use metafor and assign
>a constant value to vi (e.g. vi=1) for all data points without error
>estimates to improve the model estimates of standard errors.  However,
>
>https://stat.ethz.ch/pipermail/r-sig-meta-analysis/2017-October/000252.html
>
>this seems like potentially an inappropriate use of metafor - I'm telling
>the model I have information about variance when variance is in fact
>unknown (and my dataset does not qualify for a "true" meta-analysis).
>
>My coefficient estimates using metafor (with vi=1) and lmer (or lme) are
>also different (in both magnitude and significance), which concerns me.
>Any thoughts on the most appropriate way to approach this less-than-ideal
>dataset? Does using metafor in this case (with a constant vi value)
>improve model accuracy, or is it reasonable to stick with standard
>mixed-effects modeling packages?
>
>Thanks!
```
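
The marginal variances described in the reply ('vi + 0' versus '0 + tau^2') can be written out explicitly for the example model. This is only a sketch in notation chosen here for illustration: the indices and the symbols $u_i$, $w_{ij}$, $e_{ij}$, $\varepsilon_{ij}$ and $k(ij)$ are assumptions and do not appear in the thread. It corresponds to `random = list(~ 1 | district/school, ~ factor(viknown) | study)` with `struct="DIAG"` and `tau2=c(NA,0)`, where each row of the data is one school (and one `study`).

```latex
% i indexes districts, j indexes schools (rows) within district i
y_{ij} = \mu + u_i + w_{ij} + e_{ij} + \varepsilon_{ij},
\qquad
u_i \sim N(0, \sigma^2_1), \quad
w_{ij} \sim N(0, \sigma^2_2), \quad
e_{ij} \sim N(0, \tau^2_{k(ij)}), \quad
\varepsilon_{ij} \sim N(0, v_{ij})

% k(ij) is the value of viknown for row ij; tau2 = c(NA, 0) then means
\tau^2_0 \ \text{is estimated}, \qquad \tau^2_1 = 0 \ \text{is fixed}

% resulting marginal variance of each observed outcome
\operatorname{Var}(y_{ij}) =
\begin{cases}
\sigma^2_1 + \sigma^2_2 + v_{ij}   & \text{if viknown} = 1 \\
\sigma^2_1 + \sigma^2_2 + \tau^2_0 & \text{if viknown} = 0 \ (v_{ij} \text{ was set to } 0)
\end{cases}
```

Read this way, $\tau^2_0$ stands in for a single common sampling variance across all rows with unknown 'vi'. That is the homoscedasticity assumption mentioned in the thread, and it is why the fit is followed up with cluster-robust inference.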