[R-meta] metafor package in R - Risk ratios using rma.mv()
Viechtbauer, Wolfgang (SP)
Wed Sep 4 19:52:20 CEST 2019
Forgot to cc the mailing list, so resending this.
From: Viechtbauer, Wolfgang (SP)
Sent: Wednesday, 04 September, 2019 19:36
To: 'Leo Martinez'; 'Olivia Cords'
Subject: RE: [R-meta] metafor package in R - Risk ratios using rma.mv()
Dear Leo, Dear Olivia,
Late response (to Olivia), but I was out of the office for the entire month of August.
Q2) Yes, one can estimate rate ratios this way.
The results differ because the log transformation is non-linear. Also, the normal approximation of the sampling distribution does not work equally well on the raw and on the log scale. To illustrate:
Let's say we observe x=5 cases in t=100 person years, so IR = 5/100. For IR values, the normal approximation is IR ~ N(theta, theta/t), where theta is the true rate per person year (this follows from assuming that x is Poisson distributed with rate t*theta), so we estimate the sampling variance with v = IR/t. Hence, a 95% CI for theta is given by:
x <- 5
t <- 100
IR <- x/t
IR + c(-1,1) * qnorm(.975) * sqrt(IR/t)
For log(IR), the normal approximation (after using the delta method) is log(IR) ~ N(log(theta), 1/(t*theta)), so we estimate the sampling variance with v = 1/x. Hence, a 95% CI for theta is given by:
exp(log(IR) + c(-1,1) * qnorm(.975) * sqrt(1/x))
As you can see, these results are not the same. And this doesn't yet get into the additional complexities involved when fitting the model you are fitting (where we estimate additional variance components, which in turn also has implications for how the estimates are weighted and combined).
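To put the two intervals side by side, here is a small sketch that wraps both approximations in one function. The name ci_rate() and the layout of its return value are made up for this illustration; only base R is used.

```r
# Hypothetical helper (not part of metafor): Wald-type CIs for a single
# incidence rate, once on the raw scale and once on the log scale
# (with back-transformation of the bounds).
ci_rate <- function(x, t, level = 0.95) {
  crit <- qnorm(1 - (1 - level) / 2)
  ir   <- x / t
  raw  <- ir + c(-1, 1) * crit * sqrt(ir / t)            # sampling variance v = IR/t
  logs <- exp(log(ir) + c(-1, 1) * crit * sqrt(1 / x))   # sampling variance v = 1/x
  rbind(raw = raw, log = logs)
}

res <- ci_rate(x = 5, t = 100)
colnames(res) <- c("ci.lb", "ci.ub")
print(round(res, 4))
```

The raw-scale interval is symmetric around IR, while the back-transformed interval is asymmetric (IR is the geometric mean of its bounds) and cannot extend below zero.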
Q3) Just exponentiate the coefficients and the bounds of their CIs from the model fitted with measure = "IRLN". So, exp(3.1798) is the first rate ratio, with (approximate) 95% CI from exp(1.9348) to exp(4.4248).
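As a sketch of that back-transformation, the following takes the estimates and CI bounds exactly as printed in the quoted model output (so they are rounded to four decimals) and exponentiates them. The short row labels are my own shorthand for the regions.

```r
# Log rate ratios (relative to the Americas) and their 95% CI bounds,
# copied from the printed model output in the quoted message.
est   <- c(African = 3.1798, EasternMed = 0.9541, European = 1.8665,
           SouthEastAsian = 2.6693, WesternPacific = 1.4697)
ci.lb <- c(1.9348, -0.9401, 0.7631, 0.8290, -0.1781)
ci.ub <- c(4.4248,  2.8483, 2.9698, 4.5096,  3.1176)

# Exponentiate the estimates and both bounds to get rate ratios with CIs.
rr <- exp(cbind(estimate = est, ci.lb = ci.lb, ci.ub = ci.ub))
print(round(rr, 2))
```

With the fitted model object at hand, the same table should be obtainable by exponentiating the relevant columns of the coefficient table (e.g. from coef(summary(m0))); that step is not run here since the data are not available.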
There is also a technical issue here that is relevant whenever we analyze outcomes on some transformed scale where the transformation is non-linear. exp(3.1798) is actually not the ratio of the estimated *average* incidence rates (African region versus the Americas). To be precise, the correct interpretation is that exp(3.1798) is the ratio of the estimated *median* incidence rates. The problem is that f(E(X)) != E(f(X)) in general whenever f() is non-linear (for convex or concave f(), the direction of the difference is given by Jensen's inequality). However, f(M(X)) = M(f(X)) when M() is the median and f() is monotonic (as exp() is).
So, if we have the estimated average log incidence rate (which, under the normality assumptions of the model, is equal to the estimated median log incidence rate), the back-transformation gives us the estimated median incidence rate (and not the estimated average incidence rate). So, this is another reason why results are different when you analyze raw or log transformed incidence rates.
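A quick simulation can make the mean-versus-median point concrete. This is only a sketch under the assumption that the log rates are exactly normal; the values of mu and sigma are arbitrary and not taken from the model above.

```r
set.seed(1234)
mu    <- 1.5   # hypothetical mean of the log rates
sigma <- 1.2   # hypothetical SD of the log rates
x <- rlnorm(1e5, meanlog = mu, sdlog = sigma)   # rates on the raw scale

# Back-transform the average of the logs and compare with the median
# and the mean of the raw values.
back <- exp(mean(log(x)))
out <- c(back_transformed = back,
         sample_median    = median(x),
         sample_mean      = mean(x),
         theory_median    = exp(mu),
         theory_mean      = exp(mu + sigma^2 / 2))
print(round(out, 2))
```

The back-transformed value should land close to the median, while the mean of the raw values is noticeably larger, by a factor of roughly exp(sigma^2 / 2) under log-normality.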
Essentially everybody ignores this issue when analyzing transformed outcomes. This also applies to correlations, where there was a lot of debate in the literature around the question whether we should analyze raw or r-to-z transformed correlations (Adam Hafdahl eventually pointed out this issue in this context).
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Leo Martinez
Sent: Wednesday, 04 September, 2019 18:10
To: r-sig-meta-analysis using r-project.org
Subject: Re: [R-meta] metafor package in R - Risk ratios using rma.mv()
Thanks for your previous help on this thread. I just wanted to follow up on
this topic with a few additional questions regarding incidence rate ratios
and confidence intervals using the metafor package and the rma.mv()
function.
*Incidence Rates*
I calculated the incidence rates (measure = "IR", ti =
data$person_years/1000) for tuberculosis on the data subsetted by World
Health Organization Region and got the following results from the model.
m0 <- rma.mv(yi, vi, method='REML', mods = formula,
random= ~ 1 | study_id/cohort_id,
Americas Region 3.244
African Region 19.06
Eastern Mediterranean Region 2.491
European Region 7.675
South-East Asian Region 11.48
Western Pacific Region 5.600
Following your suggestion above for calculating rate ratios for each WHO
Region, I used the measure = "IRLN" and exponentiated the coefficients of
the model. I got the following model output:
Multivariate Meta-Analysis Model (k = 390; method: REML)
            estim    sqrt  nlvls  fixed              factor
sigma^2.1  2.3898  1.5459     76     no            study_id
sigma^2.2  0.5833  0.7637    390     no  study_id/cohort_id
Test for Residual Heterogeneity:
QE(df = 384) = 118469.9261, p-val < .0001
Test of Moderators (coefficients 2:6):
F(df1 = 5, df2 = 384) = 7.3370, p-val < .0001
                                         estimate      se      tval    pval    ci.lb    ci.ub
intrcpt                                   -7.1326  0.2449  -29.1259  <.0001  -7.6141  -6.6511  ***
who_region.1African Region                 3.1798  0.6332    5.0217  <.0001   1.9348   4.4248  ***
who_region.1Eastern Mediterranean Region   0.9541  0.9634    0.9904  0.3226  -0.9401   2.8483
who_region.1European Region                1.8665  0.5612    3.3261  0.0010   0.7631   2.9698  ***
who_region.1South-East Asian Region        2.6693  0.9360    2.8518  0.0046   0.8290   4.5096  **
who_region.1Western Pacific Region         1.4697  0.8381    1.7536  0.0803  -0.1781   3.1176  .
And exponentiating the coefficients, I got the following rate ratios:
Intrcpt (Americas) 0.000799
African Region 24.04225
Eastern Mediterranean Region 2.596419
European Region 6.465383
South-East Asian Region 14.42971
Western Pacific Region 4.348128
Based on the incidence rates produced by using the entire dataset in the
model and first subsetting by region, these Rate Ratios don't seem to be
correct. Simply dividing the incidence rates by a comparator (Region of the
Americas) to produce rate ratios would give the following:
Region of the Americas
African Region 5.875294245
Eastern Mediterranean Region 0.767788539
European Region 2.365564921
South-East Asian Region 3.540304225
Western Pacific Region 1.725952561
*Q2) Is exponentiating the coefficients (measure = IRLN) the way to
calculate rate ratios? Why are these results so different?*
*Confidence Intervals*
To calculate the 95% confidence intervals for the
rate ratios, I first calculated the standard deviation (SD[ln(IR)] = (1/A1
+ 1/A2)^0.5, where A1 and A2 are the number of tuberculosis cases in each
region), and then used the following: 95% CI = exp[ln(IR) ± 1.96(SD)].
It seems that this does not take into account the nested structure of the
data.
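For reference, the crude calculation described above can be sketched as follows. The case counts and person-years are invented purely for illustration; they are not taken from the tuberculosis data.

```r
# Hypothetical pooled counts: cases (a) and person-years (t) per region.
a1 <- 120; t1 <- 8000     # region of interest
a2 <- 45;  t2 <- 14000    # reference region

irr <- (a1 / t1) / (a2 / t2)                  # crude incidence rate ratio
se  <- sqrt(1 / a1 + 1 / a2)                  # SE of log(IRR) from case counts only
ci  <- exp(log(irr) + c(-1, 1) * 1.96 * se)   # back-transformed 95% CI
print(round(c(irr = irr, ci.lb = ci[1], ci.ub = ci[2]), 2))
```

This treats all cases within a region as independent Poisson events, so the clustering of cohorts within studies plays no role in the interval.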
*Q3) Is there a way to calculate the confidence intervals from the model
(either measure = "IRLN" or measure = "IR") output that takes into account
the nested structure of the data?*
Thank you again for your advice and the creation of this package.
Leonardo Martinez, PhD, MPH
Stanford University School of Medicine
Division of Infectious Diseases and Geographic Medicine
300 Pasteur Drive, Lane Building, Stanford, CA 94305
Email: leomarti using stanford.edu; chopotin using gmail.com