[R-meta] metafor package in R - Risk ratios using rma.mv()

Wed Sep 4 19:52:20 CEST 2019

Forgot to cc the mailing list, so resending this.

-----Original Message-----
From: Viechtbauer, Wolfgang (SP) 
Sent: Wednesday, 04 September, 2019 19:36
To: 'Leo Martinez'; 'Olivia Cords'
Subject: RE: [R-meta] metafor package in R - Risk ratios using rma.mv()

Dear Leo, Dear Olivia,

Late response (to Olivia), but I was out of the office for all of August.

Q2) Yes, one can estimate rate ratios this way.

The results differ because the log transformation is non-linear. Also, the normal approximation of the sampling distributions doesn't work in the same way on the raw and on the log scale. To illustrate:

Let's say we observe x=5 cases in t=100 person years, so IR = 5/100. For IR values, the normal approximation is IR ~ N(theta, theta/t), where theta is the true rate per person year (this follows from assuming that x is Poisson distributed with rate t*theta), so we estimate the sampling variance with v = IR/t. Hence, a 95% CI for theta is given by:

x <- 5
t <- 100
IR <- x/t
IR + c(-1,1) * qnorm(.975) * sqrt(IR/t)

which yields

0.006173873 0.093826127

For log(IR), the normal approximation (after using the delta method) is log(IR) ~ N(log(theta), 1/(t*theta)), so we estimate the sampling variance with v = 1/x. Hence, a 95% CI for theta is given by:

exp(log(IR) + c(-1,1) * qnorm(.975) * sqrt(1/x))

which yields

0.02081139 0.12012652

As you can see, these results are not the same. And this doesn't yet get into the additional complexities involved when fitting the model you are fitting (where we estimate additional variance components, which in turn also has implications for how the estimates are weighted and combined).
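To see how the two approximations behave as the amount of data grows, here is a small sketch. The helper name ci_both() is made up for this illustration; it just wraps the two calculations shown above.

```r
# Hypothetical helper (not part of metafor): Wald-type CIs for a rate,
# computed on the raw and on the log scale, for x cases in t person-years
ci_both <- function(x, t, level = 0.95) {
   crit <- qnorm(1 - (1 - level) / 2)
   IR   <- x / t
   raw  <- IR + c(-1, 1) * crit * sqrt(IR / t)            # v = IR/t
   logs <- exp(log(IR) + c(-1, 1) * crit * sqrt(1 / x))   # v = 1/x
   rbind(raw = raw, log = logs)
}

ci_both(5, 100)       # the two intervals from above
ci_both(500, 10000)   # same rate, 100 times as much data
```

With the larger counts, the two intervals should be much closer to each other, since the difference between the approximations matters most when x is small.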

Q3) Just exponentiate the CI bounds of the model coefficients for the model fitted with measure = "IRLN". So, exp(3.1798) is the first rate ratio, with (approximate) 95% CI from exp(1.9348) to exp(4.4248).
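As a sketch of that back-transformation, the snippet below copies the estimates and CI bounds from the posted model output by hand and exponentiates them. The object names are arbitrary, and nothing here depends on the fitted model object itself.

```r
# Log-scale estimates and CI bounds, copied from the model output
# (all regions relative to the reference level, the Americas)
region <- c("African", "Eastern Mediterranean", "European",
            "South-East Asian", "Western Pacific")
est    <- c(3.1798,  0.9541, 1.8665, 2.6693,  1.4697)
ci.lb  <- c(1.9348, -0.9401, 0.7631, 0.8290, -0.1781)
ci.ub  <- c(4.4248,  2.8483, 2.9698, 4.5096,  3.1176)

# Exponentiate the estimates and both bounds to get rate ratios with CIs
rr <- exp(cbind(estimate = est, ci.lb = ci.lb, ci.ub = ci.ub))
rownames(rr) <- region
round(rr, 2)
```

A CI that includes 0 on the log scale turns into a CI that includes 1 on the ratio scale, so the conclusions about each region do not change.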

There is also a technical issue here that is relevant whenever we analyze outcomes on some transformed scale where the transformation is non-linear. Take the estimated average log incidence rate for the African region, which is -7.1326 + 3.1798 (intercept plus coefficient). Back-transforming this, exp(-7.1326 + 3.1798), does not actually give the estimated *average* incidence rate for the African region. To be precise, the correct interpretation is that this is the estimated *median* incidence rate for the African region (and exp(3.1798) is correspondingly a ratio of medians). The problem is that f(E(X)) != E(f(X)) whenever f() is non-linear (Jensen's inequality). However, f(M(X)) = M(f(X)) when M() is the median and f() is monotonic.

So, if we have the estimated average log incidence rate (which, under the normality assumptions of the model, is equal to the estimated median log incidence rate), the back-transformation gives us the estimated median incidence rate (and not the estimated average incidence rate). So, this is another reason why results are different when you analyze raw or log transformed incidence rates.
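A quick simulation can make this concrete. This is only a sketch: the mean and SD on the log scale are made-up values, and the variable name theta simply stands for a set of hypothetical true rates whose logs are normally distributed.

```r
set.seed(1234)

# Hypothetical true rates; log(theta) ~ N(log(5), 1) (made-up values)
theta <- rlnorm(100000, meanlog = log(5), sdlog = 1)

exp(mean(log(theta)))   # back-transformed mean of the logs, close to 5
median(theta)           # the median, also close to 5
mean(theta)             # the mean, clearly larger (about 5 * exp(1/2))
```

So the back-transformed average of the log values tracks the median of the untransformed values, not their mean.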

Essentially everybody ignores this issue when analyzing transformed outcomes. This also applies to correlations, where there was a lot of debate in the literature around the question whether we should analyze raw or r-to-z transformed correlations (Adam Hafdahl eventually pointed out this issue in this context).

Best,
Wolfgang

-----Original Message-----
From: R-sig-meta-analysis [mailto:r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Leo Martinez
Sent: Wednesday, 04 September, 2019 18:10
To: r-sig-meta-analysis using r-project.org
Subject: Re: [R-meta] metafor package in R - Risk ratios using rma.mv()

Dear All,

Thanks for your previous help on this thread. I just wanted to follow up on
this topic with a few additional questions regarding incidence rate ratios
and confidence intervals using the metafor package and the rma.mv()
function.

*Incidence Rates*

I calculated the incidence rates (measure = "IR", ti =
data$person_years/1000) for tuberculosis on the data subsetted by World
Health Organization Region and got the following results from the model.

m0 <- rma.mv(yi, vi, method='REML', mods = formula,
                        random= ~ 1 | study_id/cohort_id,
                        tdist=TRUE,
                        data=pd_ec)

Americas Region 3.244
African Region 19.06
Eastern Mediterranean Region 2.491
European Region 7.675
South-East Asian Region 11.48
Western Pacific Region 5.600

Rate Ratios:

Following your suggestion above for calculating rate ratios for each WHO
Region, I used the measure = "IRLN" and exponentiated the coefficients of
the model. I got the following model output:

Multivariate Meta-Analysis Model (k = 390; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed              factor
sigma^2.1  2.3898  1.5459     76     no            study_id
sigma^2.2  0.5833  0.7637    390     no  study_id/cohort_id

Test for Residual Heterogeneity:
QE(df = 384) = 118469.9261, p-val < .0001

Test of Moderators (coefficients 2:6):
F(df1 = 5, df2 = 384) = 7.3370, p-val < .0001

Model Results:

                                          estimate      se      tval    pval    ci.lb    ci.ub
intrcpt                                    -7.1326  0.2449  -29.1259  <.0001  -7.6141  -6.6511  ***
who_region.1African Region                  3.1798  0.6332    5.0217  <.0001   1.9348   4.4248  ***
who_region.1Eastern Mediterranean Region    0.9541  0.9634    0.9904  0.3226  -0.9401   2.8483
who_region.1European Region                 1.8665  0.5612    3.3261  0.0010   0.7631   2.9698  ***
who_region.1South-East Asian Region         2.6693  0.9360    2.8518  0.0046   0.8290   4.5096   **
who_region.1Western Pacific Region          1.4697  0.8381    1.7536  0.0803  -0.1781   3.1176    .

And exponentiating the coefficients, I got the following rate ratios:

Intrcpt (Americas) 0.000799
African Region 24.04225
Eastern Mediterranean Region 2.596419
European Region 6.465383
South-East Asian Region 14.42971
Western Pacific Region 4.348128

Based on the incidence rates produced by using the entire dataset in the
model and first subsetting by region, these Rate Ratios don't seem to be
correct. Simply dividing the incidence rates by a comparator (Region of the
Americas) to produce rate ratios would give the following:

Region of the Americas
African Region 5.875294245
Eastern Mediterranean Region 0.767788539
European Region 2.365564921
South-East Asian Region 3.540304225
Western Pacific Region 1.725952561
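For reference, the division described above can be reproduced from the incidence rates listed earlier (per 1000 person-years, given ti = person_years/1000). The vector name ir is made up here. Because the listed rates are rounded, the ratios agree with the values above only to about two decimal places.

```r
# Incidence rates from the models fitted separately by region
# (rounded values as listed earlier in this message)
ir <- c("Americas Region"              = 3.244,
        "African Region"               = 19.06,
        "Eastern Mediterranean Region" = 2.491,
        "European Region"              = 7.675,
        "South-East Asian Region"      = 11.48,
        "Western Pacific Region"       = 5.600)

# Crude rate ratios relative to the Americas
ratio <- ir / ir["Americas Region"]
ratio
```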

*Q2) Is exponentiating the coefficients (measure = IRLN) the way to
calculate rate ratios? Why are these results so different?*

*Confidence Intervals*

To calculate the 95% confidence intervals for the
rate ratios, I first calculated the standard error (SE[ln(IRR)] = (1/A1
+ 1/A2)^0.5, where A1 and A2 are the number of tuberculosis cases in each
region), and then used the following: 95% CI = exp[ln(IRR) ± 1.96(SE)].
It seems that this does not take into account the nested structure of the
data.
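The calculation just described can be sketched as follows. The case counts and person-years are made-up numbers, not values from the dataset, and the object names are arbitrary.

```r
# Hypothetical case counts and person-years for two regions (made-up numbers)
A1 <- 120; T1 <- 8000   # region of interest
A2 <- 45;  T2 <- 9000   # comparator region

IRR <- (A1 / T1) / (A2 / T2)    # crude incidence rate ratio
SE  <- sqrt(1 / A1 + 1 / A2)    # standard error of log(IRR)

# Wald-type 95% CI, computed on the log scale and back-transformed
ci <- exp(log(IRR) + c(-1, 1) * qnorm(.975) * SE)
c(IRR = IRR, ci.lb = ci[1], ci.ub = ci[2])
```

This pools all cases within a region as if they came from a single Poisson sample, which is why it cannot reflect the study_id/cohort_id nesting.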

*Q3) Is there a way to calculate the confidence intervals from the model
(either measure = "IRLN" or measure = "IR") output that takes into account
the nested structure of the data?*

Thank you again for your advice and the creation of this package.

Best
Leo

Leonardo Martinez, PhD, MPH
Stanford University School of Medicine
Division of Infectious Diseases and Geographic Medicine
300 Pasteur Drive, Lane Building, Stanford, CA 94305
Phone: +1.202.769.8090
Email: leomarti using stanford.edu; chopotin using gmail.com
Website <https://profiles.stanford.edu/leonardo-martinez-pantoja>