[R-meta] metafor package in R - Risk ratios using rma.mv()

Sat Oct 5 00:27:51 CEST 2019

Dear Wolfgang and All,

Thanks so much for your prior help.
We have calculated incidence and prevalence rates from a mixed-effects
model using the rma.mv command. We are attaching the results below.

Variable                             Cohorts (n)  Incidence Rate/100k  Lower 95% CI  Upper 95% CI
World Health Organization Region
    Americas                                 225                324.5         166.7         482.2
    African                                   30               1906.4        1134.1        2678.7
    Eastern Mediterranean                     24                249.1          32.3         466.0
    European                                  33                767.6         407.8        1127.3
    South-East Asia                           48               1148.7         628.6        1668.9
    Western Pacific                           30                560.0        -131.7        1251.7

Variable                             Cohorts (n)  Prevalence Rate  Lower 95% CI  Upper 95% CI
World Health Organization Region
    Americas                                  33              1.7           0.9           2.5
    African                                   31              2.8           1.4           4.1
    Eastern Mediterranean                      7              1.9           0.5           3.4
    European                                  21              1.9           0.8           3.0
    South-East Asian                          10              2.4          -1.1           6.0
    Western Pacific                           54              1.2          -0.1           2.4

Unfortunately, some of our incidence and prevalence estimates have negative
lower confidence bounds. Since rates and proportions cannot be negative, we
would like to switch to models that avoid this.

Is there a way to keep our code (shown below for both incidence and
prevalence) and fit a Poisson model for incidence and a binomial or beta
model for prevalence, so that the lower bounds are no longer negative? We
noticed that when we fit the model to log-transformed incidence and
prevalence rates, the confidence intervals are positive. We are also
wondering what the difference is between using PR/IR versus PLN/IRLN for
fitting the model, and why the latter would result in all positive
confidence intervals.
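
To make the second question concrete, here is a small sketch with made-up
numbers (x = 2 cases in t = 1000 person-years, not taken from our data). It
uses the same normal approximations as Wolfgang's illustration quoted below.
On the raw scale the interval is symmetric around IR and can cross zero; on
the log scale the bounds are exponentiated, so they are always positive.

```r
# Hypothetical small cohort: 2 cases in 1000 person-years (made-up numbers)
x <- 2
t <- 1000
IR <- x / t
z <- qnorm(0.975)

# Raw scale (measure = "IR"): sampling variance estimated as IR / t
ci_raw <- IR + c(-1, 1) * z * sqrt(IR / t)

# Log scale (measure = "IRLN"): sampling variance estimated as 1 / x,
# then back-transformed with exp(), which cannot return a negative value
ci_log <- exp(log(IR) + c(-1, 1) * z * sqrt(1 / x))

print(ci_raw)  # lower bound falls below zero
print(ci_log)  # both bounds are positive
```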

Thank you again for all your help!

Best
Olivia and Leo

*Code for incidence rates: *
#data subsetted by WHO region

pd_ec <- escalc(measure = 'IR', xi = data_sub$inc_positive,
            ti = data_sub$inc_person_years, append = TRUE,
            data = data_sub)

m0 <- rma.mv(yi, vi, method='REML', mods = formula,
                        random= ~ 1 | study_id/cohort_id,
                        tdist=TRUE,
                        data=pd_ec)
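
One possible way to keep this code nearly unchanged, offered as a sketch
rather than a recommendation, is to switch to measure = "IRLN" and
back-transform the pooled estimate with exp(). The data frame below is
simulated: the column names match ours, but the values, the number of
studies, and the intercept-only model are assumptions made purely for
illustration. test = "t" is used in place of tdist = TRUE.

```r
library(metafor)

set.seed(42)
# Simulated stand-in for data_sub: 20 studies with 2 cohorts each (assumption)
k <- 40
data_sim <- data.frame(
  study_id         = rep(1:20, each = 2),
  cohort_id        = 1:k,
  inc_person_years = round(runif(k, 500, 5000))
)
# True log rates vary between studies and between cohorts within studies
log_rate <- log(0.005) + rep(rnorm(20, 0, 0.5), each = 2) + rnorm(k, 0, 0.3)
data_sim$inc_positive <- rpois(k, exp(log_rate) * data_sim$inc_person_years)

# Log incidence rates and their sampling variances
pd_ln <- escalc(measure = "IRLN", xi = inc_positive, ti = inc_person_years,
                data = data_sim)

# Same random-effects structure as above, fitted on the log scale
m_ln <- rma.mv(yi, vi, method = "REML",
               random = ~ 1 | study_id/cohort_id,
               test = "t", data = pd_ln)

# Back-transform the pooled log rate and its CI, then express per 100k
pooled <- predict(m_ln, transf = exp)
print(round(c(pooled$pred, pooled$ci.lb, pooled$ci.ub) * 1e5, 1))
```

As Wolfgang notes in the message quoted below, a value back-transformed this
way is an estimated median rate rather than an estimated mean rate.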

*Code for prevalence rates: *
#data subsetted by WHO region

pd_ec <- escalc(measure = 'PR', xi = data_sub$prev_positive,
            ni = data_sub$prev_total_n, append = TRUE,
            data = data_sub)

m0 <- rma.mv(yi, vi, method='REML', mods = formula,
                        random= ~ 1 | study_id/cohort_id,
                        tdist=TRUE,
                        data=pd_ec)
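
For prevalence, the analogous idea would be to work on the logit scale
(measure = "PLO" in escalc(), back-transformed with the inverse logit). The
sketch below uses base R only and made-up numbers (1 positive out of 50, not
from our data) to show the per-cohort calculation. It only illustrates why
the bounds stay between 0 and 1; it is not a replacement for the multilevel
model.

```r
# Hypothetical small cohort: 1 positive out of 50 tested (made-up numbers)
x <- 1
n <- 50
p <- x / n
z <- qnorm(0.975)

# Raw scale (measure = "PR"): sampling variance estimated as p * (1 - p) / n
ci_raw <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)

# Logit scale: log odds with sampling variance 1/x + 1/(n - x),
# back-transformed with the inverse logit plogis(), which maps into (0, 1)
yi <- log(x / (n - x))
vi <- 1 / x + 1 / (n - x)
ci_logit <- plogis(yi + c(-1, 1) * z * sqrt(vi))

print(ci_raw)    # lower bound falls below zero
print(ci_logit)  # both bounds lie strictly between 0 and 1
```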

Thank you for any help you can provide!

Best
Leo and Olivia

Leonardo Martinez, PhD, MPH
Stanford University School of Medicine
Division of Infectious Diseases and Geographic Medicine
300 Pasteur Drive, Lane Building, Stanford, CA 94305
Phone: +1.202.769.8090
Email: leomarti using stanford.edu; chopotin using gmail.com
Website <https://profiles.stanford.edu/leonardo-martinez-pantoja>

On Wed, Sep 4, 2019 at 10:57 AM Viechtbauer, Wolfgang (SP) <
wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:

> Forgot to cc the mailing list, so resending this.
>
> -----Original Message-----
> From: Viechtbauer, Wolfgang (SP)
> Sent: Wednesday, 04 September, 2019 19:36
> To: 'Leo Martinez'; 'Olivia Cords'
> Subject: RE: [R-meta] metafor package in R - Risk ratios using rma.mv()
>
> Dear Leo, Dear Olivia,
>
> Late response (to Olivia), but I was out of the office for all of August.
>
> Q2) Yes, one can estimate rate ratios this way.
>
> They are different because the log transformation is non-linear. Also, the
> normal approximation of the sampling distributions doesn't work in the same
> way on the raw and on the log scale. To illustrate:
>
> Let's say we observe x=5 cases in t=100 person years, so IR = 5/100. For
> IR values, the normal approximation is IR ~ N(theta, theta/t), where theta
> is the true rate per person year (this follows from assuming that x is
> Poisson distributed with rate t*theta), so we estimate the sampling
> variance with v = IR/t. Hence, a 95% CI for theta is given by:
>
> x <- 5
> t <- 100
> IR <- x/t
> IR + c(-1,1) * qnorm(.975) * sqrt(IR/t)
>
> which yields
>
> 0.006173873 0.093826127
>
> For log(IR), the normal approximation (after using the delta method) is
> log(IR) ~ N(log(theta), 1/(t*theta)), so we estimate the sampling variance
> with v = 1/x. Hence, a 95% CI for theta is given by:
>
> exp(log(IR) + c(-1,1) * qnorm(.975) * sqrt(1/x))
>
> which yields
>
> 0.02081139 0.12012652
>
> As you can see, these results are not the same. And this doesn't yet get
> into the additional complexities involved when fitting the model you are
> fitting (where we estimate additional variance components, which in turn
> also has implications for how the estimates are weighted and combined).
>
> Q3) Just exponentiate the CIs for the model coefficients for the model
> fitted with measure = "IRLN". So, exp(3.1798) is the first rate ratio with
> (approximate) 95% CI exp(1.9348) and exp(4.4248).
>
> There is also a technical issue here that is relevant whenever we analyze
> outcomes on some transformed scale where the transformation is non-linear.
> exp(3.1798) is actually not the estimated *average* incidence rate for the
> African region. To be precise, the correct interpretation is that
> exp(3.1798) is the estimated *median* incidence rate for the African
> region. The problem is that f(E(X)) != E(f(X)) whenever f() is non-linear
> (Jensen's inequality). However, f(M(X)) = M(f(X)) when M() is the median.
>
> So, if we have the estimated average log incidence rate (which, under the
> normality assumptions of the model, is equal to the estimated median log
> incidence rate), the back-transformation gives us the estimated median
> incidence rate (and not the estimated average incidence rate). So, this is
> another reason why results are different when you analyze raw or log
> transformed incidence rates.
>
> Essentially everybody ignores this issue when analyzing transformed
> outcomes. This also applies to correlations, where there was a lot of
> debate in the literature around the question whether we should analyze raw
> or r-to-z transformed correlations (Adam Hafdahl eventually pointed out
> this issue in this context).
>
> Best,
> Wolfgang
>
> -----Original Message-----
> From: R-sig-meta-analysis [mailto:
> r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Leo Martinez
> Sent: Wednesday, 04 September, 2019 18:10
> To: r-sig-meta-analysis using r-project.org
> Subject: Re: [R-meta] metafor package in R - Risk ratios using rma.mv()
>
> Dear All,
>
> Thanks for your previous help on this thread. I just wanted to follow up on
> this topic with a few additional questions regarding incidence rate ratios
> and confidence intervals using the metafor package and the rma.mv()
> function.
>
> *Incidence Rates*
>
> I calculated the incidence rates (measure = "IR", ti =
> data$person_years/1000) for tuberculosis on the data subsetted by World
> Health Organization Region and got the following results from the model.
>
> m0 <- rma.mv(yi, vi, method='REML', mods = formula,
>                         random= ~ 1 | study_id/cohort_id,
>                         tdist=TRUE,
>                         data=pd_ec)
>
> Americas Region 3.244
> African Region 19.06
> Eastern Mediterranean Region 2.491
> European Region 7.675
> South-East Asian Region 11.48
> Western Pacific Region 5.600
>
> Rate Ratios:
>
> Following your suggestion above for calculating rate ratios for each WHO
> Region, I used the measure = "IRLN" and exponentiated the coefficients of
> the model. I got the following model output:
>
> Multivariate Meta-Analysis Model (k = 390; method: REML)
>
> Variance Components:
>
>             estim    sqrt  nlvls  fixed              factor
> sigma^2.1  2.3898  1.5459     76     no            study_id
> sigma^2.2  0.5833  0.7637    390     no  study_id/cohort_id
>
> Test for Residual Heterogeneity:
> QE(df = 384) = 118469.9261, p-val < .0001
>
> Test of Moderators (coefficients 2:6):
> F(df1 = 5, df2 = 384) = 7.3370, p-val < .0001
>
> Model Results:
>
>                                           estimate      se      tval    pval    ci.lb    ci.ub
> intrcpt                                    -7.1326  0.2449  -29.1259  <.0001  -7.6141  -6.6511  ***
> who_region.1African Region                  3.1798  0.6332    5.0217  <.0001   1.9348   4.4248  ***
> who_region.1Eastern Mediterranean Region    0.9541  0.9634    0.9904  0.3226  -0.9401   2.8483
> who_region.1European Region                 1.8665  0.5612    3.3261  0.0010   0.7631   2.9698  ***
> who_region.1South-East Asian Region         2.6693  0.9360    2.8518  0.0046   0.8290   4.5096   **
> who_region.1Western Pacific Region          1.4697  0.8381    1.7536  0.0803  -0.1781   3.1176    .
>
> And exponentiating the coefficients, I got the following rate ratios:
>
> Intrcpt (Americas) 0.000799
> African Region 24.04225
> Eastern Mediterranean Region 2.596419
> European Region 6.465383
> South-East Asian Region 14.42971
> Western Pacific Region 4.348128
>
> Based on the incidence rates produced by using the entire dataset in the
> model and first subsetting by region, these Rate Ratios don't seem to be
> correct. Simply dividing the incidence rates by a comparator (Region of the
> Americas) to produce rate ratios would give the following:
>
> Region of the Americas
> African Region 5.875294245
> Eastern Mediterranean Region 0.767788539
> European Region 2.365564921
> South-East Asian Region 3.540304225
> Western Pacific Region 1.725952561
>
> *Q2) Is exponentiating the coefficients (measure = IRLN) the way to
> calculate rate ratios? Why are these results so different?*
>
> *Confidence Intervals*
>
> To calculate the 95% confidence intervals for the rate ratios, I first
> calculated the standard deviation (SD[ln(IR)] = (1/A1 + 1/A2)^0.5, where A1
> and A2 are the number of tuberculosis cases in each region), and then used
> the following: 95% CI = exp[ln(IR) ± 1.96(SD)].
> It seems that this does not take into account the nested structure of the
> data.
>
> *Q3) Is there a way to calculate the confidence intervals from the model
> (either measure = "IRLN" or measure = "IR") output that takes into account
> the nested structure of the data?*
> Thank you again for your advice and the creation of this package.
>
> Best
> Leo
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1003_PR_IR.xlsx
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 14856 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-meta-analysis/attachments/20191004/1c753ea8/attachment-0001.xlsx>