[R-meta] metafor package in R - Risk ratios using rma.mv()

Wed Sep 4 18:10:29 CEST 2019

Dear All,

Thanks for your previous help on this thread. I just wanted to follow up on
this topic with a few additional questions regarding  incident rate ratios
and confidence intervals using the metafor package and the rma.mv()
command.

*Incidence Rates*I calculated the incidence rates (measure = "IR", ti =
data$person_years/1000) for tuberculosis on the data subsetted by World
Health Organization Region and got the following results from the model.

m0 <- rma.mv(yi, vi, method='REML', mods = formula,
                        random= ~ 1 | study_id/cohort_id,
                        tdist=TRUE,
                        data=pd_ec)

                                      Americas Region 3.244
African Region 19.06
Eastern Mediterranean Region 2.491
European Region 7.675
South-East Asian Region 11.48
Western Pacific Region 5.600

Rate Ratios:

Following your suggestion above for calculating rate ratios for each WHO
Region, I used the measure = "IRLN" and exponentiated the coefficients of
the model. I got the following model output:

Multivariate Meta-Analysis Model (k = 390; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed              factor
sigma^2.1  2.3898  1.5459     76     no            study_id
sigma^2.2  0.5833  0.7637    390     no  study_id/cohort_id

Test for Residual Heterogeneity:
QE(df = 384) = 118469.9261, p-val < .0001

Test of Moderators (coefficients 2:6):
F(df1 = 5, df2 = 384) = 7.3370, p-val < .0001

Model Results:

                                          estimate      se      tval    pval
intrcpt                                    -7.1326  0.2449  -29.1259  <.0001
who_region.1African Region                  3.1798  0.6332    5.0217  <.0001
who_region.1Eastern Mediterranean Region    0.9541  0.9634    0.9904  0.3226
who_region.1European Region                 1.8665  0.5612    3.3261  0.0010
who_region.1South-East Asian Region         2.6693  0.9360    2.8518  0.0046
who_region.1Western Pacific Region          1.4697  0.8381    1.7536  0.0803
                                            ci.lb    ci.ub
intrcpt                                   -7.6141  -6.6511  ***
who_region.1African Region                 1.9348   4.4248  ***
who_region.1Eastern Mediterranean Region  -0.9401   2.8483
who_region.1European Region                0.7631   2.9698  ***
who_region.1South-East Asian Region        0.8290   4.5096   **
who_region.1Western Pacific Region        -0.1781   3.1176    .

And exponentiating the coefficients, I got the following rate ratios:

Intrcpt (Americas) 0.000799
African Region 24.04225
Eastern Mediterranean Region 2.596419
European Region 6.465383
South-East Asian Region 14.42971
Western Pacific Region 4.348128

Based on the incidence rates produced by using the entire dataset in the
model and first subsetting by region, these Rate Ratios don't seem to be
correct. Simply dividing the incidence rates by a comparator (Region of the
Americas) to produce rate ratios would give the following:

Region of the Americas
African Region
5.875294245
Eastern Mediterranean Region
0.767788539
European Region 2.365564921
South-East Asian Region 3.540304225
Western Pacific Region 1.725952561

*Q2) Is exponentiating the coefficients (measure = IRLN) the way to
calculate rate ratios? Why are these results so different?*

*Confidence Intervals*To calculate the 95% confidence intervals for the
rate ratios, I first calculated the standard deviation (SD[ln(IR)] = (1/A1
+ 1/A2)^0.5, where A1 and A2 are the number of tuberculosis cases in each
region), and then used the following: 95% CI's = exp[ln(IR) ± 1.96(SD)]).
It seems that this does not take into account the nested structure of the
data.

*Q3) Is there a way to calculate the confidence intervals from the model
(either measure = "IRLN" or measure = "IR") output that takes into account
the nested structure of the data?*
Thank you again for your advice and the creation of this package.

Best
Leo

Leonardo Martinez, PhD, MPH
Stanford University School of Medicine
Division of Infectious Diseases and Geographic Medicine
300 Pasteur Drive, Lane Building, Stanford, CA 94305
Phone: +1.202.769.8090
Email: leomarti using stanford.edu; chopotin using gmail.com
Website <https://profiles.stanford.edu/leonardo-martinez-pantoja>

Leonardo Martinez, PhD, MPH
Stanford University School of Medicine
Division of Infectious Diseases and Geographic Medicine
300 Pasteur Drive, Lane Building, Stanford, CA 94305
Phone: +1.202.769.8090
Email: leomarti using stanford.edu; chopotin using gmail.com
Website <https://profiles.stanford.edu/leonardo-martinez-pantoja>

On Thu, Aug 8, 2019 at 2:44 PM Olivia Cords <ocords using alumni.stanford.edu>
wrote:

> Dear Michael,
> Thanks so much for directing me to that resource. This makes a lot of sense
> and Q1 and Q4 have been resolved.
>
> Best,
> Olivia
>
> On Thu, Aug 8, 2019 at 12:58 AM Michael Dewey <lists using dewey.myzen.co.uk>
> wrote:
>
> > Dear Olivia
> >
> > I cannot answer all your questions but I think the first section can be
> > addressed by studying
> >
> >
> http://www.metafor-project.org/doku.php/tips:comp_two_independent_estimates
> >
> > You need to read a fair way in to get the answer but I think the issue
> > is that the separate models allow for different tau^2 whereas the common
> > model has a single estimate. The link shows you how to alter that.
> >
> > I will leave the rest for others to have a go at.
> >
> > Michael
> >
> > On 07/08/2019 23:08, Olivia Cords wrote:
> > > Dear Wolfgang,
> > > Thanks so much again for your help. The comments you provided have been
> > > incredibly useful and I have implemented your suggestions. I just
> wanted
> > to
> > > follow up with you on this topic with a few additional clarifications
> > > regarding incident rates, incident rate ratios, confidence intervals,
> and
> > > similar issues for prevalence/odds ratios.
> > >
> > > *Incidence Rates*
> > > I calculated the incidence rates (measure = "IR", ti =
> > > data$person_years/1000) for tuberculosis stratified by World Health
> > > Organization Region and got the following results from the model.
> > >
> > > m0 <- rma.mv(yi, vi, method='REML', mods = formula,
> > >                          random= ~ 1 | study_id/cohort_id,
> > >                          tdist=TRUE,
> > >                          data=pd_ec)
> > >
> > >                                            estimate      se     tval
> > pval
> > > intrcpt                                     3.5210  0.9728   3.6195
> > 0.0003
> > > who_region.1African Region                 14.5115  2.6764   5.4219
> > <.0001
> > > who_region.1Eastern Mediterranean Region   -0.9006  3.8084  -0.2365
> > 0.8132
> > > who_region.1European Region                 5.2299  2.2226   2.3531
> > 0.0191
> > > who_region.1South-East Asian Region         7.2904  3.7296   1.9547
> > 0.0513
> > > who_region.1Western Pacific Region          2.1339  3.3186   0.6430
> > 0.5206
> > >                                              ci.lb    ci.ub
> > > intrcpt                                    1.6083   5.4337  ***
> > > who_region.1African Region                 9.2492  19.7738  ***
> > > who_region.1Eastern Mediterranean Region  -8.3885   6.5873
> > > who_region.1European Region                0.8600   9.5998    *
> > > who_region.1South-East Asian Region       -0.0426  14.6235    .
> > > who_region.1Western Pacific Region        -4.3910   8.6587
> > >
> > >    To find the incidence rates, I added the slopes to the intercept to
> > > produce the following Incident Rates/1000 person years:
> > > Intercpt (Americas) 3.521
> > > African Region 18.03249
> > > Eastern Mediterranean Region 2.620433
> > > European Region 8.750917
> > > South-East Asian Region 10.81145
> > > Western Pacific Region 5.654908
> > >
> > > To confirm these results I also ran the model on the dataset subsetted
> by
> > > region, using the 'estimate' as each region's incidence rate. This
> > resulted
> > > in the following:
> > >
> > > Region of the Americas
> > > 3.244745079
> > > African Region
> > > 19.06.383209
> > > Eastern Mediterranean Region 2.491278082
> > > European Region 7.675655135
> > > South-East Asian Region
> > > 11.48738471
> > > Western Pacific Region 5.600276079
> > >
> > > *Q1) Why are the Incidence rates different when using the entire
> dataset
> > > versus first subsetting by region? *
> > >
> > > *Rate Ratios:*
> > >
> > > Following your suggestion above for calculating rate ratios for each
> WHO
> > > Region, I used the measure = "IRLN" and exponentiated the coefficients
> of
> > > the model. I got the following model output:
> > >
> > > Multivariate Meta-Analysis Model (k = 390; method: REML)
> > >
> > > Variance Components:
> > >
> > >              estim    sqrt  nlvls  fixed              factor
> > > sigma^2.1  2.3898  1.5459     76     no            study_id
> > > sigma^2.2  0.5833  0.7637    390     no  study_id/cohort_id
> > >
> > > Test for Residual Heterogeneity:
> > > QE(df = 384) = 118469.9261, p-val < .0001
> > >
> > > Test of Moderators (coefficients 2:6):
> > > F(df1 = 5, df2 = 384) = 7.3370, p-val < .0001
> > >
> > > Model Results:
> > >
> > >                                            estimate      se      tval
> > pval
> > > intrcpt                                    -7.1326  0.2449  -29.1259
> > <.0001
> > > who_region.1African Region                  3.1798  0.6332    5.0217
> > <.0001
> > > who_region.1Eastern Mediterranean Region    0.9541  0.9634    0.9904
> > 0.3226
> > > who_region.1European Region                 1.8665  0.5612    3.3261
> > 0.0010
> > > who_region.1South-East Asian Region         2.6693  0.9360    2.8518
> > 0.0046
> > > who_region.1Western Pacific Region          1.4697  0.8381    1.7536
> > 0.0803
> > >                                              ci.lb    ci.ub
> > > intrcpt                                   -7.6141  -6.6511  ***
> > > who_region.1African Region                 1.9348   4.4248  ***
> > > who_region.1Eastern Mediterranean Region  -0.9401   2.8483
> > > who_region.1European Region                0.7631   2.9698  ***
> > > who_region.1South-East Asian Region        0.8290   4.5096   **
> > > who_region.1Western Pacific Region        -0.1781   3.1176    .
> > >
> > >
> > > And exponentiating the coefficients, I got these rate ratios:
> > >
> > > Intrcpt (Americas) 0.000799
> > > African Region 24.04225
> > > Eastern Mediterranean Region 2.596419
> > > European Region 6.465383
> > > South-East Asian Region 14.42971
> > > Western Pacific Region 4.348128
> > >
> > > Based on the incidence rates produced by using the entire dataset in
> the
> > > model and first subsetting by region, these Rate Ratios don't seem to
> be
> > > correct. Simply dividing the incidence rates by a comparator (Region of
> > the
> > > Americas) to produce rate ratios would give the following:
> > > Entire Dataset First Subsetted
> > > Region of the Americas
> > > African Region 5.121411531 5.875294245
> > > Eastern Mediterranean Region 0.744106788 0.767788539
> > > European Region 2.485089463 2.365564921
> > > South-East Asian Region 3.070434536 3.540304225
> > > Western Pacific Region 1.604657768 1.725952561
> > >
> > >
> > > *Q2) Is exponentiating the coefficients (measure = IRLN) the way to
> > > calculate rate ratios? Why are these results so different?*
> > >
> > > *Confidence Intervals*
> > >
> > > To calculate the 95% confidence intervals for the rate ratios, I first
> > > calculated the standard deviation (SD[ln(IR)] = (1/A1 + 1/A2)^0.5,
> where
> > A1
> > > and A2 are the number of tuberculosis cases in each region), and then
> > used
> > > the following: 95% CI's = exp[ln(IR) ± 1.96(SD)]).
> > >
> > > *Q3) Is this the right approach to calculating the confidence
> intervals?
> > Is
> > > there a way to calculate the confidence intervals from the model
> (either
> > > measure = "IRLN" or measure = "IR") output?*
> > >
> > > *Prevalence:*
> > > Additionally, we have data for calculating tuberculosis prevalence rate
> > by
> > > WHO region and are trying to calculate odds ratios. For prevalence
> rate,
> > > all of the code for the model is the same, except the escalc function
> is
> > as
> > > follows:
> > >              pd_ec <- escalc(measure = 'PR', xi =
> > data_sub$prev_positive,ni
> > > = data_sub$prev_total_n, append = TRUE,
> > >              data = data_sub),
> > >
> > > *Q4) For prevalence rates, should we first subset the data to find each
> > > region's prevalence rate, or use the entire dataset (adding intercept
> to
> > > each region's slope)?*
> > > *Q5)How would we calculate odds ratios using the model?*
> > > *Q6)How would we calculate the confidence intervals for these odds
> ratios
> > > using the model? *
> > >
> > > Thank you so much again for your advice. It's really been indispensable
> > and
> > > is greatly appreciated.
> > >
> > > All the best,
> > > Olivia
> > >
> > > On Fri, Jul 26, 2019 at 1:44 PM Viechtbauer, Wolfgang (SP) <
> > > wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> > >
> > >> Hi Olivia,
> > >>
> > >> A couple comments before I get to your actual question:
> > >>
> > >> 1) You might want to divide the person-years by some factor. For
> > example:
> > >>
> > >> ti = data$person_years / 1000
> > >>
> > >> The incidence rates then reflect the number of cases per 1000 person
> > >> years. That way, the rates aren't so small and the same goes for the
> > model
> > >> coefficients (e.g., the 0.0027 will turn into 2.7, which is a bit
> > easier to
> > >> communicate).
> > >>
> > >> 2) You don't need to write pd_ec$yi and pd_ec$vi when you use
> > data=pd_ec.
> > >> So, rma.mv(yi, vi, ..., data=pd_ec) is sufficient.
> > >>
> > >> 3) You should include random effects for studies (as you have done)
> and
> > >> also random effects for estimates within studies. So:
> > >>
> > >> pd_ec$id <- 1:nrow(pd_ec)
> > >>
> > >> random = ~ 1 | study_id / id
> > >>
> > >> In fact, I see that you may have multiple estimates for the same
> cohort,
> > >> so one could even use an additional level, that is:
> > >>
> > >> random = ~ 1 | study_id / cohort_id / id
> > >>
> > >> But if you only have 8 rows of data, then I wouldn't do that (so then
> I
> > >> would use random = ~ 1 | study_id / id).
> > >>
> > >> 4) Now for your actual question: You are analyzing incidence rates, so
> > the
> > >> coefficients of your model reflect how the (average) rate changes as a
> > >> function of those moderators (e.g., while the rate is 0.0027 for
> > >> "1Americas", the rate is 0.0027-0.0025=0.0002 for "Africa"). In other
> > >> words, the coefficients represent incidence rate differences. If you
> > want
> > >> ratios, you should analyze log incidence rates (measure="IRLN"). Then
> > the
> > >> coefficients reflect differences between log incidence rates, so if
> you
> > >> exponentiate the coefficients of your model, you get incidence rate
> > ratios.
> > >>
> > >> Best,
> > >> Wolfgang
> > >>
> > >> -----Original Message-----
> > >> From: R-sig-meta-analysis [mailto:
> > >> r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Olivia Cords
> > >> Sent: Friday, 26 July, 2019 21:27
> > >> To: r-sig-meta-analysis using r-project.org
> > >> Subject: [R-meta] metafor package in R - Risk ratios using rma.mv()
> > >>
> > >> ATTACHMENT(S) REMOVED: metafor_output_0724 (1).png |
> > 702_tb_incidence.xlsx
> > >> | 702_metafor_tb_incidence (4).r
> > >>
> > >> Hello!
> > >> My name is Olivia and I'm a researcher at Stanford University. Our
> group
> > >> is trying to calculate relative risk ratios for the a meta-analysis
> > >> extracting  incidence rates of tuberculosis disease using the metafor
> > >> package.
> > >>
> > >> Study Design:
> > >> Incidence rates were extracted for each study identified by a
> systematic
> > >> review, with some studies reporting multiple rates for different years
> > or
> > >> locations. In this case, multiple rates were treated as different
> study
> > >> cohorts, meaning that the data is clustered ('cohort_id' nested within
> > >> 'study_id').
> > >>
> > >> Model:
> > >> We used the rma.mv() function, inputting the calculated incidence
> rate
> > as
> > >> the outcome variable ('pdc$yi), the variance ('pdc$vi'), WHO region
> > >> ('who_region') and whether the study was conducted through passive or
> > >> active screening (passive_active') as moderators, and a random effects
> > >> argument for the study level ('study_id'). We are unclear how to go
> from
> > >> the output to Risk Ratios.
> > >>
> > >> Data:
> > >> study_id    cohort_id   n_diagnosed person_years   who_region
> > >> passive_active
> > >> 131         34          77          14298          1Americas
>  1Passive
> > >> 93          120         5           27750          1Americas
>  1Passive
> > >> 93          121         14          277150         1Americas
>  1Passive
> > >> 93          122         15          2000           1Americas
>  1Passive
> > >> 136         383         2           2000           Africa      2Active
> > >> 136         383         7           100000         Africa      2Active
> > >> 187         16          16          100000         Africa      3Not
> > >> Specified
> > >> 187         517         2           100000         S.E. Asia   3Not
> > >> Specified
> > >>
> > >> Code:
> > >>
> > >> library(xlsx)
> > >> library(metafor)
> > >>
> > >> data <- read.xlsx('702_tb_incidence.xlsx', 1)
> > >> print(data)
> > >> #calculating the incidence rate and the variance
> > >> pd_ec <- escalc(
> > >>     measure = 'IR',
> > >>     xi = data$n_diagnosed,
> > >>     ti = data$person_years,
> > >>     data = data
> > >> )
> > >> pd_ec
> > >>
> > >> #specifying mixed effects model
> > >> #first level cohort incidence rate and variation
> > >> #second level study_id
> > >> #who_region and passive_active as moderators
> > >> m0 <- rma.mv(pd_ec$yi, pd_ec$vi, method='REML', mods = ~ who_region +
> > >> passive_active,
> > >>          random= ~ 1 | study_id,
> > >>          tdist=TRUE,
> > >>          data=pd_ec)
> > >>
> > >> summary(m0)
> > >>
> > >> Output:
> > >> Multivariate Meta-Analysis Model (k = 8; method: REML)
> > >>
> > >>    logLik  Deviance       AIC       BIC      AICc
> > >>   15.4961  -30.9922  -20.9922  -24.0607   39.0078
> > >>
> > >> Variance Components:
> > >>
> > >>              estim    sqrt  nlvls  fixed    factor
> > >> sigma^2    0.0000  0.0037      4     no  study_id
> > >>
> > >> Test for Residual Heterogeneity:
> > >> QE(df = 4) = 94.4465, p-val < .0001
> > >>
> > >> Test of Moderators (coefficients 2:4):
> > >> F(df1 = 3, df2 = 4) = 3.7931, p-val = 0.1152
> > >>
> > >> Model Results:
> > >>
> > >>                         estimate      se     tval    pval    ci.lb
> >  ci.ub
> > >> intrcpt                  0.0027  0.0027   1.0073  0.3708  -0.0047
> > >> 0.0101
> > >> who_regionAfrica        -0.0025  0.0046  -0.5493  0.6120  -0.0153
> > >> 0.0102
> > >> who_regionS.E. Asia     -0.0027  0.0046  -0.5797  0.5932  -0.0154
> > >> 0.0101
> > >> passive_active2Active   -0.0001  0.0053  -0.0167  0.9874  -0.0148
> > 0.0146
> > >>
> > >> Any advice/insight in much appreciated!
> > >>
> > >> Best wishes,
> > >> Olivia
> > >>
> > >
> > >       [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > R-sig-meta-analysis mailing list
> > > R-sig-meta-analysis using r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
> > >
> > > ---
> > > This email has been checked for viruses by AVG.
> > > https://www.avg.com
> > >
> > >
> >
> > --
> > Michael
> > http://www.dewey.myzen.co.uk/home.html
> >
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-meta-analysis mailing list
> R-sig-meta-analysis using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>

	[[alternative HTML version deleted]]