[R-meta] metafor package in R - Risk ratios using rma.mv()

Thu Aug 8 23:44:51 CEST 2019

Dear Michael,
Thanks so much for directing me to that resource. This makes a lot of sense
and Q1 and Q4 have been resolved.

Best,
Olivia

On Thu, Aug 8, 2019 at 12:58 AM Michael Dewey <lists using dewey.myzen.co.uk>
wrote:

> Dear Olivia
>
> I cannot answer all your questions but I think the first section can be
> addressed by studying
>
> http://www.metafor-project.org/doku.php/tips:comp_two_independent_estimates
>
> You need to read a fair way in to get the answer but I think the issue
> is that the separate models allow for different tau^2 whereas the common
> model has a single estimate. The link shows you how to alter that.
>
> I will leave the rest for others to have a go at.
>
> Michael
>
> On 07/08/2019 23:08, Olivia Cords wrote:
> > Dear Wolfgang,
> > Thanks so much again for your help. The comments you provided have been
> > incredibly useful and I have implemented your suggestions. I just wanted
> to
> > follow up with you on this topic with a few additional clarifications
> > regarding incident rates, incident rate ratios, confidence intervals, and
> > similar issues for prevalence/odds ratios.
> >
> > *Incidence Rates*
> > I calculated the incidence rates (measure = "IR", ti =
> > data$person_years/1000) for tuberculosis stratified by World Health
> > Organization Region and got the following results from the model.
> >
> > m0 <- rma.mv(yi, vi, method='REML', mods = formula,
> >                          random= ~ 1 | study_id/cohort_id,
> >                          tdist=TRUE,
> >                          data=pd_ec)
> >
> >                                            estimate      se     tval
> pval
> > intrcpt                                     3.5210  0.9728   3.6195
> 0.0003
> > who_region.1African Region                 14.5115  2.6764   5.4219
> <.0001
> > who_region.1Eastern Mediterranean Region   -0.9006  3.8084  -0.2365
> 0.8132
> > who_region.1European Region                 5.2299  2.2226   2.3531
> 0.0191
> > who_region.1South-East Asian Region         7.2904  3.7296   1.9547
> 0.0513
> > who_region.1Western Pacific Region          2.1339  3.3186   0.6430
> 0.5206
> >                                              ci.lb    ci.ub
> > intrcpt                                    1.6083   5.4337  ***
> > who_region.1African Region                 9.2492  19.7738  ***
> > who_region.1Eastern Mediterranean Region  -8.3885   6.5873
> > who_region.1European Region                0.8600   9.5998    *
> > who_region.1South-East Asian Region       -0.0426  14.6235    .
> > who_region.1Western Pacific Region        -4.3910   8.6587
> >
> >    To find the incidence rates, I added the slopes to the intercept to
> > produce the following Incident Rates/1000 person years:
> > Intercpt (Americas) 3.521
> > African Region 18.03249
> > Eastern Mediterranean Region 2.620433
> > European Region 8.750917
> > South-East Asian Region 10.81145
> > Western Pacific Region 5.654908
> >
> > To confirm these results I also ran the model on the dataset subsetted by
> > region, using the 'estimate' as each region's incidence rate. This
> resulted
> > in the following:
> >
> > Region of the Americas
> > 3.244745079
> > African Region
> > 19.06.383209
> > Eastern Mediterranean Region 2.491278082
> > European Region 7.675655135
> > South-East Asian Region
> > 11.48738471
> > Western Pacific Region 5.600276079
> >
> > *Q1) Why are the Incidence rates different when using the entire dataset
> > versus first subsetting by region? *
> >
> > *Rate Ratios:*
> >
> > Following your suggestion above for calculating rate ratios for each WHO
> > Region, I used the measure = "IRLN" and exponentiated the coefficients of
> > the model. I got the following model output:
> >
> > Multivariate Meta-Analysis Model (k = 390; method: REML)
> >
> > Variance Components:
> >
> >              estim    sqrt  nlvls  fixed              factor
> > sigma^2.1  2.3898  1.5459     76     no            study_id
> > sigma^2.2  0.5833  0.7637    390     no  study_id/cohort_id
> >
> > Test for Residual Heterogeneity:
> > QE(df = 384) = 118469.9261, p-val < .0001
> >
> > Test of Moderators (coefficients 2:6):
> > F(df1 = 5, df2 = 384) = 7.3370, p-val < .0001
> >
> > Model Results:
> >
> >                                            estimate      se      tval
> pval
> > intrcpt                                    -7.1326  0.2449  -29.1259
> <.0001
> > who_region.1African Region                  3.1798  0.6332    5.0217
> <.0001
> > who_region.1Eastern Mediterranean Region    0.9541  0.9634    0.9904
> 0.3226
> > who_region.1European Region                 1.8665  0.5612    3.3261
> 0.0010
> > who_region.1South-East Asian Region         2.6693  0.9360    2.8518
> 0.0046
> > who_region.1Western Pacific Region          1.4697  0.8381    1.7536
> 0.0803
> >                                              ci.lb    ci.ub
> > intrcpt                                   -7.6141  -6.6511  ***
> > who_region.1African Region                 1.9348   4.4248  ***
> > who_region.1Eastern Mediterranean Region  -0.9401   2.8483
> > who_region.1European Region                0.7631   2.9698  ***
> > who_region.1South-East Asian Region        0.8290   4.5096   **
> > who_region.1Western Pacific Region        -0.1781   3.1176    .
> >
> >
> > And exponentiating the coefficients, I got these rate ratios:
> >
> > Intrcpt (Americas) 0.000799
> > African Region 24.04225
> > Eastern Mediterranean Region 2.596419
> > European Region 6.465383
> > South-East Asian Region 14.42971
> > Western Pacific Region 4.348128
> >
> > Based on the incidence rates produced by using the entire dataset in the
> > model and first subsetting by region, these Rate Ratios don't seem to be
> > correct. Simply dividing the incidence rates by a comparator (Region of
> the
> > Americas) to produce rate ratios would give the following:
> > Entire Dataset First Subsetted
> > Region of the Americas
> > African Region 5.121411531 5.875294245
> > Eastern Mediterranean Region 0.744106788 0.767788539
> > European Region 2.485089463 2.365564921
> > South-East Asian Region 3.070434536 3.540304225
> > Western Pacific Region 1.604657768 1.725952561
> >
> >
> > *Q2) Is exponentiating the coefficients (measure = IRLN) the way to
> > calculate rate ratios? Why are these results so different?*
> >
> > *Confidence Intervals*
> >
> > To calculate the 95% confidence intervals for the rate ratios, I first
> > calculated the standard deviation (SD[ln(IR)] = (1/A1 + 1/A2)^0.5, where
> A1
> > and A2 are the number of tuberculosis cases in each region), and then
> used
> > the following: 95% CI's = exp[ln(IR) ± 1.96(SD)]).
> >
> > *Q3) Is this the right approach to calculating the confidence intervals?
> Is
> > there a way to calculate the confidence intervals from the model (either
> > measure = "IRLN" or measure = "IR") output?*
> >
> > *Prevalence:*
> > Additionally, we have data for calculating tuberculosis prevalence rate
> by
> > WHO region and are trying to calculate odds ratios. For prevalence rate,
> > all of the code for the model is the same, except the escalc function is
> as
> > follows:
> >              pd_ec <- escalc(measure = 'PR', xi =
> data_sub$prev_positive,ni
> > = data_sub$prev_total_n, append = TRUE,
> >              data = data_sub),
> >
> > *Q4) For prevalence rates, should we first subset the data to find each
> > region's prevalence rate, or use the entire dataset (adding intercept to
> > each region's slope)?*
> > *Q5)How would we calculate odds ratios using the model?*
> > *Q6)How would we calculate the confidence intervals for these odds ratios
> > using the model? *
> >
> > Thank you so much again for your advice. It's really been indispensable
> and
> > is greatly appreciated.
> >
> > All the best,
> > Olivia
> >
> > On Fri, Jul 26, 2019 at 1:44 PM Viechtbauer, Wolfgang (SP) <
> > wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> >
> >> Hi Olivia,
> >>
> >> A couple comments before I get to your actual question:
> >>
> >> 1) You might want to divide the person-years by some factor. For
> example:
> >>
> >> ti = data$person_years / 1000
> >>
> >> The incidence rates then reflect the number of cases per 1000 person
> >> years. That way, the rates aren't so small and the same goes for the
> model
> >> coefficients (e.g., the 0.0027 will turn into 2.7, which is a bit
> easier to
> >> communicate).
> >>
> >> 2) You don't need to write pd_ec$yi and pd_ec$vi when you use
> data=pd_ec.
> >> So, rma.mv(yi, vi, ..., data=pd_ec) is sufficient.
> >>
> >> 3) You should include random effects for studies (as you have done) and
> >> also random effects for estimates within studies. So:
> >>
> >> pd_ec$id <- 1:nrow(pd_ec)
> >>
> >> random = ~ 1 | study_id / id
> >>
> >> In fact, I see that you may have multiple estimates for the same cohort,
> >> so one could even use an additional level, that is:
> >>
> >> random = ~ 1 | study_id / cohort_id / id
> >>
> >> But if you only have 8 rows of data, then I wouldn't do that (so then I
> >> would use random = ~ 1 | study_id / id).
> >>
> >> 4) Now for your actual question: You are analyzing incidence rates, so
> the
> >> coefficients of your model reflect how the (average) rate changes as a
> >> function of those moderators (e.g., while the rate is 0.0027 for
> >> "1Americas", the rate is 0.0027-0.0025=0.0002 for "Africa"). In other
> >> words, the coefficients represent incidence rate differences. If you
> want
> >> ratios, you should analyze log incidence rates (measure="IRLN"). Then
> the
> >> coefficients reflect differences between log incidence rates, so if you
> >> exponentiate the coefficients of your model, you get incidence rate
> ratios.
> >>
> >> Best,
> >> Wolfgang
> >>
> >> -----Original Message-----
> >> From: R-sig-meta-analysis [mailto:
> >> r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Olivia Cords
> >> Sent: Friday, 26 July, 2019 21:27
> >> To: r-sig-meta-analysis using r-project.org
> >> Subject: [R-meta] metafor package in R - Risk ratios using rma.mv()
> >>
> >> ATTACHMENT(S) REMOVED: metafor_output_0724 (1).png |
> 702_tb_incidence.xlsx
> >> | 702_metafor_tb_incidence (4).r
> >>
> >> Hello!
> >> My name is Olivia and I'm a researcher at Stanford University. Our group
> >> is trying to calculate relative risk ratios for the a meta-analysis
> >> extracting  incidence rates of tuberculosis disease using the metafor
> >> package.
> >>
> >> Study Design:
> >> Incidence rates were extracted for each study identified by a systematic
> >> review, with some studies reporting multiple rates for different years
> or
> >> locations. In this case, multiple rates were treated as different study
> >> cohorts, meaning that the data is clustered ('cohort_id' nested within
> >> 'study_id').
> >>
> >> Model:
> >> We used the rma.mv() function, inputting the calculated incidence rate
> as
> >> the outcome variable ('pdc$yi), the variance ('pdc$vi'), WHO region
> >> ('who_region') and whether the study was conducted through passive or
> >> active screening (passive_active') as moderators, and a random effects
> >> argument for the study level ('study_id'). We are unclear how to go from
> >> the output to Risk Ratios.
> >>
> >> Data:
> >> study_id    cohort_id   n_diagnosed person_years   who_region
> >> passive_active
> >> 131         34          77          14298          1Americas   1Passive
> >> 93          120         5           27750          1Americas   1Passive
> >> 93          121         14          277150         1Americas   1Passive
> >> 93          122         15          2000           1Americas   1Passive
> >> 136         383         2           2000           Africa      2Active
> >> 136         383         7           100000         Africa      2Active
> >> 187         16          16          100000         Africa      3Not
> >> Specified
> >> 187         517         2           100000         S.E. Asia   3Not
> >> Specified
> >>
> >> Code:
> >>
> >> library(xlsx)
> >> library(metafor)
> >>
> >> data <- read.xlsx('702_tb_incidence.xlsx', 1)
> >> print(data)
> >> #calculating the incidence rate and the variance
> >> pd_ec <- escalc(
> >>     measure = 'IR',
> >>     xi = data$n_diagnosed,
> >>     ti = data$person_years,
> >>     data = data
> >> )
> >> pd_ec
> >>
> >> #specifying mixed effects model
> >> #first level cohort incidence rate and variation
> >> #second level study_id
> >> #who_region and passive_active as moderators
> >> m0 <- rma.mv(pd_ec$yi, pd_ec$vi, method='REML', mods = ~ who_region +
> >> passive_active,
> >>          random= ~ 1 | study_id,
> >>          tdist=TRUE,
> >>          data=pd_ec)
> >>
> >> summary(m0)
> >>
> >> Output:
> >> Multivariate Meta-Analysis Model (k = 8; method: REML)
> >>
> >>    logLik  Deviance       AIC       BIC      AICc
> >>   15.4961  -30.9922  -20.9922  -24.0607   39.0078
> >>
> >> Variance Components:
> >>
> >>              estim    sqrt  nlvls  fixed    factor
> >> sigma^2    0.0000  0.0037      4     no  study_id
> >>
> >> Test for Residual Heterogeneity:
> >> QE(df = 4) = 94.4465, p-val < .0001
> >>
> >> Test of Moderators (coefficients 2:4):
> >> F(df1 = 3, df2 = 4) = 3.7931, p-val = 0.1152
> >>
> >> Model Results:
> >>
> >>                         estimate      se     tval    pval    ci.lb
>  ci.ub
> >> intrcpt                  0.0027  0.0027   1.0073  0.3708  -0.0047
> >> 0.0101
> >> who_regionAfrica        -0.0025  0.0046  -0.5493  0.6120  -0.0153
> >> 0.0102
> >> who_regionS.E. Asia     -0.0027  0.0046  -0.5797  0.5932  -0.0154
> >> 0.0101
> >> passive_active2Active   -0.0001  0.0053  -0.0167  0.9874  -0.0148
> 0.0146
> >>
> >> Any advice/insight in much appreciated!
> >>
> >> Best wishes,
> >> Olivia
> >>
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-meta-analysis mailing list
> > R-sig-meta-analysis using r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
> >
> > ---
> > This email has been checked for viruses by AVG.
> > https://www.avg.com
> >
> >
>
> --
> Michael
> http://www.dewey.myzen.co.uk/home.html
>

	[[alternative HTML version deleted]]