[R-meta] metafor package in R - Risk ratios using rma.mv()

Michael Dewey ||@t@ @end|ng |rom dewey@myzen@co@uk
Thu Aug 8 09:58:28 CEST 2019


Dear Olivia

I cannot answer all your questions but I think the first section can be 
addressed by studying

http://www.metafor-project.org/doku.php/tips:comp_two_independent_estimates

You need to read a fair way in to get the answer but I think the issue 
is that the separate models allow for different tau^2 whereas the common 
model has a single estimate. The link shows you how to alter that.

I will leave the rest for others to have a go at.

Michael

On 07/08/2019 23:08, Olivia Cords wrote:
> Dear Wolfgang,
> Thanks so much again for your help. The comments you provided have been
> incredibly useful and I have implemented your suggestions. I just wanted to
> follow up with you on this topic with a few additional clarifications
> regarding incident rates, incident rate ratios, confidence intervals, and
> similar issues for prevalence/odds ratios.
> 
> *Incidence Rates*
> I calculated the incidence rates (measure = "IR", ti =
> data$person_years/1000) for tuberculosis stratified by World Health
> Organization Region and got the following results from the model.
> 
> m0 <- rma.mv(yi, vi, method='REML', mods = formula,
>                          random= ~ 1 | study_id/cohort_id,
>                          tdist=TRUE,
>                          data=pd_ec)
> 
>                                            estimate      se     tval    pval
> intrcpt                                     3.5210  0.9728   3.6195  0.0003
> who_region.1African Region                 14.5115  2.6764   5.4219  <.0001
> who_region.1Eastern Mediterranean Region   -0.9006  3.8084  -0.2365  0.8132
> who_region.1European Region                 5.2299  2.2226   2.3531  0.0191
> who_region.1South-East Asian Region         7.2904  3.7296   1.9547  0.0513
> who_region.1Western Pacific Region          2.1339  3.3186   0.6430  0.5206
>                                              ci.lb    ci.ub
> intrcpt                                    1.6083   5.4337  ***
> who_region.1African Region                 9.2492  19.7738  ***
> who_region.1Eastern Mediterranean Region  -8.3885   6.5873
> who_region.1European Region                0.8600   9.5998    *
> who_region.1South-East Asian Region       -0.0426  14.6235    .
> who_region.1Western Pacific Region        -4.3910   8.6587
> 
>    To find the incidence rates, I added the slopes to the intercept to
> produce the following Incident Rates/1000 person years:
> Intercpt (Americas) 3.521
> African Region 18.03249
> Eastern Mediterranean Region 2.620433
> European Region 8.750917
> South-East Asian Region 10.81145
> Western Pacific Region 5.654908
> 
> To confirm these results I also ran the model on the dataset subsetted by
> region, using the 'estimate' as each region's incidence rate. This resulted
> in the following:
> 
> Region of the Americas
> 3.244745079
> African Region
> 19.06.383209
> Eastern Mediterranean Region 2.491278082
> European Region 7.675655135
> South-East Asian Region
> 11.48738471
> Western Pacific Region 5.600276079
> 
> *Q1) Why are the Incidence rates different when using the entire dataset
> versus first subsetting by region? *
> 
> *Rate Ratios:*
> 
> Following your suggestion above for calculating rate ratios for each WHO
> Region, I used the measure = "IRLN" and exponentiated the coefficients of
> the model. I got the following model output:
> 
> Multivariate Meta-Analysis Model (k = 390; method: REML)
> 
> Variance Components:
> 
>              estim    sqrt  nlvls  fixed              factor
> sigma^2.1  2.3898  1.5459     76     no            study_id
> sigma^2.2  0.5833  0.7637    390     no  study_id/cohort_id
> 
> Test for Residual Heterogeneity:
> QE(df = 384) = 118469.9261, p-val < .0001
> 
> Test of Moderators (coefficients 2:6):
> F(df1 = 5, df2 = 384) = 7.3370, p-val < .0001
> 
> Model Results:
> 
>                                            estimate      se      tval    pval
> intrcpt                                    -7.1326  0.2449  -29.1259  <.0001
> who_region.1African Region                  3.1798  0.6332    5.0217  <.0001
> who_region.1Eastern Mediterranean Region    0.9541  0.9634    0.9904  0.3226
> who_region.1European Region                 1.8665  0.5612    3.3261  0.0010
> who_region.1South-East Asian Region         2.6693  0.9360    2.8518  0.0046
> who_region.1Western Pacific Region          1.4697  0.8381    1.7536  0.0803
>                                              ci.lb    ci.ub
> intrcpt                                   -7.6141  -6.6511  ***
> who_region.1African Region                 1.9348   4.4248  ***
> who_region.1Eastern Mediterranean Region  -0.9401   2.8483
> who_region.1European Region                0.7631   2.9698  ***
> who_region.1South-East Asian Region        0.8290   4.5096   **
> who_region.1Western Pacific Region        -0.1781   3.1176    .
> 
> 
> And exponentiating the coefficients, I got these rate ratios:
> 
> Intrcpt (Americas) 0.000799
> African Region 24.04225
> Eastern Mediterranean Region 2.596419
> European Region 6.465383
> South-East Asian Region 14.42971
> Western Pacific Region 4.348128
> 
> Based on the incidence rates produced by using the entire dataset in the
> model and first subsetting by region, these Rate Ratios don't seem to be
> correct. Simply dividing the incidence rates by a comparator (Region of the
> Americas) to produce rate ratios would give the following:
> Entire Dataset First Subsetted
> Region of the Americas
> African Region 5.121411531 5.875294245
> Eastern Mediterranean Region 0.744106788 0.767788539
> European Region 2.485089463 2.365564921
> South-East Asian Region 3.070434536 3.540304225
> Western Pacific Region 1.604657768 1.725952561
> 
> 
> *Q2) Is exponentiating the coefficients (measure = IRLN) the way to
> calculate rate ratios? Why are these results so different?*
> 
> *Confidence Intervals*
> 
> To calculate the 95% confidence intervals for the rate ratios, I first
> calculated the standard deviation (SD[ln(IR)] = (1/A1 + 1/A2)^0.5, where A1
> and A2 are the number of tuberculosis cases in each region), and then used
> the following: 95% CI's = exp[ln(IR) ± 1.96(SD)]).
> 
> *Q3) Is this the right approach to calculating the confidence intervals? Is
> there a way to calculate the confidence intervals from the model (either
> measure = "IRLN" or measure = "IR") output?*
> 
> *Prevalence:*
> Additionally, we have data for calculating tuberculosis prevalence rate by
> WHO region and are trying to calculate odds ratios. For prevalence rate,
> all of the code for the model is the same, except the escalc function is as
> follows:
>              pd_ec <- escalc(measure = 'PR', xi = data_sub$prev_positive,ni
> = data_sub$prev_total_n, append = TRUE,
>              data = data_sub),
> 
> *Q4) For prevalence rates, should we first subset the data to find each
> region's prevalence rate, or use the entire dataset (adding intercept to
> each region's slope)?*
> *Q5)How would we calculate odds ratios using the model?*
> *Q6)How would we calculate the confidence intervals for these odds ratios
> using the model? *
> 
> Thank you so much again for your advice. It's really been indispensable and
> is greatly appreciated.
> 
> All the best,
> Olivia
> 
> On Fri, Jul 26, 2019 at 1:44 PM Viechtbauer, Wolfgang (SP) <
> wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
> 
>> Hi Olivia,
>>
>> A couple comments before I get to your actual question:
>>
>> 1) You might want to divide the person-years by some factor. For example:
>>
>> ti = data$person_years / 1000
>>
>> The incidence rates then reflect the number of cases per 1000 person
>> years. That way, the rates aren't so small and the same goes for the model
>> coefficients (e.g., the 0.0027 will turn into 2.7, which is a bit easier to
>> communicate).
>>
>> 2) You don't need to write pd_ec$yi and pd_ec$vi when you use data=pd_ec.
>> So, rma.mv(yi, vi, ..., data=pd_ec) is sufficient.
>>
>> 3) You should include random effects for studies (as you have done) and
>> also random effects for estimates within studies. So:
>>
>> pd_ec$id <- 1:nrow(pd_ec)
>>
>> random = ~ 1 | study_id / id
>>
>> In fact, I see that you may have multiple estimates for the same cohort,
>> so one could even use an additional level, that is:
>>
>> random = ~ 1 | study_id / cohort_id / id
>>
>> But if you only have 8 rows of data, then I wouldn't do that (so then I
>> would use random = ~ 1 | study_id / id).
>>
>> 4) Now for your actual question: You are analyzing incidence rates, so the
>> coefficients of your model reflect how the (average) rate changes as a
>> function of those moderators (e.g., while the rate is 0.0027 for
>> "1Americas", the rate is 0.0027-0.0025=0.0002 for "Africa"). In other
>> words, the coefficients represent incidence rate differences. If you want
>> ratios, you should analyze log incidence rates (measure="IRLN"). Then the
>> coefficients reflect differences between log incidence rates, so if you
>> exponentiate the coefficients of your model, you get incidence rate ratios.
>>
>> Best,
>> Wolfgang
>>
>> -----Original Message-----
>> From: R-sig-meta-analysis [mailto:
>> r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Olivia Cords
>> Sent: Friday, 26 July, 2019 21:27
>> To: r-sig-meta-analysis using r-project.org
>> Subject: [R-meta] metafor package in R - Risk ratios using rma.mv()
>>
>> ATTACHMENT(S) REMOVED: metafor_output_0724 (1).png | 702_tb_incidence.xlsx
>> | 702_metafor_tb_incidence (4).r
>>
>> Hello!
>> My name is Olivia and I'm a researcher at Stanford University. Our group
>> is trying to calculate relative risk ratios for the a meta-analysis
>> extracting  incidence rates of tuberculosis disease using the metafor
>> package.
>>
>> Study Design:
>> Incidence rates were extracted for each study identified by a systematic
>> review, with some studies reporting multiple rates for different years or
>> locations. In this case, multiple rates were treated as different study
>> cohorts, meaning that the data is clustered ('cohort_id' nested within
>> 'study_id').
>>
>> Model:
>> We used the rma.mv() function, inputting the calculated incidence rate as
>> the outcome variable ('pdc$yi), the variance ('pdc$vi'), WHO region
>> ('who_region') and whether the study was conducted through passive or
>> active screening (passive_active') as moderators, and a random effects
>> argument for the study level ('study_id'). We are unclear how to go from
>> the output to Risk Ratios.
>>
>> Data:
>> study_id    cohort_id   n_diagnosed person_years   who_region
>> passive_active
>> 131         34          77          14298          1Americas   1Passive
>> 93          120         5           27750          1Americas   1Passive
>> 93          121         14          277150         1Americas   1Passive
>> 93          122         15          2000           1Americas   1Passive
>> 136         383         2           2000           Africa      2Active
>> 136         383         7           100000         Africa      2Active
>> 187         16          16          100000         Africa      3Not
>> Specified
>> 187         517         2           100000         S.E. Asia   3Not
>> Specified
>>
>> Code:
>>
>> library(xlsx)
>> library(metafor)
>>
>> data <- read.xlsx('702_tb_incidence.xlsx', 1)
>> print(data)
>> #calculating the incidence rate and the variance
>> pd_ec <- escalc(
>>     measure = 'IR',
>>     xi = data$n_diagnosed,
>>     ti = data$person_years,
>>     data = data
>> )
>> pd_ec
>>
>> #specifying mixed effects model
>> #first level cohort incidence rate and variation
>> #second level study_id
>> #who_region and passive_active as moderators
>> m0 <- rma.mv(pd_ec$yi, pd_ec$vi, method='REML', mods = ~ who_region +
>> passive_active,
>>          random= ~ 1 | study_id,
>>          tdist=TRUE,
>>          data=pd_ec)
>>
>> summary(m0)
>>
>> Output:
>> Multivariate Meta-Analysis Model (k = 8; method: REML)
>>
>>    logLik  Deviance       AIC       BIC      AICc
>>   15.4961  -30.9922  -20.9922  -24.0607   39.0078
>>
>> Variance Components:
>>
>>              estim    sqrt  nlvls  fixed    factor
>> sigma^2    0.0000  0.0037      4     no  study_id
>>
>> Test for Residual Heterogeneity:
>> QE(df = 4) = 94.4465, p-val < .0001
>>
>> Test of Moderators (coefficients 2:4):
>> F(df1 = 3, df2 = 4) = 3.7931, p-val = 0.1152
>>
>> Model Results:
>>
>>                         estimate      se     tval    pval    ci.lb   ci.ub
>> intrcpt                  0.0027  0.0027   1.0073  0.3708  -0.0047
>> 0.0101
>> who_regionAfrica        -0.0025  0.0046  -0.5493  0.6120  -0.0153
>> 0.0102
>> who_regionS.E. Asia     -0.0027  0.0046  -0.5797  0.5932  -0.0154
>> 0.0101
>> passive_active2Active   -0.0001  0.0053  -0.0167  0.9874  -0.0148  0.0146
>>
>> Any advice/insight in much appreciated!
>>
>> Best wishes,
>> Olivia
>>
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-meta-analysis mailing list
> R-sig-meta-analysis using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
> 
> ---
> This email has been checked for viruses by AVG.
> https://www.avg.com
> 
> 

-- 
Michael
http://www.dewey.myzen.co.uk/home.html



More information about the R-sig-meta-analysis mailing list