[R-meta] metafor package in R - Risk ratios using rma.mv()
Michael Dewey
||@t@ @end|ng |rom dewey@myzen@co@uk
Thu Aug 8 09:58:28 CEST 2019
Dear Olivia
I cannot answer all your questions but I think the first section can be
addressed by studying
http://www.metafor-project.org/doku.php/tips:comp_two_independent_estimates
You need to read a fair way in to get the answer but I think the issue
is that the separate models allow for different tau^2 whereas the common
model has a single estimate. The link shows you how to alter that.
I will leave the rest for others to have a go at.
Michael
On 07/08/2019 23:08, Olivia Cords wrote:
> Dear Wolfgang,
> Thanks so much again for your help. The comments you provided have been
> incredibly useful and I have implemented your suggestions. I just wanted to
> follow up with you on this topic with a few additional clarifications
> regarding incident rates, incident rate ratios, confidence intervals, and
> similar issues for prevalence/odds ratios.
>
> *Incidence Rates*
> I calculated the incidence rates (measure = "IR", ti =
> data$person_years/1000) for tuberculosis stratified by World Health
> Organization Region and got the following results from the model.
>
> m0 <- rma.mv(yi, vi, method='REML', mods = formula,
> random= ~ 1 | study_id/cohort_id,
> tdist=TRUE,
> data=pd_ec)
>
> estimate se tval pval
> intrcpt 3.5210 0.9728 3.6195 0.0003
> who_region.1African Region 14.5115 2.6764 5.4219 <.0001
> who_region.1Eastern Mediterranean Region -0.9006 3.8084 -0.2365 0.8132
> who_region.1European Region 5.2299 2.2226 2.3531 0.0191
> who_region.1South-East Asian Region 7.2904 3.7296 1.9547 0.0513
> who_region.1Western Pacific Region 2.1339 3.3186 0.6430 0.5206
> ci.lb ci.ub
> intrcpt 1.6083 5.4337 ***
> who_region.1African Region 9.2492 19.7738 ***
> who_region.1Eastern Mediterranean Region -8.3885 6.5873
> who_region.1European Region 0.8600 9.5998 *
> who_region.1South-East Asian Region -0.0426 14.6235 .
> who_region.1Western Pacific Region -4.3910 8.6587
>
> To find the incidence rates, I added the slopes to the intercept to
> produce the following Incident Rates/1000 person years:
> Intercpt (Americas) 3.521
> African Region 18.03249
> Eastern Mediterranean Region 2.620433
> European Region 8.750917
> South-East Asian Region 10.81145
> Western Pacific Region 5.654908
>
> To confirm these results I also ran the model on the dataset subsetted by
> region, using the 'estimate' as each region's incidence rate. This resulted
> in the following:
>
> Region of the Americas
> 3.244745079
> African Region
> 19.06.383209
> Eastern Mediterranean Region 2.491278082
> European Region 7.675655135
> South-East Asian Region
> 11.48738471
> Western Pacific Region 5.600276079
>
> *Q1) Why are the Incidence rates different when using the entire dataset
> versus first subsetting by region? *
>
> *Rate Ratios:*
>
> Following your suggestion above for calculating rate ratios for each WHO
> Region, I used the measure = "IRLN" and exponentiated the coefficients of
> the model. I got the following model output:
>
> Multivariate Meta-Analysis Model (k = 390; method: REML)
>
> Variance Components:
>
> estim sqrt nlvls fixed factor
> sigma^2.1 2.3898 1.5459 76 no study_id
> sigma^2.2 0.5833 0.7637 390 no study_id/cohort_id
>
> Test for Residual Heterogeneity:
> QE(df = 384) = 118469.9261, p-val < .0001
>
> Test of Moderators (coefficients 2:6):
> F(df1 = 5, df2 = 384) = 7.3370, p-val < .0001
>
> Model Results:
>
> estimate se tval pval
> intrcpt -7.1326 0.2449 -29.1259 <.0001
> who_region.1African Region 3.1798 0.6332 5.0217 <.0001
> who_region.1Eastern Mediterranean Region 0.9541 0.9634 0.9904 0.3226
> who_region.1European Region 1.8665 0.5612 3.3261 0.0010
> who_region.1South-East Asian Region 2.6693 0.9360 2.8518 0.0046
> who_region.1Western Pacific Region 1.4697 0.8381 1.7536 0.0803
> ci.lb ci.ub
> intrcpt -7.6141 -6.6511 ***
> who_region.1African Region 1.9348 4.4248 ***
> who_region.1Eastern Mediterranean Region -0.9401 2.8483
> who_region.1European Region 0.7631 2.9698 ***
> who_region.1South-East Asian Region 0.8290 4.5096 **
> who_region.1Western Pacific Region -0.1781 3.1176 .
>
>
> And exponentiating the coefficients, I got these rate ratios:
>
> Intrcpt (Americas) 0.000799
> African Region 24.04225
> Eastern Mediterranean Region 2.596419
> European Region 6.465383
> South-East Asian Region 14.42971
> Western Pacific Region 4.348128
>
> Based on the incidence rates produced by using the entire dataset in the
> model and first subsetting by region, these Rate Ratios don't seem to be
> correct. Simply dividing the incidence rates by a comparator (Region of the
> Americas) to produce rate ratios would give the following:
> Entire Dataset First Subsetted
> Region of the Americas
> African Region 5.121411531 5.875294245
> Eastern Mediterranean Region 0.744106788 0.767788539
> European Region 2.485089463 2.365564921
> South-East Asian Region 3.070434536 3.540304225
> Western Pacific Region 1.604657768 1.725952561
>
>
> *Q2) Is exponentiating the coefficients (measure = IRLN) the way to
> calculate rate ratios? Why are these results so different?*
>
> *Confidence Intervals*
>
> To calculate the 95% confidence intervals for the rate ratios, I first
> calculated the standard deviation (SD[ln(IR)] = (1/A1 + 1/A2)^0.5, where A1
> and A2 are the number of tuberculosis cases in each region), and then used
> the following: 95% CI's = exp[ln(IR) ± 1.96(SD)]).
>
> *Q3) Is this the right approach to calculating the confidence intervals? Is
> there a way to calculate the confidence intervals from the model (either
> measure = "IRLN" or measure = "IR") output?*
>
> *Prevalence:*
> Additionally, we have data for calculating tuberculosis prevalence rate by
> WHO region and are trying to calculate odds ratios. For prevalence rate,
> all of the code for the model is the same, except the escalc function is as
> follows:
> pd_ec <- escalc(measure = 'PR', xi = data_sub$prev_positive,ni
> = data_sub$prev_total_n, append = TRUE,
> data = data_sub),
>
> *Q4) For prevalence rates, should we first subset the data to find each
> region's prevalence rate, or use the entire dataset (adding intercept to
> each region's slope)?*
> *Q5)How would we calculate odds ratios using the model?*
> *Q6)How would we calculate the confidence intervals for these odds ratios
> using the model? *
>
> Thank you so much again for your advice. It's really been indispensable and
> is greatly appreciated.
>
> All the best,
> Olivia
>
> On Fri, Jul 26, 2019 at 1:44 PM Viechtbauer, Wolfgang (SP) <
> wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>
>> Hi Olivia,
>>
>> A couple comments before I get to your actual question:
>>
>> 1) You might want to divide the person-years by some factor. For example:
>>
>> ti = data$person_years / 1000
>>
>> The incidence rates then reflect the number of cases per 1000 person
>> years. That way, the rates aren't so small and the same goes for the model
>> coefficients (e.g., the 0.0027 will turn into 2.7, which is a bit easier to
>> communicate).
>>
>> 2) You don't need to write pd_ec$yi and pd_ec$vi when you use data=pd_ec.
>> So, rma.mv(yi, vi, ..., data=pd_ec) is sufficient.
>>
>> 3) You should include random effects for studies (as you have done) and
>> also random effects for estimates within studies. So:
>>
>> pd_ec$id <- 1:nrow(pd_ec)
>>
>> random = ~ 1 | study_id / id
>>
>> In fact, I see that you may have multiple estimates for the same cohort,
>> so one could even use an additional level, that is:
>>
>> random = ~ 1 | study_id / cohort_id / id
>>
>> But if you only have 8 rows of data, then I wouldn't do that (so then I
>> would use random = ~ 1 | study_id / id).
>>
>> 4) Now for your actual question: You are analyzing incidence rates, so the
>> coefficients of your model reflect how the (average) rate changes as a
>> function of those moderators (e.g., while the rate is 0.0027 for
>> "1Americas", the rate is 0.0027-0.0025=0.0002 for "Africa"). In other
>> words, the coefficients represent incidence rate differences. If you want
>> ratios, you should analyze log incidence rates (measure="IRLN"). Then the
>> coefficients reflect differences between log incidence rates, so if you
>> exponentiate the coefficients of your model, you get incidence rate ratios.
>>
>> Best,
>> Wolfgang
>>
>> -----Original Message-----
>> From: R-sig-meta-analysis [mailto:
>> r-sig-meta-analysis-bounces using r-project.org] On Behalf Of Olivia Cords
>> Sent: Friday, 26 July, 2019 21:27
>> To: r-sig-meta-analysis using r-project.org
>> Subject: [R-meta] metafor package in R - Risk ratios using rma.mv()
>>
>> ATTACHMENT(S) REMOVED: metafor_output_0724 (1).png | 702_tb_incidence.xlsx
>> | 702_metafor_tb_incidence (4).r
>>
>> Hello!
>> My name is Olivia and I'm a researcher at Stanford University. Our group
>> is trying to calculate relative risk ratios for the a meta-analysis
>> extracting incidence rates of tuberculosis disease using the metafor
>> package.
>>
>> Study Design:
>> Incidence rates were extracted for each study identified by a systematic
>> review, with some studies reporting multiple rates for different years or
>> locations. In this case, multiple rates were treated as different study
>> cohorts, meaning that the data is clustered ('cohort_id' nested within
>> 'study_id').
>>
>> Model:
>> We used the rma.mv() function, inputting the calculated incidence rate as
>> the outcome variable ('pdc$yi), the variance ('pdc$vi'), WHO region
>> ('who_region') and whether the study was conducted through passive or
>> active screening (passive_active') as moderators, and a random effects
>> argument for the study level ('study_id'). We are unclear how to go from
>> the output to Risk Ratios.
>>
>> Data:
>> study_id cohort_id n_diagnosed person_years who_region
>> passive_active
>> 131 34 77 14298 1Americas 1Passive
>> 93 120 5 27750 1Americas 1Passive
>> 93 121 14 277150 1Americas 1Passive
>> 93 122 15 2000 1Americas 1Passive
>> 136 383 2 2000 Africa 2Active
>> 136 383 7 100000 Africa 2Active
>> 187 16 16 100000 Africa 3Not
>> Specified
>> 187 517 2 100000 S.E. Asia 3Not
>> Specified
>>
>> Code:
>>
>> library(xlsx)
>> library(metafor)
>>
>> data <- read.xlsx('702_tb_incidence.xlsx', 1)
>> print(data)
>> #calculating the incidence rate and the variance
>> pd_ec <- escalc(
>> measure = 'IR',
>> xi = data$n_diagnosed,
>> ti = data$person_years,
>> data = data
>> )
>> pd_ec
>>
>> #specifying mixed effects model
>> #first level cohort incidence rate and variation
>> #second level study_id
>> #who_region and passive_active as moderators
>> m0 <- rma.mv(pd_ec$yi, pd_ec$vi, method='REML', mods = ~ who_region +
>> passive_active,
>> random= ~ 1 | study_id,
>> tdist=TRUE,
>> data=pd_ec)
>>
>> summary(m0)
>>
>> Output:
>> Multivariate Meta-Analysis Model (k = 8; method: REML)
>>
>> logLik Deviance AIC BIC AICc
>> 15.4961 -30.9922 -20.9922 -24.0607 39.0078
>>
>> Variance Components:
>>
>> estim sqrt nlvls fixed factor
>> sigma^2 0.0000 0.0037 4 no study_id
>>
>> Test for Residual Heterogeneity:
>> QE(df = 4) = 94.4465, p-val < .0001
>>
>> Test of Moderators (coefficients 2:4):
>> F(df1 = 3, df2 = 4) = 3.7931, p-val = 0.1152
>>
>> Model Results:
>>
>> estimate se tval pval ci.lb ci.ub
>> intrcpt 0.0027 0.0027 1.0073 0.3708 -0.0047
>> 0.0101
>> who_regionAfrica -0.0025 0.0046 -0.5493 0.6120 -0.0153
>> 0.0102
>> who_regionS.E. Asia -0.0027 0.0046 -0.5797 0.5932 -0.0154
>> 0.0101
>> passive_active2Active -0.0001 0.0053 -0.0167 0.9874 -0.0148 0.0146
>>
>> Any advice/insight in much appreciated!
>>
>> Best wishes,
>> Olivia
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-meta-analysis mailing list
> R-sig-meta-analysis using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-meta-analysis
>
> ---
> This email has been checked for viruses by AVG.
> https://www.avg.com
>
>
--
Michael
http://www.dewey.myzen.co.uk/home.html
More information about the R-sig-meta-analysis
mailing list