[R] Trend test for hazard ratios

David Winsemius dwinsemius at comcast.net
Wed Apr 16 22:48:23 CEST 2014


On Apr 16, 2014, at 4:07 AM, Göran Broström wrote:

> On 04/15/2014 10:51 PM, David Winsemius wrote:
>> 
>> On Apr 15, 2014, at 6:32 AM, Therneau, Terry M., Ph.D. wrote:
>> 
>>> You can do statistical tests within a single model, for whether
>>> portions of it fit or do not fit.  But one cannot take three
>>> separate fits and compare them.  The program needs context to know
>>> how the three relate to one another.  Say that "group" is your
>>> strata variable, trt the variable of interest, and x1, x2 are
>>> adjusters.
>>> 
>>> fit <- coxph(Surv(time,status) ~ trt * strata(group) + x1 + x2,
>>> data=mydata)
>>> 
>>> Will fit a model with a separate treatment coefficient for each of
>>> the groups, and a separate baseline hazard for each.  One can now
>>> create a contrast that corresponds to your trend test, using
>>> vcov(fit) for the variance matrix and coef(fit) to retrieve the
>>> coefficients.
>>> 
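[Editorial aside: a minimal sketch of the kind of fit Terry describes, using the lung data shipped with the survival package as a stand-in — sex playing the role of the stratifying group and ph.ecog the variable of interest. This is an illustration, not code from the thread.]

```r
# Editorial sketch: interacting a covariate with a strata() term gives
# per-stratum effects (a main effect plus interaction contrasts) and a
# separate baseline hazard for each stratum.
library(survival)
fit <- coxph(Surv(time, status) ~ ph.ecog * strata(sex) + age,
             data = lung)
coef(fit)  # ph.ecog, age, and the per-stratum ph.ecog interaction term
vcov(fit)  # variance matrix for building a trend contrast by hand
```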
>> 
>> I have at the moment in my workspace a dataset of breast cancer cases
>> extracted from SEER that has a factor representing three grades of
>> histology, Grade.abb:
>> $ Grade.abb : Factor w/ 3 levels "Well","Moderate","Poorly"
>> 
>> It would be reasonable to test whether the grading has a "trend" in
>> its effect when controlling for other factors (and it would be
>> surprising to a medical audience if there were no effect). So I
>> compare across models, using AgeStgSiz.NG.Rad as my "linear trend"
>> model (with one df for an `as.numeric` version of grade),
>> AgeStgSizRad as my no-grade-included baseline, and AgeStgSiz.FG.Rad
>> as my full-factor version:
> 
> David, your problem is different from the original one; I think you can solve yours by transforming the (unordered) factor to an ordered one.

Thank you for your interest, but I don't see that it is different. I think I do understand model comparison when the models do not have strata.

> 
> Try
> 
> AgeStgSiz.NG.Rad <- coxph(Surv(Survmon,
> Vital.status.recode..study.cutoff.used.=="Dead") ~ AgeDx+
> SEER.historic.stage.A+ as.numeric(Size.cat) + ordered(Grade.abb)
> + Rgrp , data=BrILR)
> 
> and contrasts based on orthogonal polynomials are used for Grade.abb
> 
> see ?contr.poly
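
[Editorial aside: for reference, the orthogonal polynomial contrast matrix that `?contr.poly` describes for a three-level ordered factor looks like this — the .L column carries the linear trend and .Q the quadratic departure from it:]

```r
# Orthogonal polynomial contrasts for a 3-level ordered factor; these
# are the columns behind the .L and .Q coefficients in the fit below.
cp <- contr.poly(3)
round(cp, 4)
# .L column: -0.7071, 0.0000, 0.7071  (linear trend)
# .Q column:  0.4082, -0.8165, 0.4082 (quadratic)
```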

Yes. That should give similar (and hopefully identical) results to those obtained with the nested models for which I supplied the deviance estimates. The as.numeric() version would correspond to the linear term in an ordered-factor polynomial contrast (although it would not be identical, since it is not in the full model). It should be reported as "highly significant", since it is associated with a change in deviance of 5888.492 - 5166.291 on one d.f. Because there was no material difference in deviance (a change of less than 1) when adding the covariate as an unordered factor, I would predict that the quadratic term should report a p-value in the conventionally "insignificant" range, and that is what is seen:

(AgeStgSiz.OG.Rad <- coxph( Surv(Survmon, Vital.status.recode..study.cutoff.used.=="Dead") ~ AgeDx+ SEER.historic.stage.A+ as.numeric(Size.cat) + ordered(Grade.abb) + Rgrp , data=BrILR))
Call:
coxph(formula = Surv(Survmon, Vital.status.recode..study.cutoff.used. == 
    "Dead") ~ AgeDx + SEER.historic.stage.A + as.numeric(Size.cat) + 
    ordered(Grade.abb) + Rgrp, data = BrILR)


                                  coef exp(coef) se(coef)       z    p
AgeDx                           0.0393     1.040   0.0030  13.099 0.00
SEER.historic.stage.ALocalized  0.8986     2.456   0.0679  13.225 0.00
SEER.historic.stage.ARegional   1.4663     4.333   0.0682  21.513 0.00
as.numeric(Size.cat)            0.1101     1.116   0.0028  39.362 0.00
ordered(Grade.abb).L            0.5291     1.697   0.0234  22.608 0.00
ordered(Grade.abb).Q           -0.0166     0.984   0.0177  -0.941 0.35
RgrpTRUE                       -0.2696     0.764   0.0177 -15.212 0.00

Likelihood ratio test=5889  on 7 df, p=0  n= 59583, number of events= 13070

Notice that:

1 - pchisq( 2*diff(summary(AgeStgSiz.FG.Rad)[['loglik']]) - 2*diff(summary(AgeStgSiz.NG.Rad)[['loglik']]), 1)
[1] 0.3462616
 
... is exactly the p-value reported for the quadratic term. (And the z-statistic is the negative of the square root of the difference in deviance.)

I was hoping for a demonstration of taking the coef() and vcov() values to construct contrast estimates and variances. I've tried hacking the estimable() function in the gmodels package so that it would accept an object that has coef and vcov components, but I'm "not quite there yet".
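
[Editorial aside: a sketch of that calculation. `wald_contrast` is a made-up helper, not from any package, and the coefficient selection shown in the comments is an assumption that would need to match the actual fit.]

```r
# Editorial sketch: a Wald test of a linear combination of model
# coefficients, computed directly from coef() and vcov().
wald_contrast <- function(b, V, w) {
  est <- sum(w * b)                        # contrast estimate w'b
  se  <- sqrt(drop(t(w) %*% V %*% w))      # its standard error w'Vw
  z   <- est / se
  c(estimate = est, se = se, z = z, p = 2 * pnorm(-abs(z)))
}
# For a coxph fit with per-stratum treatment coefficients one might do
# (hypothetical selection and scores):
# idx <- grep("trt", names(coef(fit)))
# wald_contrast(coef(fit)[idx], vcov(fit)[idx, idx], c(-1, 0, 1))
```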

-- 
David.

> 
> Göran B.
>> 
>>> AgeStgSiz.NG.Rad <- coxph( Surv(Survmon,
>>> Vital.status.recode..study.cutoff.used.=="Dead") ~ AgeDx+
>>> SEER.historic.stage.A+ as.numeric(Size.cat) + as.numeric(Grade.abb)
>>> + Rgrp , data=BrILR)
>>> AgeStgSizRad <- coxph( Surv(Survmon,
>>> Vital.status.recode..study.cutoff.used.=="Dead") ~ AgeDx+
>>> SEER.historic.stage.A+ as.numeric(Size.cat) + Rgrp , data=BrILR)
>>> AgeStgSiz.FG.Rad <- coxph( Surv(Survmon,
>>> Vital.status.recode..study.cutoff.used.=="Dead") ~ AgeDx+
>>> SEER.historic.stage.A+ as.numeric(Size.cat) + Grade.abb + Rgrp ,
>>> data=BrILR)
>>> 2*diff(summary(AgeStgSizRad)[['loglik']])
>> [1] 5166.291
>>> 2*diff(summary(AgeStgSiz.NG.Rad)[['loglik']])
>> [1] 5888.492
>>> 2*diff(summary(AgeStgSiz.FG.Rad)[['loglik']])
>> [1] 5889.379
>> 
>> So there is strong evidence that adding grade to the existing
>> covariates improves the fit, but that representing it as separate
>> factor levels, with one extra degree of freedom, may not be needed.
>> When I add grade as a stratum I get a very different
>> 2*loglikelihood:
>> 
>>> AgeStgSiz.SG.Rad <- coxph( Surv(Survmon,
>>> Vital.status.recode..study.cutoff.used.=="Dead") ~ AgeDx+
>>> SEER.historic.stage.A+ as.numeric(Size.cat) + strata(Grade.abb) +
>>> Rgrp , data=BrILR)
>>> 2*diff(summary(AgeStgSiz.SG.Rad)[['loglik']])
>> [1] 3980.908
>> 
>>> dput( vcov(AgeStgSiz.SG.Rad) )
>> structure(c(9.00241385282728e-06, -4.45446264398645e-07,
>> 5.18927440846587e-07, 2.62020260612094e-07, 7.47434378232446e-07,
>> -4.45446264398645e-07, 0.0046168537719431, 0.00445692601518848,
>> -8.67833275051278e-07, -3.68093395861629e-05, 5.18927440846587e-07,
>> 0.00445692601518848, 0.00464685164887969, -1.61616621634903e-05,
>> -3.77256742079467e-05, 2.62020260612094e-07, -8.67833275051278e-07,
>> -1.61616621634903e-05, 7.84049821976807e-06, 1.8221575745622e-06,
>> 7.47434378232446e-07, -3.68093395861629e-05, -3.77256742079467e-05,
>> 1.8221575745622e-06, 0.000313989310316303), .Dim = c(5L, 5L),
>> .Dimnames = list(c("AgeDx", "SEER.historic.stage.ALocalized",
>> "SEER.historic.stage.ARegional", "as.numeric(Size.cat)", "RgrpTRUE"),
>> c("AgeDx", "SEER.historic.stage.ALocalized",
>> "SEER.historic.stage.ARegional", "as.numeric(Size.cat)", "RgrpTRUE"
>> )))
>>> dput(coef(AgeStgSiz.SG.Rad))
>> structure(c(0.0393472050734995, 0.901971276489615, 1.46695753267995,
>> 0.108860100649677, -0.273688779502084), .Names = c("AgeDx",
>> "SEER.historic.stage.ALocalized", "SEER.historic.stage.ARegional",
>> "as.numeric(Size.cat)", "RgrpTRUE" ))
>> 
>> I'm not particularly facile with contrast construction from
>> variance-covariance matrices, so I am hoping for a worked example. I
>> am also wondering whether the cross-model comparisons are invalid or
>> merely less powerful.
>> 
>> 
>>> Terry T.
>>> 
>>> 
>>> 
>>> On 04/15/2014 05:00 AM, r-help-request at r-project.org wrote:
>>>> Hello,
>>>> 
>>>> I have the following problem. I stratified my patient cohort into
>>>> three ordered groups and performed multivariate adjusted Cox
>>>> regression analysis on each group separately. Now I would like to
>>>> calculate a p for trend across the hazard ratios that I got for
>>>> the three groups. How can I do that if I only have the HR and the
>>>> confidence interval? For example I got the following HRs for one
>>>> endpoint:
>>>> 
>>>> 1.09(0.68-1.74),	1.29(0.94-1.76) and 1.64(1.01-2.68).
>>>> 
>>>> There is a trend but how do I calculate if it is significant?
>>>> 
>>>> Best regards
>>>> 
>>>> Marcus Kleber
>>>> 
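
[Editorial aside on Marcus's original question: one common rough approach when only the HRs and confidence intervals are available is to recover each log-HR and its standard error from the CI width, then run an inverse-variance weighted linear trend test across the group scores. The sketch below uses the HRs quoted above; the formulas are standard fixed-effect meta-regression, and the equally spaced scores are an assumption.]

```r
# Editorial sketch: approximate p-for-trend from reported HRs and
# 95% CIs (fixed-effect inverse-variance weighted linear trend).
hr <- c(1.09, 1.29, 1.64)
lo <- c(0.68, 0.94, 1.01)
hi <- c(1.74, 1.76, 2.68)
b  <- log(hr)                                   # log hazard ratios
se <- (log(hi) - log(lo)) / (2 * qnorm(0.975))  # SE from CI width
w  <- 1 / se^2                                  # inverse-variance weights
x  <- 1:3                                       # ordinal group scores
xb <- sum(w * x) / sum(w)                       # weighted mean score
slope    <- sum(w * (x - xb) * b) / sum(w * (x - xb)^2)
se_slope <- sqrt(1 / sum(w * (x - xb)^2))
2 * pnorm(-abs(slope / se_slope))               # two-sided p for trend
```

Note that this treats the three group-specific HRs as independent estimates, which is only an approximation when they come from subgroups of the same cohort.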
>>> 
>>> ______________________________________________ R-help at r-project.org
>>> mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>>> read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> David Winsemius Alameda, CA, USA
>> 
>> 
> 

David Winsemius
Alameda, CA, USA



