[R] comparing SAS and R survival analysis with time-dependent covariates
Thomas Lumley
tlumley at uw.edu
Tue Jul 19 23:27:12 CEST 2011
On Wed, Jul 20, 2011 at 5:42 AM, AO_Statistics <aboueslati at gmail.com> wrote:
>
> Terry Therneau-2 wrote:
>>
>> This query of "why do SAS and S give different answers for Cox models"
>> comes up every so often. The two most common reasons are that
>> a. they are using different options for the ties
>> b. the SAS and S data sets are slightly different.
>> You have both errors.
>>
>> First, make sure I have the same data set by reading a common file,
>> and then compare the results.
>>
>> tmt54% more sdata.txt
>> 1 0.0 0.5 0 0
>> 1 0.5 3.0 1 1
>> 2 0.0 1.0 0 0
>> 2 1.0 1.5 1 1
>> 3 0.0 6.0 0 0
>> 4 0.0 8.0 0 1
>> 5 0.0 1.0 0 0
>> 5 1.0 8.0 1 0
>> 6 0.0 21.0 0 1
>> 7 0.0 3.0 0 0
>> 7 3.0 11.0 1 1
>>
>> tmt55% more test.sas
>> options linesize=80;
>>
>> data trythis;
>> infile 'sdata.txt';
>> input id start end delir outcome;
>>
>> proc phreg data=trythis;
>> model (start, end)*outcome(0)=delir/ ties=discrete;
>>
>> proc phreg data=trythis;
>> model (start, end)*outcome(0)=delir/ ties=efron;
>>
>>
>> tmt56% more test.r
>> library(survival)  # provides Surv() and coxph()
>> trythis <- read.table('sdata.txt',
>>                       col.names=c("id", "start", "end", "delir",
>>                                   "outcome"))
>>
>> coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='exact')
>> coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='efron')
>>
>> -----------------
>> I now get comparable answers. Note that Cox's "exact partial likelihood"
>> is the correct form to use for discrete time data. I labeled this as the
>> 'exact' method and SAS as the 'discrete' method. The "exact marginal
>> likelihood" of Prentice et al, which SAS calls the 'exact' method, is not
>> implemented in S.
>>
>> As to which package is more reliable, I can only point to a set of
>> formal test cases that are found in Appendix E of the book by Therneau
>> and Grambsch.
>>
>> [...]
>>
>>
>
>
> I am processing estimations of regression parameters in the Cox model for
> recurrent event data with time-dependent covariates. As my data sets contain
> a lot of ties, I use the "discrete" method of SAS ("PHREG" procedure) and
> "exact" option in R ("coxph" function of "survival" package).
>
> Despite the high computation time (up to 45s), I always get estimations
> without error or warning message with the "PHREG" procedure.
> On the other hand, when I use R software (latest version 2.13.11 on 32 or 64
> bits), I sometimes get different estimates from those obtained with SAS and
> I get various warnings. And some other time I don't get any result, R
> freezes and does not respond.
>
> In order to understand, I have tried some tests based on your examples. It
> turns out that the problems appear when the proportion of ties becomes large:
>
Edited down to results:
R
> coef exp(coef) se(coef) z p
> delir 22.5 6.06e+09 15460 0.00146 1
SAS
> estimate delir : 20.52466
> se : 5689
R
>
> coef exp(coef) se(coef) z p
> delir -20.8 9.42e-10 42054 -0.000494 1
SAS
> estimate delir : -17.78257
> se : 9383
> Pr > ChiSq : 0.9985
> convergence status : "Convergence criterion (GCONV=1E-8) satisfied."
The warning and error messages are correct here. Look at the point
estimates: a log hazard ratio of about 20 in one case and about -20 in
the other. The true maximum partial likelihood estimate is infinite, so
R and SAS have each simply stopped iterating at a different large value.
The estimated standard errors are meaningless, since the partial
likelihood isn't close to quadratic at the maximum.
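
As a rough illustration of what an infinite estimate looks like, here is a
minimal sketch in base R. The toy data are hypothetical (they are not the
poster's data, which are not shown in this message), and the helper `pll` is
written just for this example; it assumes one covariate and no tied times.
Every subject with x = 1 fails before any subject with x = 0, so the partial
log-likelihood keeps increasing as beta grows and never reaches a maximum.

```r
# Hypothetical toy data: perfect separation of failure order by x.
time   <- 1:6
status <- rep(1, 6)
x      <- c(1, 1, 1, 0, 0, 0)

# Cox partial log-likelihood for a single covariate, assuming no ties.
pll <- function(beta) {
  sum(sapply(which(status == 1), function(i) {
    at_risk <- time >= time[i]          # risk set at the i-th event time
    beta * x[i] - log(sum(exp(beta * x[at_risk])))
  }))
}

# Evaluate on a grid: the values rise with beta and then flatten out.
betas <- c(0, 1, 2, 5, 10, 20)
ll    <- sapply(betas, pll)
print(cbind(beta = betas, loglik = ll))
```

For these data the log-likelihood approaches -log(36), about -3.58, from
below as beta increases, so it is essentially flat for large beta. Any
iterative fit has to stop at some arbitrary point on that plateau, and the
curvature there says nothing useful about the uncertainty.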
-thomas
--
Thomas Lumley
Professor of Biostatistics
University of Auckland