[R] comparing SAS and R survival analysis with time-dependent covariates

Thomas Lumley tlumley at uw.edu
Tue Jul 19 23:27:12 CEST 2011


On Wed, Jul 20, 2011 at 5:42 AM, AO_Statistics <aboueslati at gmail.com> wrote:
>
> Terry Therneau-2 wrote:
>>
>> This query of "why do SAS and S give different answers for Cox models"
>> comes
>> up every so often.  The two most common reasons are that
>>       a. they are using different options for the ties
>>       b. the SAS and S data sets are slightly different.
>> You have both errors.
>>
>> First, make sure I have the same data set by reading a common file, and
>> then
>> compare the results.
>>
>> tmt54% more sdata.txt
>>  1   0.0  0.5     0       0
>>  1   0.5  3.0     1       1
>>  2   0.0  1.0     0       0
>>  2   1.0  1.5     1       1
>>  3   0.0  6.0     0       0
>>  4   0.0  8.0     0       1
>>  5   0.0  1.0     0       0
>>  5   1.0  8.0     1       0
>>  6   0.0 21.0     0       1
>>  7   0.0  3.0     0       0
>>  7   3.0 11.0     1       1
>>
>> tmt55% more test.sas
>> options linesize=80;
>>
>> data trythis;
>>     infile 'sdata.txt';
>>     input id start end delir outcome;
>>
>> proc phreg data=trythis;
>>   model (start, end)*outcome(0)=delir/ ties=discrete;
>>
>> proc phreg data=trythis;
>>   model (start, end)*outcome(0)=delir/ ties=efron;
>>
>>
>> tmt56% more test.r
>> library(survival)
>> trythis <- read.table('sdata.txt',
>>                       col.names=c("id", "start", "end", "delir",
>> "outcome"))
>>
>> coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='exact')
>> coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='efron')
>>
>> -----------------
>>  I now get comparable answers.  Note that Cox's "exact partial likelihood"
>> is
>> the correct form to use for discrete time data.  I labeled this as the
>> 'exact'
>> method and SAS as the 'discrete' method.  The "exact marginal likelihood"
>> of
>> Prentice et al, which SAS calls the 'exact' method is not implemented in
>> S.
>>
>>   As to which package is more reliable, I can only point to a set of
>> formal test
>> cases that are found in Appendix E of the book by Therneau and Grambsch.
>>
>> [...]
>>
>>
>
>
> I am estimating regression parameters in a Cox model for recurrent event
> data with time-dependent covariates. As my data sets contain a lot of ties,
> I use the "discrete" method in SAS ("PHREG" procedure) and the "exact"
> option in R ("coxph" function of the "survival" package).
>
> Despite the long computation time (up to 45 s), the "PHREG" procedure always
> returns estimates without any error or warning message.
> On the other hand, when I use R (latest version 2.13.1, 32- or 64-bit), I
> sometimes get estimates that differ from those obtained with SAS, along with
> various warnings. At other times I get no result at all: R freezes and does
> not respond.
>
> To understand this, I have run some tests based on your examples. It turns
> out that the problems appear when the proportion of ties becomes large:
>
Edited down to results:
R
>      coef exp(coef) se(coef)       z p
> delir 22.5  6.06e+09    15460 0.00146 1
SAS
> estimate delir : 20.52466
> se : 5689
R
>
>       coef exp(coef) se(coef)         z p
> delir -20.8  9.42e-10    42054 -0.000494 1
SAS
> estimate delir : -17.78257
> se : 9383
> Pr > ChiSq : 0.9985
> convergence status : "Convergence criterion (GCONV=1E-8) satisfied."

The warning and error messages are correct here.  Look at the point
estimates: a log hazard ratio of about 20 in one case and about -20 in
the other.  The true maximum partial likelihood estimate is infinite,
so each program has simply stopped at some large finite value, and
there is no reason for those values to agree.  The estimated standard
errors are meaningless, since the partial likelihood isn't close to
quadratic where the iteration stopped.
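
As a minimal sketch of what an infinite estimate looks like, the base R
code below evaluates the Cox partial log-likelihood by hand on a small
made-up data set (the variables time, status, x and the function pl are
hypothetical and not taken from the data in this thread).  Every event
occurs in the x = 1 group, so the log-likelihood keeps rising as beta
grows and only approaches its upper bound in the limit.

```r
# Hypothetical toy data: all three events are in the x = 1 group,
# and there are no tied event times.
time   <- c(1, 2, 3, 4, 5, 6)
status <- c(1, 1, 1, 0, 0, 0)
x      <- c(1, 1, 1, 0, 0, 0)

# Cox partial log-likelihood for a single covariate (no ties):
# sum over events of  beta * x_i - log(sum over risk set of exp(beta * x_j))
pl <- function(beta) {
  events <- which(status == 1)
  sum(sapply(events, function(i) {
    at_risk <- time >= time[i]
    beta * x[i] - log(sum(exp(beta * x[at_risk])))
  }))
}

betas <- c(0, 5, 10, 20)
ll <- sapply(betas, pl)

# The log-likelihood increases with beta and flattens out just below
# its supremum, -log(6), which is never attained at a finite beta.
print(cbind(beta = betas, loglik = ll, gap_to_sup = -log(6) - ll))
```

With a surface this flat, an iterative fit stops wherever its
convergence rule happens to be satisfied, and the curvature there says
nothing useful about the precision of the estimate.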


    -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


