[R] comparing SAS and R survival analysis with time-dependent covariates
AO_Statistics
aboueslati at gmail.com
Tue Jul 19 19:42:02 CEST 2011
Terry Therneau-2 wrote:
>
> This query of "why do SAS and S give different answers for Cox models"
> comes
> up every so often. The two most common reasons are that
> a. they are using different options for the ties
> b. the SAS and S data sets are slightly different.
> You have both errors.
>
> First, make sure I have the same data set by reading a common file, and
> then
> compare the results.
>
> tmt54% more sdata.txt
> 1 0.0 0.5 0 0
> 1 0.5 3.0 1 1
> 2 0.0 1.0 0 0
> 2 1.0 1.5 1 1
> 3 0.0 6.0 0 0
> 4 0.0 8.0 0 1
> 5 0.0 1.0 0 0
> 5 1.0 8.0 1 0
> 6 0.0 21.0 0 1
> 7 0.0 3.0 0 0
> 7 3.0 11.0 1 1
>
> tmt55% more test.sas
> options linesize=80;
>
> data trythis;
> infile 'sdata.txt';
> input id start end delir outcome;
>
> proc phreg data=trythis;
> model (start, end)*outcome(0)=delir/ ties=discrete;
>
> proc phreg data=trythis;
> model (start, end)*outcome(0)=delir/ ties=efron;
>
>
> tmt56% more test.r
> trythis <- read.table('sdata.txt',
> col.names=c("id", "start", "end", "delir",
> "outcome"))
>
> coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='exact')
> coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='efron')
>
> -----------------
> I now get comparable answers. Note that Cox's "exact partial likelihood"
> is
> the correct form to use for discrete time data. I labeled this as the
> 'exact'
> method and SAS as the 'discrete' method. The "exact marginal likelihood"
> of
> Prentice et al, which SAS calls the 'exact' method is not implemented in
> S.
>
> As to which package is more reliable, I can only point to a set of
> formal test
> cases that are found in Appendix E of the book by Therneau and Grambsch.
>
> [...]
>
>
I am processing estimations of regression parameters in the Cox model for
recurrent event data with time-dependent covariates. As my data sets contain
a lot of ties, I use the "discrete" method of SAS ("PHREG" procedure) and
"exact" option in R ("coxph" function of "survival" package).
Despite the high computation time (up to 45s), I always get estimations
without error or warning message with the "PHREG" procedure.
On the other hand, when I use R software (latest version 2.13.11 on 32 or 64
bits), I sometimes get different estimates from those obtained with SAS and
I get various warnings. And some other time I don't get any result, R
freezes and does not respond.
In order to understand, I have tried some tests from your examples. It turns
out that dysfunctions appear when the proportion of ties become important :
-------------------- Test1 --------------------
With R :
> (trythis <- read.table('***\\sdata4.txt',
+ col.names=c("id", "start", "end", "delir",
"outcome")))
id start end delir outcome
1 1 0.5 3.0 1 1
2 1 0.5 3.0 1 1
3 1 0.5 3.0 1 1
4 1 0.5 3.0 1 1
5 1 0.5 3.0 1 1
6 2 1.0 1.5 1 1
7 2 1.0 1.5 1 1
8 2 1.0 1.5 1 1
9 2 1.0 1.5 1 1
10 2 1.0 1.5 1 1
11 2 1.0 1.5 1 1
12 2 1.0 1.5 1 1
13 2 1.0 1.5 1 1
14 4 0.0 8.0 0 1
15 4 0.0 8.0 0 1
16 4 0.0 8.0 0 1
17 4 0.0 8.0 0 1
18 5 0.0 1.0 0 0
19 5 0.0 1.0 0 0
20 5 0.0 1.0 0 0
21 5 0.0 1.0 0 0
> coxph(Surv(start, end, outcome) ~ delir, data=trythis, method='exact')
Call:
coxph(formula = Surv(start, end, outcome) ~ delir, data = trythis,
method = "exact")
coef exp(coef) se(coef) z p
delir 22.5 6.06e+09 15460 0.00146 1
Likelihood ratio test=15.6 on 1 df, p=8.04e-05 n= 21, number of events= 17
Message d'avis :
In fitter(X, Y, strats, offset, init, control, weights = weights, :
Ran out of iterations and did not converge
With SAS :
data trythis ;
input id start end delir outcome;
datalines;
1 0.5 3.0 1 1
1 0.5 3.0 1 1
1 0.5 3.0 1 1
1 0.5 3.0 1 1
1 0.5 3.0 1 1
2 1.0 1.5 1 1
2 1.0 1.5 1 1
2 1.0 1.5 1 1
2 1.0 1.5 1 1
2 1.0 1.5 1 1
2 1.0 1.5 1 1
2 1.0 1.5 1 1
2 1.0 1.5 1 1
4 0.0 8.0 0 1
4 0.0 8.0 0 1
4 0.0 8.0 0 1
4 0.0 8.0 0 1
5 0.0 1.0 0 0
5 0.0 1.0 0 0
5 0.0 1.0 0 0
5 0.0 1.0 0 0
run;
proc phreg data=trythis;
model (start, end)*outcome(0)=delir/ ties=discrete;
RUN;
No error message, results :
estimate delir : 20.52466
se : 5689
Pr > Khi 2 : 0.9971
convergence status : "Convergence criterion (GCONV=1E-8) satisfied."
-------------------- Test2 --------------------
With R :
> (trythis <- read.table('***\\montest.txt',
+ col.names=c("id", "start", "end", "delir",
"outcome")))
id start end delir outcome
1 1 1 2 1 1
2 1 1 2 0 1
3 1 1 2 0 1
4 1 1 2 1 1
5 1 1 2 0 1
6 1 1 2 1 1
7 2 1 2 1 1
8 2 1 2 1 0
9 3 8 10 0 0
10 4 8 10 0 0
> coxph(Surv(start, end, outcome) ~ delir, data=trythis, method='exact')
Call:
coxph(formula = Surv(start, end, outcome) ~ delir, data = trythis,
method = "exact")
coef exp(coef) se(coef) z p
delir -20.8 9.42e-10 42054 -0.000494 1
Likelihood ratio test=0.94 on 1 df, p=0.332 n= 10, number of events= 7
Message d'avis :
In fitter(X, Y, strats, offset, init, control, weights = weights, :
Loglik converged before variable 1 ; beta may be infinite.
With SAS :
data trythis ;
input id start end delir outcome;
datalines;
1 1.0 2.0 1 1
1 1.0 2.0 0 1
1 1.0 2.0 0 1
1 1.0 2.0 1 1
1 1.0 2.0 0 1
1 1.0 2.0 1 1
2 1.0 2.0 1 1
2 1.0 2.0 1 0
3 8.0 10.0 0 0
4 8.0 10.0 0 0
run;
proc phreg data=trythis;
model (start, end)*outcome(0)=delir/ ties=discrete;
RUN;
results :
estimate delir : -17.78257
se : 9383
Pr > Khi 2 : 0.9985
convergence status : "Convergence criterion (GCONV=1E-8) satisfied."
So I get 2 different warning messages and with my data sets it can be worse.
How would you explain these dysfunctions or warnings with "coxph" function
and how should I deal with it ? Do you think that the results obtained by
SAS are reliable in such cases ?
Thank you for your answer.
--
View this message in context: http://r.789695.n4.nabble.com/comparing-SAS-and-R-survival-analysis-with-time-dependent-covariates-tp874438p3678763.html
Sent from the R help mailing list archive at Nabble.com.
More information about the R-help
mailing list