[R] results of a survival analysis change when converting the data to counting process format

Fri Aug 23 15:50:05 CEST 2019

On the other hand, SAS gets the same answer all three ways.  And the answer SAS gets is closest to the one that R gets when using the daily cutting.

Obs _TIES_ _TYPE_ _STATUS_ _NAME_ trt prior karno _LNLIKE_ 
1 EFRON PARMS 0 Converged time 0.1801972156 -0.005550913 -0.033771016 -483.9277463 
2 EFRON PARMS 0 Converged time 0.1801972156 -0.005550913 -0.033771016 -483.9277463 
3 EFRON PARMS 0 Converged time 0.1801972156 -0.005550913 -0.033771016 -483.9277463 

proc phreg data=veteran1 outest = vet1;
	model time * status(0) = trt prior karno / ties = efron;
proc phreg data=veteran2 outest = vet2;
	model time * status(0) = trt prior karno / ties = efron entry=tstart;
proc phreg data=veteran3 outest = vet3;
	model time * status(0) = trt prior karno / ties = efron entry=tstart;

data vets; set vet1 vet2 vet3;

proc print data=vets width=FULL;

run;

-----Original Message-----
From: Andrews, Chris 
Sent: Friday, August 23, 2019 9:18 AM
To: 'Göran Broström'; r-help using r-project.org; tamas.ferenci using medstat.hu
Subject: RE: [R] results of a survival analysis change when converting the data to counting process format

# For what it is worth, even the second fit (cuts at observation times) does not give identical coefficient estimates as using the original data structure.

answer <- coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )
veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno, data = veteran, cut = unique( veteran$time ) )
answer2 <- coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran2 ) )
answer2 - answer
 #          trt         prior         karno 
 # 2.775558e-16 -1.127570e-17 -6.938894e-18

# If you cut daily, but not all the way to 999, you get a few different fits

ff <- function(m) {
	veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno, data = veteran, cut = seq(m))
	answer3 <- coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran3 ) )
	return(answer3)
}

answers3 <- sapply(100:999, ff)
plot(100:999, answers3[1,] - answer[1])

# But certainly all these differences are of no practical importance.

-----Original Message-----
From: Göran Broström [mailto:goran.brostrom using umu.se] 
Sent: Friday, August 23, 2019 5:13 AM
To: r-help using r-project.org; tamas.ferenci using medstat.hu
Subject: Re: [R] results of a survival analysis change when converting the data to counting process format

Den 2019-08-22 kl. 21:48, skrev Göran Broström:
> 
> 
> On 2019-08-18 19:10, Ferenci Tamas wrote:
>> Dear All,
>>
>> Consider the following simple example:
>>
>> library( survival )
>> data( veteran )
>>
>> coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )
>>           trt        prior        karno
>>   0.180197194 -0.005550919 -0.033771018
>>
>> Note that we have neither time-dependent covariates, nor time-varying
>> coefficients, so the results should be the same if we change to
>> counting process format, no matter where we cut the times.
>>
>> That's true if we cut at event times:
>>
>> veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno,
>>                         data = veteran, cut = unique( veteran$time ) )
>>
>> coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = 
>> veteran2 ) )
>>           trt        prior        karno
>>   0.180197194 -0.005550919 -0.033771018
>>
>> But quite interestingly not true, if we cut at every day:
>>
>> veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
>>                         data = veteran, cut = 1:max(veteran$time) )
>>
>> coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = 
>> veteran3 ) )
>>           trt        prior        karno
>>   0.180197215 -0.005550913 -0.033771016
>>
>> The difference is not large, but definitely more than just a rounding
>> error, or something like that.
>>
>> What's going on? How can the results get wrong, especially by
>> including more cutpoints?
> 
> All results are wrong, but they are useful (paraphrasing George EP Box).

That said, it is a little surprising: The generated risk sets are 
(should be) identical in all cases, and one would expect rounding errors 
to be the same. But data get stored differently, and ... who knows?

I tried your examples on my computer and got exactly the same results as 
you. Which surprised me.

G,

> 
> Göran
> 
>>
>> Thank you in advance,
>> Tamas
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues