[R] results of a survival analysis change when converting the data to counting process format
Andrews, Chris
chr|@@@ @end|ng |rom med@um|ch@edu
Fri Aug 23 15:18:19 CEST 2019
# For what it is worth, even the second fit (cuts at observation times) does not give identical coefficient estimates as using the original data structure.
answer <- coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )
veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno, data = veteran, cut = unique( veteran$time ) )
answer2 <- coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran2 ) )
answer2 - answer
# trt prior karno
# 2.775558e-16 -1.127570e-17 -6.938894e-18
# If you cut daily, but not all the way to 999, you get a few different fits
ff <- function(m) {
veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno, data = veteran, cut = seq(m))
answer3 <- coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran3 ) )
return(answer3)
}
answers3 <- sapply(100:999, ff)
plot(100:999, answers3[1,] - answer[1])
# But certainly all these differences are of no practical importance.
-----Original Message-----
From: Göran Broström [mailto:goran.brostrom using umu.se]
Sent: Friday, August 23, 2019 5:13 AM
To: r-help using r-project.org; tamas.ferenci using medstat.hu
Subject: Re: [R] results of a survival analysis change when converting the data to counting process format
Den 2019-08-22 kl. 21:48, skrev Göran Broström:
>
>
> On 2019-08-18 19:10, Ferenci Tamas wrote:
>> Dear All,
>>
>> Consider the following simple example:
>>
>> library( survival )
>> data( veteran )
>>
>> coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )
>> trt prior karno
>> 0.180197194 -0.005550919 -0.033771018
>>
>> Note that we have neither time-dependent covariates, nor time-varying
>> coefficients, so the results should be the same if we change to
>> counting process format, no matter where we cut the times.
>>
>> That's true if we cut at event times:
>>
>> veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno,
>> data = veteran, cut = unique( veteran$time ) )
>>
>> coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data =
>> veteran2 ) )
>> trt prior karno
>> 0.180197194 -0.005550919 -0.033771018
>>
>> But quite interestingly not true, if we cut at every day:
>>
>> veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
>> data = veteran, cut = 1:max(veteran$time) )
>>
>> coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data =
>> veteran3 ) )
>> trt prior karno
>> 0.180197215 -0.005550913 -0.033771016
>>
>> The difference is not large, but definitely more than just a rounding
>> error, or something like that.
>>
>> What's going on? How can the results get wrong, especially by
>> including more cutpoints?
>
> All results are wrong, but they are useful (paraphrasing George EP Box).
That said, it is a little surprising: The generated risk sets are
(should be) identical in all cases, and one would expect rounding errors
to be the same. But data get stored differently, and ... who knows?
I tried your examples on my computer and got exactly the same results as
you. Which surprised me.
G,
>
> Göran
>
>>
>> Thank you in advance,
>> Tamas
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
More information about the R-help
mailing list