Dear R users,

 
I am doing Cox regression using coxph() from the survival package or cph() from the Design package.

I have time-varying effects (diagnosed with the Schoenfeld residuals chi-square test and plot), so I first want to split time into 2 separate intervals, t<6 months and t>=6 months, to estimate one hazard ratio (hr) for each interval.

 
I am analysing overall survival according to 3 prognostic factors (age, deep, ldh).

 
My data frame (dtemp) looks like this:

> dtemp[1:3,]
  id      time status age_diag    deep      ldh
1  1  1.544148      1 65.42368 present elevated
2  2 17.051335      1 46.92676 present elevated
4  4 84.829569      0 65.86448 present   normal

 
My first model, without time-varying hazard ratios, is:

> mod<-coxph(Surv(time,status)~age_diag+deep+ldh,data=dtemp)
> mod
              coef exp(coef) se(coef)    z      p
age_diag    0.0374      1.04   0.0124 3.03 0.0025
deeppresent 0.8639      2.37   0.3012 2.87 0.0041
ldhelevated 0.7222      2.06   0.2836 2.55 0.0110

Likelihood ratio test=21  on 3 df, p=0.000105  n= 91

 
To fit a time-dependent model, I used the survSplit() function to convert dtemp into counting-process (start, stop] format:

> cdf<-survSplit(
          cut=1:213, #cut at each month (last time is 213.9)
          end="time",
          start="start",
          event="status",
          data=dtemp)

 
> cdf[order(cdf$id),][1:3,]
   id     time status age_diag    deep      ldh start
1   1 1.000000      0 65.42368 present elevated     0
92  1 1.544148      1 65.42368 present elevated     1
2   2 1.000000      0 46.92676 present elevated     0

 
and then applied the coxph function like this: 

> mod.time6<-coxph(Surv(start,time,status)~age_diag:(start<=6)+deep:(start<=6)+ldh,data=cdf)
> mod.time6
                              coef exp(coef) se(coef)    z      p
ldhelevated                 0.6243      1.87   0.2871 2.17 0.0300
age_diag:start <= 6FALSE    0.0148      1.01   0.0145 1.02 0.3100
age_diag:start <= 6TRUE     0.0946      1.10   0.0272 3.48 0.0005
start <= 6FALSE:deeppresent 0.5167      1.68   0.3546 1.46 0.1500
start <= 6TRUE:deeppresent  1.7906      5.99   0.6451 2.78 0.0055

Likelihood ratio test=30.5  on 5 df, p=1.18e-05  n= 3715
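
One point I am unsure about: with a cut at every month, (start<=6) is also TRUE for the interval (6,7], so the model above may effectively switch coefficients at 7 months rather than at 6. A minimal sketch of a split exactly at 6 months follows. It is only an illustration under stated assumptions: the data frame `sim` is simulated and merely stands in for dtemp, the column names `early`, `age_early`, `age_late`, `deep_early` and `deep_late` are made up for this example, and the call uses the formula interface of survSplit() found in recent versions of the survival package.

```r
library(survival)

# Simulated stand-in for dtemp (assumption: NOT the real data)
set.seed(42)
n <- 300
sim <- data.frame(
  id       = seq_len(n),
  time     = rexp(n, rate = 1 / 20),           # follow-up in months
  status   = rbinom(n, 1, 0.7),                # 1 = death, 0 = censored
  age_diag = rnorm(n, mean = 60, sd = 10),
  deep     = factor(sample(c("absent", "present"), n, replace = TRUE)),
  ldh      = factor(sample(c("normal", "elevated"), n, replace = TRUE),
                    levels = c("normal", "elevated"))
)

# A single cut at 6 months: at most two rows per patient
cdf6 <- survSplit(Surv(time, status) ~ ., data = sim,
                  cut = 6, start = "start")

# Rows starting at 0 cover (0, 6]; rows starting at 6 cover the rest
cdf6$early <- as.numeric(cdf6$start < 6)

# Explicit period-specific covariates, so the result does not depend on
# how the model formula expands an interaction with a logical variable
cdf6$age_early  <- cdf6$age_diag * cdf6$early
cdf6$age_late   <- cdf6$age_diag * (1 - cdf6$early)
cdf6$deep_early <- (cdf6$deep == "present") * cdf6$early
cdf6$deep_late  <- (cdf6$deep == "present") * (1 - cdf6$early)

fit6 <- coxph(Surv(start, time, status) ~ age_early + age_late +
                deep_early + deep_late + ldh, data = cdf6)
print(fit6)
```

With a single cut the split data set stays small (at most two rows per patient), and an event occurring exactly at 6 months is attributed to the first interval.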

 
My question is:

 
I would like to show that my time-dependent model (mod.time6) predicts better than my time-fixed model (mod), because I know the time-fixed model underestimates risk before 6 months and overestimates risk for individuals who are still alive after 6 months. However, I cannot use predict() or resid() on my cdf data frame. I have tried to compute the linear predictor separately before and after 6 months using the time-dependent coefficients (log hazard ratios), but I am not sure this is valid. Does anybody know a good way to do this?

Finally, I would like to validate and calibrate (with bootstrap) my time-dependent model using validate() and calibrate() from the Design package, but I don't know whether this is possible with data in counting-process format.
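
To make concrete what I have in mind, here is a minimal sketch, not a validated method. It assumes simulated data in place of dtemp (`sim`, with deep and ldh coded 0/1 for brevity), a single cut at 6 months, and made-up column names (`early`, `age_early`, `age_late`, `deep_early`, `deep_late`, `lp_tv`). Both models are fitted on the same split rows, so the time-fixed model is the special case of the time-dependent one in which early and late coefficients are equal; the two can then be compared with a likelihood ratio test on 2 df, and a row-wise linear predictor can be built from the period-specific coefficients.

```r
library(survival)

# Simulated stand-in for dtemp (assumption: NOT the real data)
set.seed(1)
n <- 300
sim <- data.frame(
  id       = seq_len(n),
  time     = rexp(n, rate = 1 / 20),       # follow-up in months
  status   = rbinom(n, 1, 0.7),            # 1 = death, 0 = censored
  age_diag = rnorm(n, mean = 60, sd = 10),
  deep     = as.numeric(runif(n) < 0.5),   # 1 = present
  ldh      = as.numeric(runif(n) < 0.5)    # 1 = elevated
)

# Split follow-up at 6 months and build period-specific covariates
cdf6 <- survSplit(Surv(time, status) ~ ., data = sim,
                  cut = 6, start = "start")
cdf6$early      <- as.numeric(cdf6$start < 6)
cdf6$age_early  <- cdf6$age_diag * cdf6$early
cdf6$age_late   <- cdf6$age_diag * (1 - cdf6$early)
cdf6$deep_early <- cdf6$deep * cdf6$early
cdf6$deep_late  <- cdf6$deep * (1 - cdf6$early)

# Both models use the same counting-process rows
fit_fixed <- coxph(Surv(start, time, status) ~ age_diag + deep + ldh,
                   data = cdf6)
fit_tv    <- coxph(Surv(start, time, status) ~ age_early + age_late +
                     deep_early + deep_late + ldh, data = cdf6)

# Likelihood ratio test of the nested models (5 - 3 = 2 df)
lrt     <- 2 * (fit_tv$loglik[2] - fit_fixed$loglik[2])
p_value <- pchisq(lrt, df = 2, lower.tail = FALSE)

# Row-wise linear predictor from the period-specific coefficients
b <- coef(fit_tv)
cdf6$lp_tv <- as.vector(as.matrix(cdf6[, names(b)]) %*% b)
```

Applied to the rounded statistics printed above, and assuming both fits share the same null log-likelihood (splitting follow-up should not change the partial likelihood), the difference would be roughly 30.5 - 21 = 9.5 on 2 df, i.e. p of about 0.009. This compares fit rather than predictive accuracy, which is why I am still asking about prediction and calibration.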

 
Thanks for any help,

 
Youenn Drouet.

Centre Léon Bérard - France

