Dear R users,
I am fitting Cox regression models using coxph() (survival package) or cph() (Design package).
I have time-varying effects (diagnosed with the chi-square test on the Schoenfeld residuals and the corresponding plots), so I first want to split time into 2 separate intervals, t < 6 months and t >= 6 months, and estimate one hazard ratio (hr) for each interval.
I am analysing overall survival according to 3 prognostic factors (age, deep, ldh).
My data frame (dtemp) looks like this:
> dtemp[1:3,]
  id      time status age_diag    deep      ldh
1  1  1.544148      1 65.42368 present elevated
2  2 17.051335      1 46.92676 present elevated
4  4 84.829569      0 65.86448 present   normal
My first model, without time-varying hazard ratios, is:
> mod<-coxph(Surv(time,status)~age_diag+deep+ldh,data=dtemp)
> mod
              coef exp(coef) se(coef)    z      p
age_diag    0.0374      1.04   0.0124 3.03 0.0025
deeppresent 0.8639      2.37   0.3012 2.87 0.0041
ldhelevated 0.7222      2.06   0.2836 2.55 0.0110

Likelihood ratio test=21 on 3 df, p=0.000105 n= 91
To fit the time-dependent model, I used the survSplit() function to convert dtemp into counting process (start, stop] format:
> cdf<-survSplit(
    cut=1:213,       # cut at each month (last time is 213.9)
    end="time",
    start="start",
    event="status",
    data=dtemp)
> cdf[order(cdf$id),][1:3,]
   id     time status age_diag    deep      ldh start
1   1 1.000000      0 65.42368 present elevated     0
92  1 1.544148      1 65.42368 present elevated     1
2   2 1.000000      0 46.92676 present elevated     0
I then applied coxph() like this:
> mod.time6<-coxph(Surv(start,time,status)~age_diag:(start<=6)+deep:(start<=6)+ldh,data=cdf)
> mod.time6
                              coef exp(coef) se(coef)    z      p
ldhelevated                 0.6243      1.87   0.2871 2.17 0.0300
age_diag:start <= 6FALSE    0.0148      1.01   0.0145 1.02 0.3100
age_diag:start <= 6TRUE     0.0946      1.10   0.0272 3.48 0.0005
start <= 6FALSE:deeppresent 0.5167      1.68   0.3546 1.46 0.1500
start <= 6TRUE:deeppresent  1.7906      5.99   0.6451 2.78 0.0055

Likelihood ratio test=30.5 on 5 df, p=1.18e-05 n= 3715
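As an aside, I suspect the same kind of two-interval model could be fitted without cutting at every month, by using a single cut at 6 months together with an episode indicator. The sketch below was not run on my data: it uses the veteran data set that ships with the survival package as a stand-in (time is in days there, so the cut is at 180), and the names vet2, tgroup and fit_td are made up for illustration.

```r
library(survival)

# Stand-in data: 'veteran' ships with the survival package (time in days).
# Split each subject's follow-up at 180 days (roughly 6 months);
# 'tgroup' is 1 for the interval up to 180 days and 2 afterwards.
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran,
                  cut = 180, episode = "tgroup")

# One karno coefficient per time interval; age keeps a single coefficient.
fit_td <- coxph(Surv(tstart, time, status) ~ age + karno:strata(tgroup),
                data = vet2)
print(fit_td)
```

With only one cut point the split data set stays small (at most two rows per subject), which might make later resampling cheaper.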
My question is:
I would like to show that my time-dependent model (mod.time6) predicts better than my time-fixed model (mod): I know the time-fixed model underestimates risk before 6 months and overestimates it for individuals still alive after 6 months. However, I cannot use predict() or resid() on my cdf data frame. I have tried computing the linear predictor separately before and after 6 months using the time-dependent hazard ratios, but I am not sure this is valid. Does anybody know a good way to do this?
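To make the comparison concrete, here is a minimal sketch of one thing I have considered: refitting the time-fixed model on the split data (which should give the same partial likelihood as on the original data) and comparing the two nested fits with a likelihood ratio test, then reading the per-row linear predictor directly from the fitted object. Again this uses the veteran data from the survival package as a stand-in, with a cut at 180 days; fit_fixed, fit_td and the other names are hypothetical. Whether this is the right notion of "better for prediction" is exactly what I am unsure about.

```r
library(survival)

# Stand-in data, split once at 180 days (about 6 months).
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran,
                  cut = 180, episode = "tgroup")

# Time-fixed model, fitted on the split data so both fits use the same rows.
fit_fixed <- coxph(Surv(tstart, time, status) ~ age + karno, data = vet2)

# Time-dependent model: separate karno effect before and after the cut.
fit_td <- coxph(Surv(tstart, time, status) ~ age + karno:strata(tgroup),
                data = vet2)

# Likelihood ratio test of the nested models, computed by hand.
lrt     <- 2 * (fit_td$loglik[2] - fit_fixed$loglik[2])
df      <- length(coef(fit_td)) - length(coef(fit_fixed))
p_value <- pchisq(lrt, df = df, lower.tail = FALSE)
cat("LRT =", lrt, "on", df, "df, p =", p_value, "\n")

# Linear predictor for each (subject, interval) row of the split data.
lp_td <- fit_td$linear.predictors
```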
Finally, I would like to validate and calibrate my time-dependent model (with bootstrap) using validate() and calibrate() from the Design package, but I don't know if this is possible with data in counting process format.
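If validate() and calibrate() cannot handle this format, the fallback I have in mind is a hand-rolled bootstrap that resamples whole subjects rather than individual rows, so that all intervals belonging to one person stay together. The sketch below only shows the resampling and refitting step (it collects coefficients, not optimism-corrected indexes). It is an assumption on my part that this is appropriate. It uses the veteran data from the survival package as a stand-in, and the names vet, fit_fun, rows_by_id and B are hypothetical.

```r
library(survival)
set.seed(1)

# Stand-in data with an explicit subject id, split once at 180 days.
vet <- veteran
vet$id <- seq_len(nrow(vet))
vet2 <- survSplit(Surv(time, status) ~ ., data = vet,
                  cut = 180, episode = "tgroup")

# Refit the time-dependent model on a data set and return its coefficients.
fit_fun <- function(d) {
  coef(coxph(Surv(tstart, time, status) ~ age + karno:strata(tgroup),
             data = d))
}

# Row indices of the split data, grouped by subject.
rows_by_id <- split(seq_len(nrow(vet2)), vet2$id)

B <- 50  # small number of replicates, for illustration only
boot <- t(replicate(B, {
  ids <- sample(names(rows_by_id), replace = TRUE)  # resample subjects
  fit_fun(vet2[unlist(rows_by_id[ids]), ])          # keep all their rows
}))

# Bootstrap standard deviation of each coefficient.
print(apply(boot, 2, sd))
```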
Thanks for any help,
Youenn Drouet.
Centre Léon Bérard - France