[R] survival::predict.coxph

Bernhard Reinhardt bernhard.reinhardt at dlr.de
Fri Feb 27 12:14:54 CET 2009

Hello Terry,

it's really great to receive some feedback from a "pro". I'm not sure 
whether I've understood your point correctly:
Are you saying that the Cox model isn't good at forecasting an expected 
survival time because of the problems with estimating the 
survival function in the right tail, and that one should rather use a 
parametric model such as an exponential model? Or what do you mean by 
"smooth parametric estimate"?
Anyway, I have just ordered your book from the library. Hopefully I'll gain 
some more insight from reading it.

Maybe I should explain why I am trying to make such forecasts at all.

Following the article "Quantifying climate-related risks and 
uncertainties using Cox regression models" by Maia and Meinke, I am trying 
to deduce winter precipitation from lagged sea-surface temperatures (SSTs).
So precipitation is my survival time, and the SST observations at 
different lags are my covariates.
The sample size is only 55, and I've got 11 covariates (lag = 0 months to 
lag = 10 months) to choose from.
My first goal is to identify the optimal time lag(s) between the 
SST-anomaly observation and the precipitation observation.
My expectation was that the lag would be a few months.

I thought a Cox model would easily provide such a selection. At first I 
used the covariates individually. The coefficients for lags between 0 and 5 
months were all quite large and then decreased from 6 to 10 months. So I 
think 5 months could be the lag of the process, with the high persistence 
of the SST accounting for the large coefficients at 0-4 months.
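
A minimal sketch of this per-lag screening, for illustration only. The data 
are simulated stand-ins, not the real SST or precipitation series, and the 
names dat, precip, event and lag0..lag10 are hypothetical. It assumes every 
precipitation value is fully observed (no censoring).

```r
library(survival)

set.seed(1)
n <- 55                                   # sample size mentioned above

# Hypothetical stand-in covariates: 11 persistent (autocorrelated) series
sst <- matrix(NA_real_, n, 11, dimnames = list(NULL, paste0("lag", 0:10)))
sst[, 1] <- rnorm(n)
for (j in 2:11) {
  sst[, j] <- 0.9 * sst[, j - 1] + sqrt(1 - 0.9^2) * rnorm(n)
}

# Simulated "survival time" that depends on lag5 only (an assumption)
dat <- data.frame(precip = rexp(n, rate = exp(0.5 * sst[, "lag5"])), sst)
dat$event <- 1                            # all values observed, none censored

# Fit one Cox model per lag and collect the single coefficient of each
coefs <- sapply(colnames(sst), function(v) {
  fit <- coxph(as.formula(paste("Surv(precip, event) ~", v)), data = dat)
  unname(coef(fit))
})
round(coefs, 2)
```

Because neighbouring lags are strongly correlated in such data, large 
coefficients at several adjacent lags do not by themselves single out one 
lag.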

As the next step I used all 11 covariates at once, hoping to obtain 
similar results. Instead the sign of the coefficients jumps "randomly" 
between plus and minus, and the magnitudes look randomly distributed as well.

I also tried using sets of three covariates, e.g. lags 4, 5 and 6. But 
even then the signs of the coefficients vary.
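
One possible explanation, which is an assumption and not something 
established in this thread, is collinearity between the persistent lagged 
series. The sketch below uses simulated stand-in data (all names are 
hypothetical) to show how one might check the correlation between lags and 
compare the standard error of one coefficient in a single-lag fit and in a 
three-lag fit.

```r
library(survival)

set.seed(1)
n <- 55

# Hypothetical stand-in covariates with strong lag-to-lag persistence
sst <- matrix(NA_real_, n, 11, dimnames = list(NULL, paste0("lag", 0:10)))
sst[, 1] <- rnorm(n)
for (j in 2:11) {
  sst[, j] <- 0.9 * sst[, j - 1] + sqrt(1 - 0.9^2) * rnorm(n)
}
dat <- data.frame(precip = rexp(n, rate = exp(0.5 * sst[, "lag5"])), sst)
dat$event <- 1                            # no censoring assumed

# Pairwise correlation of the three neighbouring lags
round(cor(dat[, c("lag4", "lag5", "lag6")]), 2)

fit_single <- coxph(Surv(precip, event) ~ lag5, data = dat)
fit_joint  <- coxph(Surv(precip, event) ~ lag4 + lag5 + lag6, data = dat)

# Standard errors from the estimated variance matrices
se_single <- sqrt(diag(vcov(fit_single)))
se_joint  <- sqrt(diag(vcov(fit_joint)))
se_single
se_joint
```

With highly correlated covariates the joint standard errors are typically 
much larger than the single-lag one, so the signs of the individual 
coefficients are poorly determined.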

So my thought was that maybe I had overfitted the model. But in fact I did 
not find any literature on whether that's even possible. As far as my 
limited knowledge goes, an overfitted model should reproduce the training 
period very well but other periods very poorly. So I first tried to 
reproduce the training period, but so far without success, whether using 
11 covariates or just 1.
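
For reference, the route discussed in the quoted message below (survfit on 
new data, then integrating the curve) might be sketched as follows. The data 
are simulated and the names sst_lag5, precip and new_subject are 
hypothetical. The integral is restricted to the largest observed time, so it 
is a restricted mean and is subject to the right-tail caveat Terry 
describes. The median is shown as an alternative summary.

```r
library(survival)

set.seed(2)
n <- 55

# Hypothetical stand-in data: one covariate, no censoring
dat <- data.frame(sst_lag5 = rnorm(n))
dat$precip <- rexp(n, rate = exp(0.5 * dat$sst_lag5))
dat$event <- 1

fit <- coxph(Surv(precip, event) ~ sst_lag5, data = dat)

# Predicted survival curve for one new subject
new_subject <- data.frame(sst_lag5 = 0.3)
sf <- survfit(fit, newdata = new_subject)
surv <- as.vector(sf$surv)

# Area under the step function from 0 to the last observed time:
# the curve equals 1 before the first time, then surv[i] from time[i] onward
widths  <- diff(c(0, sf$time))
heights <- c(1, head(surv, -1))
rmean <- sum(widths * heights)

# First time at which the curve drops to 0.5 or below (NA if it never does)
med <- sf$time[which(surv <= 0.5)[1]]

rmean
med
```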


Bernhard R.

Terry Therneau wrote:
> You are mostly correct.
> Because of the censoring issue, there is no good estimate of the mean survival 
> time.  The survival curve either does not go to zero, or gets very noisy near 
> the right hand tail (large standard error); a smooth parametric estimate is what 
> is really needed to deal with this.
>   For this reason the mean survival, though computed (but see the 
> survfit.print.mean option, help(print.survfit)) is not highly regarded.  It is 
> not an option in predict.coxph.
>   	Terry T.
>  ----begin included message --------------
> Hi,
> if I got it right then the survival-time we expect for a subject is the 
> integral over the specific survival-function of the subject from 0 to t_max.
> If I have a trained cox-model and want to make a prediction of the 
> survival-time for a new subject I could use
> survfit(coxmodel, newdata=newSubject) to estimate a new 
> survival-function which I have to integrate thereafter.
> Actually I thought predict(coxmodel, newSubject) would do this for me, 
> but I'm confused which type I have to declare. If I understand the 
> little pieces of documentation right then none of the available types is 
> exactly the predicted survival-time.
> I think I have to use the mean survival-time of the baseline-function 
> times exp(the result of type linear predictor).
> Am I right?
