[R] Predictions from "coxph" or "cph" objects

Göran Broström goran.brostrom at umu.se
Tue Jul 8 14:25:32 CEST 2014


On 2014-07-06 11:17, Göran Broström wrote:
> On 2014-07-06 10:48, Göran Broström wrote:
>> David and Axel,
>>
>> I have two comments to your discussion:
>>
>> (i) The area under the survival curve is equal to the mean of the
>> distribution, so the estimate of the mean should be the sum of the areas
>> of the rectangles defined by the estimated survival curve and the
>> successive distances between observed event times.
>>
>> Thus,
>>
>>   > surv <- pred$surv
>>   > time <- pred$time
>>   > sum(surv * diff(time))
>>
>> should give you the (estimated) mean). (Note that time[1] == 0, and
>> length(time) == length(surv) + 1)
>
> Well, this is not quite true; on the first interval the survival curve
> is one, so you need to
>
>   > surv <- c(1, surv)
>
> first. But then the lengths of the surv and time vectors do not match so
> you need to add a (large) time at the end of time. If the largest
> observation is an event, 'no problem' (surv is zero), but otherwise ...
>
> Btw, I tried
>
>   > exit <- rexp(10)
>   > event <- rep(1, 10)
>   > fit <- coxph(Surv(exit, event) ~ 1)
>
>   > survfit(fit)$surv
>    [1] 0.90483742 0.80968410 0.71454371 0.61942215 0.52432953 0.42928471
>    [7] 0.33432727 0.23955596 0.14529803 0.05345216
>
>   > survfit(Surv(exit, event) ~ 1)$surv
> [1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
>
> so be careful ...

Addendum: Note the argument 'type':

 > survfit(fit, type = "kalbfleisch-prentice")$surv
  [1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

This gives, I think (at least if no ties), the nonparametric maximum 
likelihood estimator (NPMLE) of the baseline survivor function in the PH 
model, and thus coincides with the Kaplan-Meier estimator in the case of 
no covariates. It drops down to zero if the largest observation is a 
failure. See Kalbfleisch & Prentice (1980), pp. 84-86.

I suppose that the default type ( = "efron") simply uses the formula 
S(t) = exp(-H(t)) (on an estimator of H(t)), which never drops down to 
zero, censorings or no censorings. This relation does not hold for 
discrete-time distributions, and even if our underlying model is 
continuous, the resulting non-parametric estimator of H(t) define a 
discrete-time distribution, which causes a small dilemma for the purist. 
(Should 'type = "kalbfleisch-prentice"' be the default in survfit?)

However, in practice this is of no importance at all. If you stick to 
the median and quantiles.

Göran

>
> Göran
>
>>
>> (I do not think that David's suggestion gives the same answer, but I may
>> be wrong.)
>>
>> (ii) With censored data, this may be a bad idea. For instance, when the
>> largest observation is a censoring time, you may badly underestimate the
>> mean. Your best hope is to be able to estimate a conditional mean of the
>> type E(T | T < x).
>>
>> This is essentially a non-parametric situation, and therefore it is
>> better to stick to medians and quantiles.
>>
>> Göran Broström
>>
>> On 2014-07-06 06:17, David Winsemius wrote:
>>>
>>> On Jul 5, 2014, at 9:12 PM, David Winsemius wrote:
>>>
>>>>
>>>> On Jul 5, 2014, at 12:43 PM, Axel Urbiz wrote:
>>>>
>>>>> Thank you David. It is my understanding that using survfirsurvit
>>>>> below I get the median predicted survival. I actually was looking
>>>>> for the mean. I can't seem to find in the documentation how to get
>>>>> that.
>>>>>
>>>>> options(na.action=na.exclude) # retain NA in predictions
>>>>> fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
>>>>> pred <- survfit(fit, newdata=lung)
>>>>> head(pred)
>>>>>
>>>> There might be a way. I don't know it if so, so I would probably
>>>> just use the definition of the mean:
>>>>
>>>> sum(summary(pred)$surv* summary(pred)$time)/sum(  summary(pred)$time)
>>>>
>>>
>>> Er, I think I meant to type:
>>>
>>> fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
>>> pred <- survfit(fit)
>>>
>>>     sum(summary(pred)$surv* summary(pred)$time)/sum(  summary(pred)$surv)
>>> [1] 211.0943
>>>
>>>
>>>> (I continue to take effort to keep my postings in plain text despite
>>>> my mail-clients's efforts to match your formatted postings. It adds
>>>> to the work of responders when you post formatted questions and
>>>> responses.)
>>>>
>>>>
>>>>> Thanks again,
>>>>> Axel.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Jul 5, 2014 at 1:54 PM, David Winsemius <dwinsemius at comcast.net
>>>>>> wrote:
>>>>>
>>>>> On Jul 5, 2014, at 5:28 AM, Axel Urbiz wrote:
>>>>>
>>>>> Dear R users,
>>>>>
>>>>> My apologies for the simple question, as I'm starting to learn the
>>>>> concepts
>>>>> behind the Cox PH model. I was just experimenting with the survival
>>>>> and rms
>>>>> packages for this.
>>>>>
>>>>> I'm simply trying to obtain the expected survival time (as opposed
>>>>> to the
>>>>> probability of survival at a given time t).
>>>>>
>>>>> What does "expected survival time" actually mean? Do you want the
>>>>> median survival time?
>>>>>
>>>>>
>>>>> I can't seem to find an option
>>>>> from the "type" argument in the predict methods from
>>>>> coxph{survival} or
>>>>> cph{rms} that will give me expected survival times.
>>>>>
>>>>> library(rms)
>>>>> options(na.action=na.exclude) # retain NA in predictions
>>>>> fit <- coxph(Surv(time, status) ~ age + ph.ecog, lung)
>>>>> fit2 <-  cph(Surv(time, status) ~ age + ph.ecog, lung)
>>>>> head(predict(fit,type="lp"))
>>>>> head(predict(fit2,type="lp"))
>>>>>
>>>>> `predict` will return the results of the regression, i.e. the log-
>>>>> hazard ratios for each term in the RHS of the formula. What you
>>>>> want (as described in the Index for the survival package) is either
>>>>> `survfit` or `survexp`.
>>>>>
>>>>> require(survival)
>>>>> help(pack=survival)
>>>>> ?survfit
>>>>> ?survexp
>>>>> ?summary.survfit
>>>>> ?quantile.survfit   # to get the median
>>>>> ?print.summary.survfit
>>>>>
>>>>> require(rms)
>>>>> help(pack=rms)
>>>>>
>>>>> The rms-package also adds a `survfit.cph` function but I have found
>>>>> the `survest` function also provides useful added features, beyond
>>>>> those offered by survfit
>>>>>
>>>>>
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Regards,
>>>>> Axel.
>>>>>
>>>>>          [[alternative HTML version deleted]]
>>>>>
>>>>> This is a plain text mailing list.
>>>>>
>>>>> --
>>>>>
>>>>> David Winsemius, MD
>>>>> Alameda, CA, USA
>>>>>
>>>>>
>>>>
>>>> David Winsemius, MD
>>>> Alameda, CA, USA
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> David Winsemius, MD
>>> Alameda, CA, USA
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list