[R] predict.coxph and predict.survreg
David Winsemius
dwinsemius at comcast.net
Thu Nov 11 19:33:11 CET 2010
On Nov 11, 2010, at 12:14 PM, Michael Haenlein wrote:
> Thanks for the comment, James!
>
> The problem is that my initial sample (Dataset 1) is truncated. That means I
> only observe "time to death" for those individuals who actually died before
> end of my observation period. It is my understanding that this type of
> truncation creates a bias when I use a "normal" regression analysis. Hence
> my idea to use some form of survival model.
>
> I had another look at predict.survreg and I think the option "response"
> could work for me.
> When I run the following code I get ptime = 290.3648.
> I assume this means that an individual with ph.ecog=2 can be expected to
> live another 290.3648 days before death occurs (days is the time scale of
> the time variable).
It is a prediction under specific assumptions underpinning a
parametric estimate.
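A minimal sketch of what that prediction is, assuming the default Weibull fit from the quoted code (the names nd, p_resp, p_lp and p_med are only illustrative): with type='response', predict.survreg back-transforms the linear predictor, so for a Weibull fit the value is exp(lp), which is the Weibull scale parameter rather than a mean or median. A model-implied median can be requested with type='quantile' and p=0.5.

```r
library(survival)

# Default survreg fit is Weibull (dist = "weibull")
lfit <- survreg(Surv(time, status) ~ ph.ecog, data = lung)
nd <- data.frame(ph.ecog = 2)

# type = "response" is the back-transformed linear predictor, exp(lp)
p_resp <- predict(lfit, newdata = nd, type = "response")
p_lp   <- predict(lfit, newdata = nd, type = "lp")
print(all.equal(unname(p_resp), unname(exp(p_lp))))

# Model-implied median survival time for the same covariate value
p_med <- predict(lfit, newdata = nd, type = "quantile", p = 0.5)
print(c(response = unname(p_resp), median = unname(p_med)))
```

Under a Weibull fit the median is exp(lp) * log(2)^scale, so it is smaller than the type='response' value.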
> Could someone confirm whether this makes sense?
You ought to confirm that it "makes sense" by comparing to your data:
require(Hmisc); require(survival)
<your code>
> describe(lung[lung$status==1&lung$ph.ecog==2,"time"])
lung[lung$status == 1 & lung$ph.ecog == 2, "time"]
      n missing  unique    Mean
      6       0       6   293.7

           92 105 211 292 511 551
Frequency   1   1   1   1   1   1
%          17  17  17  17  17  17
> ?lung
So status==1 marks a censored case, and the observed deaths have status==2:
> describe(lung[lung$status==2&lung$ph.ecog==2,"time"])
lung[lung$status == 2 & lung$ph.ecog == 2, "time"]
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     44       1      44   226.0   14.95   36.90   94.50  178.50  295.75  500.00  635.85
lowest : 11 12 13 26 30, highest: 524 533 654 707 814
So the mean time to death (in a group that had only 6 censored
individuals, at times from 92 to 551) was 226, and the median time to
death among the 44 individuals was 178, with a right-skewed
distribution. You need to decide whether you want to make that
particular prediction when you know that you forced a specific
distributional form (Weibull, the survreg default) on the regression
machinery by accepting the default.
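One way to see how much the distributional choice matters is sketched below, under the assumption that the median is the quantity of interest. It refits the same model under three distributions that survreg supports and sets the predicted medians beside the Kaplan-Meier estimate for the ph.ecog==2 subgroup. The names nd, dists, med_by_dist and km are only illustrative.

```r
library(survival)

nd <- data.frame(ph.ecog = 2)
dists <- c("weibull", "lognormal", "loglogistic")

# Predicted median survival time under each assumed distribution
med_by_dist <- sapply(dists, function(d) {
  fit <- survreg(Surv(time, status) ~ ph.ecog, data = lung, dist = d)
  unname(predict(fit, newdata = nd, type = "quantile", p = 0.5))
})
print(round(med_by_dist, 1))

# Nonparametric (Kaplan-Meier) estimate for the ph.ecog == 2 subgroup
km <- survfit(Surv(time, status) ~ 1, data = subset(lung, ph.ecog == 2))
print(km)  # reports the KM median with a confidence interval
```

If the parametric medians differ much from each other or from the Kaplan-Meier median, that is a sign the prediction depends heavily on the assumed form.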
>
> lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
> ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')
>
>
>
> On Thu, Nov 11, 2010 at 5:26 PM, James C. Whanger
> <james.whanger at gmail.com> wrote:
>
>> Michael,
>>
>> You are looking to compute an estimated time to death -- rather than the
>> odds of death conditional upon time. Thus, you will want to use "time to
>> death" as your dependent variable rather than a dichotomous outcome
>> (0=alive, 1=death). You can accomplish this with a straightforward
>> regression analysis.
>>
>> Best,
>>
>> Jim
>>
>> On Thu, Nov 11, 2010 at 3:44 AM, Michael Haenlein
>> <haenlein at escpeurope.eu> wrote:
>>
>>> Dear all,
>>>
>>> I'm struggling with predicting "expected time until death" for a coxph and
>>> survreg model.
>>>
>>> I have two datasets. Dataset 1 includes a certain number of people for
>>> which I know a vector of covariates (age, gender, etc.) and their event
>>> times (i.e., I know whether they have died and when if death occurred
>>> prior to the end of the observation period). Dataset 2 includes another
>>> set of people for which I only have the covariate vector. I would like to
>>> use Dataset 1 to calibrate either a coxph or survreg model and then use
>>> this model to determine an "expected time until death" for the
>>> individuals in Dataset 2. For example, I would like to know when a person
>>> in Dataset 2 will die, given his/her age and gender.
>>>
>>> I checked predict.coxph and predict.survreg as well as the document "A
>>> Package for Survival Analysis in S" written by Terry M. Therneau but I
>>> have to admit that I'm a bit lost here.
>>>
>>> Could anyone give me some advice on how this could be done?
>>>
>>> Thanks very much in advance,
>>>
>>> Michael
>>>
>>>
>>>
>>> Michael Haenlein
>>> Professor of Marketing
David Winsemius, MD
West Hartford, CT