[R] Survival analysis
Terry Therneau
therneau at mayo.edu
Mon Oct 24 16:21:13 CEST 2011
On Sun, 2011-10-23 at 12:00 +0200, r-help-request at r-project.org wrote:
> The results by the survfit routine do not agree with the
> results of
> these formulae as obtained by SAS.
>
The next question should be "is SAS correct". The answer in this case
is no.
For survival data the mean is computed as the area under S(t), the
survival curve. This is how you deal with censoring. But because
survival curves often don't fall all the way to zero, one must deal with
the question of how far to the right the integral should go. The help
file for print.survfit has a short discussion of three possible options
available in R; two are pretty good, the third I consider more
problematic, but it is found in some textbooks. I would rank the
approach used by SAS in fourth position and have chosen not to implement
it.
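
As a minimal sketch of the "area under S(t)" idea (in Python rather than R,
and not the survfit code itself): the helper names km_curve and
restricted_mean, the upper limit tau, and the seven data points are all
made up for illustration. It computes a Kaplan-Meier step function and
integrates it from 0 up to a chosen limit.

```python
def km_curve(times, events):
    """Kaplan-Meier estimate; returns (death_time, S) pairs.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        n_at_t = 0
        # Collect everyone (deaths and censored) tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t
    return curve


def restricted_mean(times, events, tau):
    """Area under the step function S(t) from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in km_curve(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last flat piece, from the final step before tau out to tau.
    area += prev_s * (tau - prev_t)
    return area


# Hypothetical data: four deaths, then three censored subjects.
times = [5, 12, 20, 43, 59, 60, 62]
events = [1, 1, 1, 1, 0, 0, 0]

print(restricted_mean(times, events, tau=62))  # limit at the largest time
print(restricted_mean(times, events, tau=43))  # limit at the last death
```

With these made-up numbers the first call gives about 38 and the second
about 29.9, which shows how much the answer depends on where the integral
is stopped when the curve does not reach zero.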
Assume a curve has its last death at time 43, but 3 other subjects
survive to times 59, 60 and 62 (this is the curve for your second group).
To compute the mean, SAS replaces those three subjects with 3 deaths at
time 43. So it gets a mean < 43 (surprise!), while R gives a more
sensible answer. If you had 100 subjects followed for 50 years, all
still alive but one (who died at year 2), the SAS answer would be a mean
survival of 2 years.
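
The arithmetic for that 100-subject case can be checked with a few lines
(a sketch in Python; the variable names are made up, and the "replace
with deaths" rule is coded exactly as described above, not taken from any
SAS documentation):

```python
n = 100             # subjects
death_time = 2.0    # the single death, at year 2
follow_up = 50.0    # everyone else is still alive at year 50

# Kaplan-Meier: S(t) = 1 before year 2, then 1 - 1/100 afterwards.
s_after = 1 - 1 / n

# Area under S(t) from 0 to the end of follow-up.
area_mean = 1.0 * death_time + s_after * (follow_up - death_time)

# Rule described above: subjects censored after the last death are
# treated as deaths at the last death time, so all 100 "die" at year 2.
replaced_times = [death_time] * n
replaced_mean = sum(replaced_times) / n

print(area_mean)      # close to 49.52 years
print(replaced_mean)  # 2.0 years
```

The area under the curve out to year 50 is about 49.5 years, against 2
years once the 99 survivors are recoded as deaths at year 2.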
Terry Therneau