survfit.coxph {survival}  R Documentation 
Computes the predicted survivor function for a Cox proportional hazards model.
## S3 method for class 'coxph'
survfit(formula, newdata,
se.fit=TRUE, conf.int=.95, individual=FALSE, stype=2, ctype,
conf.type=c("log","loglog","plain","none", "logit", "arcsin"),
censor=TRUE, start.time, id, influence=FALSE,
na.action=na.pass, type, ...)
## S3 method for class 'coxphms'
survfit(formula, newdata,
se.fit=FALSE, conf.int=.95, individual=FALSE, stype=2, ctype,
conf.type=c("log","loglog","plain","none", "logit", "arcsin"),
censor=TRUE, start.time, id, influence=FALSE,
na.action=na.pass, type, p0=NULL, ...)
formula 
A coxph object.
newdata 
a data frame with the same variable names as those that appear
in the coxph formula.
se.fit 
a logical value indicating whether standard errors should be
computed. Default is TRUE for coxph models and FALSE for coxphms (multistate) models.
conf.int 
the level for a two-sided confidence interval on the survival curve(s). Default is 0.95. 
individual 
deprecated argument, replaced by the general
id argument.
stype 
computation of the survival curve, 1=direct, 2=exponential of the cumulative hazard. 
ctype 
whether the cumulative hazard computation should have a correction for ties, 1=no, 2=yes. 
conf.type 
One of "log" (the default), "loglog", "plain", "none", "logit", or "arcsin".
censor 
if FALSE, time points at which there are no events (only censoring) are not included in the result. 
id 
optional variable name of subject identifiers. If this is
present, it will be searched for in the newdata data frame.
start.time 
optional starting time, a single numeric value.
If present, the returned curve contains survival after
start.time, conditional on surviving to start.time.
influence 
option to return the influence values 
na.action 
the na.action to be used on the newdata argument 
type 
older argument that encompassed stype and ctype, now deprecated.
p0 
optional, a vector of probabilities. The returned curve will be for a cohort with this mixture of starting states. Most often a single state is chosen.
... 
for future methods 
This routine produces Pr(state) curves based on a coxph
model fit. For single state models it produces the single curve for
S(t) = Pr(remain in initial state at time t), known as the survival
curve; for multistate models a matrix giving probabilities for all states.
The stype argument states the type of estimate, and defaults
to the exponential of the cumulative hazard, better known as the Breslow
estimate. For a multistate Cox model this involves the exponential
of a matrix.
The argument stype=1 uses a non-exponential or ‘direct’
estimate. For a single endpoint coxph model the code evaluates the
Kalbfleisch-Prentice estimate, and for a multistate model it uses an
analog of the Aalen-Johansen estimator. The latter approach is the
default in the mstate package.
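As a rough illustration of the two settings, the sketch below requests both estimates for one hypothetical subject. The lung data set (shipped with the survival package), the model formula, and the covariate values in nd are assumptions chosen only for the example.

```r
library(survival)

# Illustrative model; the formula and data set are example choices.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# One hypothetical subject (covariate values are arbitrary).
nd <- data.frame(age = 60, sex = 1)

s_exp    <- survfit(fit, newdata = nd)             # stype = 2, the default
s_direct <- survfit(fit, newdata = nd, stype = 1)  # 'direct' estimate

# Compare the first few values of the two curves side by side.
head(cbind(time = s_exp$time, exp = s_exp$surv, direct = s_direct$surv))
```

The two curves are usually close; differences tend to be most visible where the number at risk is small.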
The ctype option affects the estimated cumulative hazard, and
if stype=2 the estimated P(state) curves as well. If not
present it is chosen so as to be concordant with the
ties option in the coxph call. (For multistate
coxphms objects only ctype=1 is currently implemented.)
Likewise, the choice between a model-based and robust variance estimate
for the curve will mirror the choice made in the coxph call;
any clustering is also inherited from the parent model.
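The inheritance of ctype from the ties option can be checked with a sketch like the following. The lung data and the one-covariate model are assumptions made for illustration; the comparison simply asks whether omitting ctype gives the same cumulative hazard as requesting the matching value explicitly.

```r
library(survival)

nd <- data.frame(age = 60)  # hypothetical subject

# The same model fitted with two different tie-handling methods.
fit_b <- coxph(Surv(time, status) ~ age, data = lung, ties = "breslow")
fit_e <- coxph(Surv(time, status) ~ age, data = lung, ties = "efron")

# ctype omitted: each curve follows the ties option of its parent model.
def_b <- survfit(fit_b, newdata = nd)
def_e <- survfit(fit_e, newdata = nd)

# Explicit requests, for comparison.
ct1_b <- survfit(fit_b, newdata = nd, ctype = 1)
ct2_e <- survfit(fit_e, newdata = nd, ctype = 2)

all.equal(def_b$cumhaz, ct1_b$cumhaz)
all.equal(def_e$cumhaz, ct2_e$cumhaz)
```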
If the newdata argument is missing, then a curve is produced
for a single "pseudo" subject with
covariate values equal to the means component of the fit.
The resulting curve(s) almost never make sense, but
the default remains due to an unwarranted attachment to the option shown by
some users and by other packages. Two particularly egregious examples
are factor variables and interactions. Suppose one were studying
interspecies transmission of a virus, and the data set has a factor
variable with levels ("pig", "chicken") and about equal numbers of
observations for each. The “mean” covariate level will be 0.5 –
is this a flying pig? As to interactions, assume data with sex coded as 0/1,
ages ranging from 50 to 80, and a model with age*sex. The “mean”
value for the age:sex interaction term will be about 30, a value
that does not occur in the data.
Users are strongly advised to use the newdata argument.
For these reasons predictions from a multistate coxph model require
the newdata argument.
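A minimal sketch of supplying newdata for a model with a factor and an interaction follows. The lung data, the recoding of sex as a labelled factor, and the chosen ages are assumptions for the example only.

```r
library(survival)

# Illustrative data: recode sex as a labelled factor.
d <- lung
d$sex <- factor(d$sex, levels = 1:2, labels = c("male", "female"))

fit <- coxph(Surv(time, status) ~ age * sex, data = d)

# Explicit, interpretable covariate sets: one row per desired curve.
nd <- expand.grid(age = c(55, 70), sex = c("male", "female"))

sf <- survfit(fit, newdata = nd)
dim(sf$surv)  # one column per row of nd
```

Each column of sf$surv corresponds to a subject who could actually exist, which avoids the "mean" factor level and the "mean" interaction value described above.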
If the coxph model contained an offset term, then the data set
in the newdata argument should also contain that variable.
When the original model contains time-dependent covariates, then the
path of that covariate through time needs to be specified in order to
obtain a predicted curve. This requires newdata to contain
multiple lines for each hypothetical subject, which give the covariate
values, time interval, and strata for each line (a subject can change
strata), along with an id variable
which marks which rows belong to each subject.
The time interval must have the same (start, stop, status)
variables as the original model: although the status variable is not
used and thus can be set to a dummy value of 0 or 1, it is necessary for
the response to be recognized as a Surv object.
Last, although predictions with a time-dependent covariate path can be
useful, it is very easy to create a prediction that is senseless. Users
are encouraged to seek out a text that discusses the issue in detail.
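The layout of newdata for a covariate path can be sketched as follows. The example assumes the heart data set from the survival package; the numeric copy tx of its transplant indicator, the 30-day switch point, and the covariate values are all hypothetical choices made for illustration.

```r
library(survival)

# Illustrative data: heart has (start, stop, event) intervals and a
# time-dependent transplant indicator; tx is a numeric 0/1 copy of it.
hd <- heart
hd$tx <- as.numeric(as.character(hd$transplant))

fit <- coxph(Surv(start, stop, event) ~ age + surgery + tx, data = hd)

# Hypothetical path for one subject: no transplant during (0, 30],
# transplanted during (30, 365]. Both rows share the same id.
nd <- data.frame(start   = c(0, 30),
                 stop    = c(30, 365),
                 event   = c(0, 0),  # dummy status; needed to form a Surv object
                 age     = c(0, 0),  # on the scale used in the heart data
                 surgery = c(0, 0),
                 tx      = c(0, 1),
                 id      = c(1, 1))

sf <- survfit(fit, newdata = nd, id = id)
```

The result is a single curve that follows the first row's covariates up to day 30 and the second row's thereafter. Whether such a path is meaningful is a separate question, as noted above.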
When a model contains strata but no time-dependent covariates the user
of this routine has a choice.
If the newdata argument does not contain strata variables then the
returned object will be a matrix of survival curves with one row for
each strata in the model and one column for each row in newdata.
(This is the historical behavior of the routine.)
If newdata does contain strata variables, then the result will contain
one curve per row of newdata, based on the indicated stratum of the
original model.
In the rare case of a model with strata-by-covariate interactions the
strata variable must be included in newdata; the routine does not allow
it to be omitted (predictions become too confusing).
(Note that the model Surv(time, status) ~ age*strata(sex) expands
internally to strata(sex) + age:sex; the sex variable is needed for the
second term of the model.)
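The two ways of calling the routine for a stratified model can be sketched as below. The lung data, the model formula, and the covariate values are assumptions for the example.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

# newdata without the strata variable: curves for every stratum,
# with one column per row of newdata.
sf_all <- survfit(fit, newdata = data.frame(age = c(50, 70)))

# newdata with the strata variable: one curve per row, each taken
# from the stratum named in that row.
sf_one <- survfit(fit, newdata = data.frame(age = c(50, 70), sex = c(1, 2)))

sf_all$strata    # number of time points in each model stratum
dim(sf_all$surv) # columns correspond to the rows of newdata
```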
See survfit for more details about the counts (number of
events, number at risk, etc.).
an object of class "survfit".
See survfit.object for details.
Methods defined for survfit objects are
print, plot, lines, and points.
If the following pair of lines is used inside of another function then
the model=TRUE argument must be added to the coxph call:
fit <- coxph(...); survfit(fit).
This is a consequence of the non-standard evaluation process used by the
model.frame function when a formula is involved.
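A minimal sketch of the pattern inside a function follows. The wrapper name curve_at_age, its arguments, and the lung data are hypothetical and serve only to show where model=TRUE goes.

```r
library(survival)

# Hypothetical wrapper: fits a model and returns a predicted curve.
curve_at_age <- function(dat, at_age) {
  # model = TRUE stores the model frame in the fit, so survfit() does not
  # have to re-evaluate the original call from inside this function.
  fit <- coxph(Surv(time, status) ~ age, data = dat, model = TRUE)
  survfit(fit, newdata = data.frame(age = at_age))
}

sf <- curve_at_age(lung, 60)
```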
Let \log[S(t; z)] be the log of the survival curve
for a fixed covariate vector z, then
\log[S(t; x)] = e^{(x-z)\beta} \log[S(t; z)]
is the log of the curve for any new covariate vector x.
There is an unfortunate tendency to refer to the reference curve with
z=0 as ‘THE’ baseline hazard. However, any z can be used as
the reference point, and more importantly, if x-z is large the
computation can suffer severe roundoff error. It is always safest to
provide the desired x values directly via newdata.
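The relation between curves at two covariate values can be checked numerically under the default stype=2 estimate, as in the sketch below. The lung data, the one-covariate model, and the values of z and x are arbitrary assumptions for the example.

```r
library(survival)

fit  <- coxph(Surv(time, status) ~ age, data = lung)
beta <- unname(coef(fit))

z <- 60  # reference covariate value (arbitrary)
x <- 70  # new covariate value (arbitrary)

s_z <- survfit(fit, newdata = data.frame(age = z))
s_x <- survfit(fit, newdata = data.frame(age = x))

# log S(t; x) compared with exp((x - z) * beta) * log S(t; z)
lhs <- log(s_x$surv)
rhs <- exp((x - z) * beta) * log(s_z$surv)
all.equal(lhs, rhs)
```

Here x-z is small, so rescaling the reference curve is harmless; the warning above concerns cases where the difference is large, for which requesting the curve at x directly through newdata is the safer route.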
Fleming, T. H. and Harrington, D. P. (1984). Nonparametric estimation of the survival distribution in censored data. Comm. in Statistics 13, 2469-86.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.
Link, C. L. (1984). Confidence intervals for the survival function using Cox's proportional hazards model with covariates. Biometrics 40, 601-610.
Therneau T and Grambsch P (2000), Modeling Survival Data: Extending the Cox Model, Springer-Verlag.
Tsiatis, A. (1981). A large sample study of the estimate for the integrated hazard function in Cox's regression model for survival data. Annals of Statistics 9, 93-108.
print.survfit, plot.survfit, lines.survfit, coxph, Surv, strata.