survdiff {survival}  R Documentation 
Test Survival Curve Differences
Description
Tests if there is a difference between two or more survival curves using
the G-rho family of tests, or for a single curve against a known alternative.
Usage
survdiff(formula, data, subset, na.action, rho=0, timefix=TRUE)
Arguments
formula 
a formula expression as for other survival models, of the form

data 
an optional data frame in which to interpret the variables occurring in the formula. 
subset 
expression indicating which subset of the rows of data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), a numeric vector indicating which observation numbers are to be included (or excluded if negative), or a character vector of row names to be included. All observations are included by default. 
na.action 
a missing-data filter function. This is applied to the 
rho 
a scalar parameter that controls the type of test. 
timefix 
process times through the 
Value
a list with components:
n 
the number of subjects in each group. 
obs 
the weighted observed number of events in each group. If there are strata, this will be a matrix with one column per stratum. 
exp 
the weighted expected number of events in each group. If there are strata, this will be a matrix with one column per stratum. 
chisq 
the chi-square statistic for a test of equality. 
var 
the variance matrix of the test. 
strata 
optionally, the number of subjects contained in each stratum. 
pvalue 
the p-value corresponding to the chi-square statistic. 
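The components listed above can be inspected directly on the returned object. The following is a minimal sketch, assuming the survival package is installed and using its bundled ovarian and lung data sets; the object names fit and sfit are illustrative only. It recomputes the chi-square statistic from obs, exp and var by dropping the first group, which is an assumption about how the redundant group is handled and is only meant for the unstratified case where every group has a positive expected count.

```r
library(survival)

# Two-sample log-rank test (rho = 0 is the default)
fit <- survdiff(Surv(futime, fustat) ~ rx, data = ovarian)

fit$n      # number of subjects in each group
fit$obs    # weighted observed events in each group
fit$exp    # weighted expected events in each group

# The deviations O - E sum to zero over groups, so one group is
# redundant and is dropped before inverting the variance matrix.
d <- (fit$obs - fit$exp)[-1]
v <- fit$var[-1, -1, drop = FALSE]
chisq_manual <- sum(solve(v, d) * d)

# Degrees of freedom: groups with expected events, minus one
df <- sum(fit$exp > 0) - 1
p_manual <- pchisq(chisq_manual, df, lower.tail = FALSE)

# Stratified test: obs and exp are matrices, one column per stratum
sfit <- survdiff(Surv(time, status) ~ pat.karno + strata(inst), data = lung)
dim(sfit$obs)
sfit$strata
```

Comparing chisq_manual and p_manual with fit$chisq and fit$pvalue is a quick way to check the interpretation of each component.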
Details
This function implements the G-rho family of
Harrington and Fleming (1982), with weights on each death of S(t)^rho,
where S(t) is the Kaplan-Meier estimate of survival.
With rho = 0 this is the log-rank or Mantel-Haenszel test,
and with rho = 1 it is equivalent to the Peto & Peto modification
of the Gehan-Wilcoxon test.
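In symbols, the weighting described above can be sketched as follows. The notation is assumed here for illustration and is not taken from the package: t_j are the distinct death times, d_{kj} and n_{kj} are the deaths and the number at risk in group k at t_j, and d_j and n_j are the corresponding totals over all groups. Whether the Kaplan-Meier estimate is evaluated at t_j or just before it is an implementation detail not specified here.

```latex
\begin{aligned}
w_j &= \hat{S}(t_j)^{\rho}, \\
O_k &= \sum_j w_j \, d_{kj}, \qquad
E_k = \sum_j w_j \, \frac{n_{kj}\, d_j}{n_j}, \\
\chi^2 &= (O - E)^{\top} V^{-1} (O - E),
\end{aligned}
```

Here V is the variance matrix of O - E, with one redundant group removed from both O - E and V before inversion. With rho = 0 every weight w_j equals 1, so O_k and E_k reduce to the plain observed and expected event counts.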
Peto and Peto show that the Gehan-Wilcoxon test can be badly biased if the two groups have different censoring patterns, and proposed an alternative. Prentice and Marek later showed an actual example where this issue occurs. For most data sets the Gehan-Wilcoxon and Peto-Peto-Prentice variant will hardly differ, however.
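The effect of rho can be seen by running the same comparison under both settings. This is a minimal sketch, assuming the survival package and its bundled ovarian data set; the object names fit_logrank and fit_peto are illustrative.

```r
library(survival)

# rho = 0: log-rank (Mantel-Haenszel) test, equal weight on every death
fit_logrank <- survdiff(Surv(futime, fustat) ~ rx, data = ovarian, rho = 0)

# rho = 1: Peto & Peto modification of the Gehan-Wilcoxon test,
# which weights each death by the Kaplan-Meier estimate S(t)
fit_peto <- survdiff(Surv(futime, fustat) ~ rx, data = ovarian, rho = 1)

# Side-by-side comparison of the two tests
data.frame(
  rho    = c(0, 1),
  chisq  = c(fit_logrank$chisq, fit_peto$chisq),
  pvalue = c(fit_logrank$pvalue, fit_peto$pvalue)
)
```

Because the weights S(t)^rho never exceed 1, the weighted observed counts under rho = 1 sum to less than the raw number of deaths, so later deaths contribute less to the statistic than early ones.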
If the right hand side of the formula consists only of an offset term,
then a one-sample test is done.
To cause missing values in the predictors to be treated as a separate
group, rather than being omitted, use the factor function with its
exclude argument to recode the right-hand-side covariate.
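The recoding described above might look like the following. This is a minimal sketch, assuming the survival package and its bundled lung data set, whose ph.ecog column contains a missing value; lung2 and ecog_f are hypothetical names introduced for the example, and the default na.action is assumed to omit incomplete rows.

```r
library(survival)

# ph.ecog in the lung data has at least one missing value
sum(is.na(lung$ph.ecog))

# Default behaviour: rows with a missing predictor are dropped
fit_drop <- survdiff(Surv(time, status) ~ ph.ecog, data = lung)

# exclude = NULL keeps NA as a factor level, so those rows form
# their own group instead of being removed by na.action
lung2 <- lung
lung2$ecog_f <- factor(lung2$ph.ecog, exclude = NULL)
fit_keep <- survdiff(Surv(time, status) ~ ecog_f, data = lung2)

fit_drop$n   # group sizes without the missing-value group
fit_keep$n   # group sizes including the missing-value group
```

The second fit has one more group than the first, and its group sizes add up to the full number of rows in the data.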
References
Harrington, D. P. and Fleming, T. R. (1982). A class of rank test procedures for censored survival data. Biometrika, 553-566.
Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant test procedures (with discussion). JRSSA, 185-206.
Prentice, R. and Marek, P. (1979). A qualitative discrepancy between censored data rank tests. Biometrics, 861-867.
Examples
library(survival)
## Two-sample test
survdiff(Surv(futime, fustat) ~ rx, data=ovarian)
## Stratified 7-sample test
survdiff(Surv(time, status) ~ pat.karno + strata(inst), data=lung)
## Expected survival for heart transplant patients based on
## US mortality tables
expect <- survexp(futime ~ 1, data=jasa, cohort=FALSE,
                  rmap= list(age=(accept.dt - birth.dt), sex=1, year=accept.dt),
                  ratetable=survexp.us)
## actual survival is much worse (no surprise)
survdiff(Surv(jasa$futime, jasa$fustat) ~ offset(expect))
## The free light chain data set is close to the population.
e2 <- survexp(futime ~ 1, data=flchain, cohort=FALSE,
              rmap= list(age= age*365.25, sex=sex,
                         year=as.Date(paste0(sample.yr, "-07-01"))),
              ratetable= survexp.mn)
survdiff(Surv(futime, death) ~ offset(e2), flchain)