[R] Survival analysis with no events in one treatment group
Terry Therneau
therneau at mayo.edu
Mon Dec 31 16:29:18 CET 2007
John,
The key issue with a data set that has no events in one group is that the
usual Wald tests, i.e., beta/se(beta), do not work. They is based on a Taylor
series argument of the usual type: f(x) = f(x0) + a polynomial in (x-x0), and
for infinite beta "x" and "x0" are so far apart that the approximation isn't
even close. The score and likelihood ratio statistics are just fine, however.
So if you completely ignore any printout that involves the "infinite beta"
columns of the estimated variance matrix all should be well.
In your case, a second issue is that the likelihood ratio test is not valid
for data sets with multiple events per subject. You have to account for the
within subject correlation in some way, either by moving to a random effects
model or a gee type variance. You have done the latter by adding "cluster" to
your coxph call. Thus, only the "robust score test" line of your final output
is reliable. The LR test is tainted by multiple events, and the robust Wald by
the infinite beta. Use summary() to print all the test statistics.
Assume you have variables x1 and x2 in a model, and want to test the
importance of adding x3. Stepwise regression programs often use score tests for
this, but most users have never done it and don't know how.
> fit1 <- coxph(Surv(time1, time2, event) ~ x1 + x2 + cluster(subject),
subset= (!is.na(x3)))
> fit2 <- coxph(Surv(time1, time2, event) ~ x1 + x2 + x3 +
cluster(subject), init=c(coef(fit1), 0))
Now the "robust score test" in fit2 is a test of (coef1, coef2, 0) [no need for
x3] vs the model with x3. If you have no missing values you can skip the subset
argument, but make sure that the two fits are actually on the same data set:
always check that the "n=" part of the printout matches.
Terry Therneau
Mayo Clinic
More information about the R-help
mailing list