[R] Survival analysis with no events in one treatment group

Mon Dec 31 16:29:18 CET 2007

John,
   The key issue with a data set that has no events in one group is that the 
usual Wald tests, i.e., beta/se(beta), do not work.  They is based on a Taylor 
series argument of the usual type: f(x) = f(x0) + a polynomial in (x-x0), and 
for infinite beta "x" and "x0" are so far apart that the approximation isn't 
even close.  The score and likelihood ratio statistics are just fine, however. 
So if you completely ignore any printout that involves the "infinite beta" 
columns of the estimated variance matrix all should be well.

   In your case, a second issue is that the likelihood ratio test is not valid 
for data sets with multiple events per subject. You have to account for the 
within subject correlation in some way, either by moving to a random effects 
model or a gee type variance.  You have done the latter by adding "cluster" to 
your coxph call.  Thus, only the "robust score test" line of your final output 
is reliable.  The LR test is tainted by multiple events, and the robust Wald by 
the infinite beta.  Use summary() to print all the test statistics.

  Assume you have variables x1 and x2 in a model, and want to test the 
importance of adding x3.  Stepwise regression programs often use score tests for 
this, but most users have never done it and don't know how. 
   	> fit1 <- coxph(Surv(time1, time2, event) ~ x1 + x2 + cluster(subject),
   		               subset= (!is.na(x3)))
   	> fit2 <- coxph(Surv(time1, time2, event) ~ x1 + x2 + x3 + 
   		              cluster(subject), init=c(coef(fit1), 0))

Now the "robust score test" in fit2 is a test of (coef1, coef2, 0) [no need for 
x3] vs the model with x3.  If you have no missing values you can skip the subset 
argument, but make sure that the two fits are actually on the same data set: 
always check that the "n=" part of the printout matches.

	Terry Therneau
	Mayo Clinic