[R] comparing SAS and R survival analysis with time-dependent covariates

Wed Jul 20 15:02:38 CEST 2011

Let me expand a bit on Thomas's answer.
Looking more closely at your data set, you have the following (deaths /
number at risk in each group):

  death time         group 0    group 1
    1.5               0/4        13/13
      3               0/4         5/5
      8               4/4          0

At time 1.5 group 1 had 13 deaths out of 13 at risk, group 0 had none.
Time 8 doesn't have any impact on the fit: since only one group was at
risk, the deaths are guaranteed to come from that group.  So the actual
MLE for the hazard ratio is 1/0 = infinity, 100% death rate in group 1
vs. 0% in group 0, at all the time points where the two groups can be
compared. 
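
The infinite MLE can be checked numerically from the table alone.  The
following is a minimal sketch in Python, used only for the arithmetic; it
calls neither coxph nor phreg.  It assumes a single 0/1 group covariate
and the Breslow approximation for ties (an assumption made for
simplicity), and the names RISK_SETS and breslow_loglik are made up for
this example.

```python
import math

# One tuple per death time in the table (times 1.5, 3, 8):
# (deaths in group 0, at risk in group 0, deaths in group 1, at risk in group 1)
RISK_SETS = [(0, 4, 13, 13), (0, 4, 5, 5), (4, 4, 0, 0)]

def breslow_loglik(beta):
    """Breslow partial log-likelihood for a 0/1 group covariate."""
    total = 0.0
    for d0, n0, d1, n1 in RISK_SETS:
        deaths = d0 + d1
        # group 0 subjects have risk score 1, group 1 subjects exp(beta)
        total += d1 * beta - deaths * math.log(n0 + n1 * math.exp(beta))
    return total

# Limit as beta -> infinity: each term with all deaths in group 1 tends to
# -d * log(n1); the time-8 term is the constant -4 * log(4).
asymptote = -(13 * math.log(13) + 5 * math.log(5) + 4 * math.log(4))

for beta in (0, 2, 5, 10, 20):
    print(beta, breslow_loglik(beta))
print("limit", asymptote)
```

The printed values keep rising with beta, by ever smaller amounts, and
never reach the limit, so no finite beta maximises the likelihood.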

Section 3.5 of Therneau and Grambsch, Extending the Cox Model, has a
picture of the log-likelihood in such a case, which very quickly
approaches an asymptote as beta goes to infinity.  Both phreg and coxph
iterate until the loglik "doesn't change anymore".  The printed solution
depends entirely on the convergence criteria, which are slightly
different in the two programs.  In coxph I chose to add a warning message.
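
How the stopping rule alone decides the printed coefficient can be
sketched with a plain Newton-Raphson loop.  This is an illustration in
Python, not the iteration code of either program: it assumes the Breslow
approximation, drops the constant time-8 term, and stops on a relative
change in the log-likelihood.  The function names and the two tolerance
values are hypothetical.

```python
import math

def loglik(beta):
    # Breslow partial log-likelihood for the death times 1.5 and 3
    return (13 * beta - 13 * math.log(4 + 13 * math.exp(beta))
            + 5 * beta - 5 * math.log(4 + 5 * math.exp(beta)))

def score(beta):
    # first derivative of loglik; positive for every finite beta
    e = math.exp(beta)
    return 52 / (4 + 13 * e) + 20 / (4 + 5 * e)

def information(beta):
    # minus the second derivative of loglik
    e = math.exp(beta)
    return 676 * e / (4 + 13 * e) ** 2 + 100 * e / (4 + 5 * e) ** 2

def fit(eps, max_iter=100):
    """Newton-Raphson from beta = 0 until the loglik stops changing."""
    beta, old = 0.0, loglik(0.0)
    for iteration in range(1, max_iter + 1):
        beta += score(beta) / information(beta)
        new = loglik(beta)
        if abs(new - old) <= eps * abs(new):
            break
        old = new
    return beta, iteration

beta_loose, it_loose = fit(1e-4)   # hypothetical looser tolerance
beta_tight, it_tight = fit(1e-9)   # hypothetical tighter tolerance
print(beta_loose, it_loose)
print(beta_tight, it_tight)
```

The tighter tolerance simply runs more iterations and stops at a larger
beta; neither number estimates anything, both are artifacts of where the
loop was told to stop.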

Final note: I never use the discrete option, having found the Efron
approximation to be sufficient in every practical situation.  Partly for
that reason I have not worked very hard at optimising the code for that
case, while SAS has.  If you insist on using the exact partial likelihood
then phreg will be tens to thousands of times faster: my code takes
O(2^d) compute time, where d is the maximum number of tied deaths at any
one time, and theirs is polynomial in d.  I doubt that coxph ever
"crashes" your computer, but
it is easy to construct a data set whose compute time is in days or even
years.
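
The cost difference comes from the denominator of the exact partial
likelihood, which at each death time sums the product of risk scores over
every possible set of d subjects drawn from the risk set.  The Python
sketch below compares direct enumeration with a standard recursion for
the same sum that takes on the order of n*d operations.  It is only an
illustration of why a polynomial algorithm is possible, not a description
of what coxph or phreg actually implement; the function names and the
choice beta = 1 are made up for the example.

```python
import math
from itertools import combinations

def denom_bruteforce(scores, d):
    """Sum over every size-d subset of the product of its risk scores."""
    return sum(math.prod(subset) for subset in combinations(scores, d))

def denom_recursive(scores, d):
    """Same sum via B(k, j) = B(k, j-1) + score_j * B(k-1, j-1)."""
    b = [1.0] + [0.0] * d           # b[k]: size-k sum over subjects seen so far
    for r in scores:
        for k in range(d, 0, -1):   # descend so each subject is used at most once
            b[k] += r * b[k - 1]
    return b[d]

beta = 1.0                                   # arbitrary value for illustration
scores = [math.exp(beta)] * 13 + [1.0] * 4   # risk set at time 1.5 in the table
d = 13                                       # tied deaths at that time

brute = denom_bruteforce(scores, d)   # visits all choose(17, 13) subsets
fast = denom_recursive(scores, d)     # about 17 * 13 multiply-adds
print(brute, fast, math.comb(len(scores), d))
```

Both functions return the same number up to rounding, but the number of
subsets visited by the first grows combinatorially with the risk set and
the number of ties, while the work in the second grows only as their
product.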

Terry Therneau