[R] Cox model approximations (was "comparing SAS and R survival....)
Terry Therneau
therneau at mayo.edu
Fri Jul 22 14:04:15 CEST 2011
For time scales that are truly discrete, Cox proposed the "exact partial
likelihood". I call that the "exact" method and SAS calls it the
"discrete" method. What we compute is precisely the same; however, they
use a clever algorithm which is faster. To make things even more
confusing, Prentice introduced an "exact marginal likelihood" which is
not implemented in R, but which SAS calls the "exact" method.
Data are usually not truly discrete, however. More often, ties are the
result of imprecise measurement or grouping. The Efron approximation
assumes that the data are actually continuous and that we see ties only
because of this; it also introduces an approximation at one point in the
calculation which greatly speeds up the computation. Numerically, the
approximation is very good.
In spite of the irrational love that our profession has for anything
branded with the word "exact", I currently see no reason to ever use
that particular computation in a Cox model. I'm not quite ready to
remove the option from coxph, but certainly am not going to devote any
effort toward improving that part of the code.
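As a concrete sketch of how these methods are selected in R, the `ties` argument of `coxph` chooses among them. The data set and covariates below (the `lung` data shipped with the survival package) are purely illustrative:

```r
library(survival)

# The same model fit with each of the three tie-handling methods.
# ties = "exact" is the exact partial likelihood (what SAS calls "discrete").
fit_efron   <- coxph(Surv(time, status) ~ age + sex, data = lung, ties = "efron")
fit_breslow <- coxph(Surv(time, status) ~ age + sex, data = lung, ties = "breslow")
fit_exact   <- coxph(Surv(time, status) ~ age + sex, data = lung, ties = "exact")

# Compare the coefficients side by side; with only light tying,
# the three methods should agree closely.
round(cbind(efron   = coef(fit_efron),
            breslow = coef(fit_breslow),
            exact   = coef(fit_exact)), 4)
```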
The Breslow approximation is less accurate, but it is the easiest to
program and therefore was the only method in early Cox model programs;
it persists as the default in many software packages for historical
reasons. Truth be told, unless the number of tied deaths is quite large,
the difference in results between it and the Efron approximation will be
trivial.
The worst approximation, and the one that can sometimes give seriously
strange results, is to artificially remove ties from the data set by
adding a random value to each subject's time.
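For completeness, this is what that (discouraged) jittering looks like in practice; the jitter width and seed below are arbitrary:

```r
library(survival)

set.seed(42)
# Artificially break ties by adding a small random amount to every time --
# the approach warned against above; the fit now depends on the seed.
jittered_time <- lung$time + runif(nrow(lung), 0, 0.5)

fit_jitter <- coxph(Surv(jittered_time, status) ~ age + sex, data = lung)
fit_efron  <- coxph(Surv(time, status)          ~ age + sex, data = lung)

# With few heavy ties the two fits will be close, but rerunning with a
# different seed gives (slightly) different answers each time.
cbind(jitter = coef(fit_jitter), efron = coef(fit_efron))
```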
Terry T
--- begin quote --
I didn't know precisely the specifics of each approximation method.
I thus came back to section 3.3 of Therneau and Grambsch, Extending the
Cox Model. I think I now see things more clearly. If I have understood
correctly, both the "discrete" option and the "exact" functions assume
"true" discrete event times in a model approximating the Cox model. The
Cox partial likelihood cannot be exactly maximized, or even written,
when there are ties; am I right?
In my sample, many of the ties (those within a single observation of the
process) are due to the fact that continuous event times are grouped
into intervals.
So I think the logistic approximation may not be the best for my
problem, even though the estimates on my real data set (shown in my
previous post) do give interesting results in the context of my data
set!
I was thinking about distributing the events uniformly in each interval.
What do you think about this option? Can I expect a better approximation
than directly applying the Breslow or Efron method to the grouped event
data? Finally, I guess it becomes a model problem more than a
computational or algorithmic one.
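A small simulation sketch of this spreading idea (the interval width, hazard, and covariate effect below are all made up for illustration): group continuous times into intervals, then compare the Efron fit on the grouped times with a fit after spreading each event uniformly over its interval.

```r
library(survival)

set.seed(1)
n <- 500
x <- rnorm(n)
true_time <- rexp(n, rate = 0.1 * exp(0.5 * x))  # true coefficient = 0.5
status    <- rep(1, n)                           # no censoring, for simplicity

width   <- 5
grouped <- ceiling(true_time / width) * width    # grouping creates many ties

# Efron approximation directly on the grouped (tied) times
fit_grouped <- coxph(Surv(grouped, status) ~ x, ties = "efron")

# Spread each event uniformly over its interval, then fit
spread     <- grouped - runif(n, 0, width)
fit_spread <- coxph(Surv(spread, status) ~ x)

cbind(grouped = coef(fit_grouped), spread = coef(fit_spread))
```

In line with the reply above, one would expect the Efron fit on the grouped data to already be close to the truth here, so the spreading step buys little.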