[R] changes in coxph in "survival" from older version?

Terry Therneau therneau at mayo.edu
Thu May 12 15:42:09 CEST 2011


On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote:
> Hi all,
> 
> I found that the two different versions of "survival" packages, namely 2.36-5 
> vs. 2.36-8 or later, give different results for coxph function.  Please see 
> below and the data is attached.  The second one was done on Linux, but Windows 
> gave the same results.  Could you please let me know which one I should trust?
> 
> Thanks,

 In your case, neither.  Your data set has 22 events and 17 predictors;
the rule of thumb for a reliable Cox model is 10-20 events per predictor
which implies no more than 2 for your data set.  As a result, the
coefficients of your model have very wide confidence intervals, the coef
for Male for instance has se of 3.26, meaning the CI goes from 1/26 to
26 times the estimate; i.e., there is no biological meaning to the
estimate.

  Nevertheless, why did coxph give a different answer?  The later
version 2.36-9 failed to converge (20 iterations) with a final
log-likelihood of -19.94, the earlier code converges in 10 iterations to
-19.91.  In version 2.36-6 an extra check was put into the maximizer for
coxph in response to an exceptional data set which caused the routine to
fail due to overflow of the exp function; the Newton-Raphson iteration
algorithm had made a terrible guess in it's iteration path, which can
happen with all NR based search methods.  
   I put a limit on the size the linear predictor in the Cox model of
21.  The basic argument is that exp(linear-predictor) = relative risk
for a subject, and that there is not much biological meaning for risks
to be less than exp(-21) ~ 1/(population of the earh).  There is more to
the reasoning, interested parties should look at the comments in
src/coxsafe.c, a 5 line routine with 25 lines of discussion.  I will
happily accept input the "best" value for the constant.

   I never expected to see a data set with both convergence of the LL
and linear predictors larger than +-15.  Looking at the fit (older code)
> round(fit2$linear.predictor, 2)
 [1]   2.26   0.89   4.96 -19.09 -12.10   1.39   2.82   3.10
 [9]  18.57 -25.25  22.94   8.75   5.52 -27.64  14.88 -23.41
[17]  13.70 -28.45  -1.84  10.04  12.62   2.54   6.33  -8.76
[25]   9.68   4.39   2.92   3.51   6.02 -17.24   5.97

This says that, if the model is to be believed, you have several near
immortals in the data set. (Everyone else on earth will perish first).

Terry Therneau



More information about the R-help mailing list