[R] Coxph not converging with continuous variable

Mon Sep 3 19:00:09 CEST 2012

On 09/03/2012 05:00 AM, r-help-request at r-project.org wrote:
> The coxph function in R is not working for me when I use a continuous
> predictor in the model. Specifically, it fails to converge, even when
> bumping up the number of max iterations or setting reasonable initial
> values. The estimated hazard ratio from the model is incorrect (verified
> by an AFT model). I've isolated it to the "x1" variable in the example
> below, which is log-normally distributed. The x1 here has extreme
> values, but I've been able to reproduce the problem intermittently with
> less extreme values. It seemed odd that I couldn't find this question
> asked anywhere, so I'm wondering if I'm just not seeing a mistake I've
> made.
>  ....
> Alex Keil
> UNC Chapel Hill

Congratulations: it's been a long time since someone managed to break
the iteration code in coxph.

I used your example, but simplified it to n = 1000 and a one-variable
model. The quantiles of your x1 variable are
  > round(quantile(xx, c(0, 5:10/10)), 2)
      0%   50%   60%   70%   80%   90%  100%
    0.06  2.67  3.75  5.74  8.93 15.04 98.38

For a coefficient of 1 (close to the real solution) you have one subject
with a risk of death that is 999 times the average risk (he should die
before his next heartbeat) and another with a relative risk of 1.99e-40
(an immortal). The key components of a Cox model iteration are, it
turns out, weighted means and variances. For this data 99.99929% of
the weight falls on a single observation, i.e., at beta=1 you have an
effective sample size of 1. The optimal coefficient is the one that best
predicts that single subject's death time.
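A minimal sketch of this weight concentration in plain R. As a stand-in for the original data it uses only the seven quantile values printed above (an assumption: the real data has 1000 values, so the numbers will not match those quoted), with beta = 1. Each subject's risk weight is exp(beta * x); the Kish effective sample size, (sum w)^2 / sum(w^2), is our choice of summary here, not something coxph reports. The last lines compare a one-pass weighted variance, E[x^2] - E[x]^2, with a two-pass one, E[(x - xbar)^2], as one illustration of how round-off can swamp the variance term when the weight sits on a single extreme value; this is not a claim about how coxph computes it internally.

```r
# Toy stand-in: the reported quantiles of x1, NOT the original data
x <- c(0.06, 2.67, 3.75, 5.74, 8.93, 15.04, 98.38)
beta <- 1

# Risk weights exp(beta * x), rescaled by the largest to avoid overflow
w <- exp(beta * x - max(beta * x))
p <- w / sum(w)                      # share of the total weight per subject

max_share <- max(p)                  # fraction of weight on a single subject
n_eff <- sum(w)^2 / sum(w^2)         # Kish effective sample size

xbar <- sum(p * x)                   # weighted mean
v_naive <- sum(p * x^2) - xbar^2     # one-pass: E[x^2] - E[x]^2
v_two_pass <- sum(p * (x - xbar)^2)  # two-pass: E[(x - xbar)^2]

print(c(max_share = max_share, n_eff = n_eff,
        v_naive = v_naive, v_two_pass = v_two_pass))
```

When one weight dominates like this, the one-pass form typically cancels to zero or rounding noise, while the two-pass form keeps the tiny positive variance that the information term is built from.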

Because of the resulting round-off error, the routine takes a step of
size 1.7 from a starting estimate of 1.0 when it should take a step of
about 0.05, then falls back on step halving to overcome the mistake.
Rinse and repeat.
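The step-then-halve pattern can be sketched as a Newton-Raphson loop on the Cox log partial likelihood for one covariate: the score is the sum over deaths of x_i minus the weighted mean of x in the risk set, and the information is the sum of the corresponding weighted variances. This is a minimal sketch, not the survival package's code: the helpers `cox_parts()` and `newton_cox()` are hypothetical, the toy data is made up, tied death times are not handled, and the relative-change stopping rule and 30-halving cap are illustrative choices.

```r
# Hypothetical helper: log partial likelihood, score and information for a
# single covariate. Risk set at death time t_i = subjects with time >= t_i
# (no tie correction; the toy data below has no tied times).
cox_parts <- function(beta, time, status, x) {
  loglik <- 0; score <- 0; info <- 0
  for (i in which(status == 1)) {
    at_risk <- time >= time[i]
    eta <- beta * x[at_risk]
    m <- max(eta)                       # shift for a stable log-sum-exp
    w <- exp(eta - m)
    p <- w / sum(w)                     # normalized risk weights
    xbar <- sum(p * x[at_risk])         # weighted mean of x in the risk set
    loglik <- loglik + beta * x[i] - (m + log(sum(w)))
    score  <- score + (x[i] - xbar)
    info   <- info + sum(p * (x[at_risk] - xbar)^2)  # weighted variance
  }
  list(loglik = loglik, score = score, info = info)
}

# Hypothetical Newton-Raphson driver with step halving
newton_cox <- function(time, status, x, beta = 0, maxiter = 30, eps = 1e-9) {
  cur <- cox_parts(beta, time, status, x)
  converged <- FALSE
  for (iter in seq_len(maxiter)) {
    new_beta <- beta + cur$score / cur$info  # full Newton step
    new <- cox_parts(new_beta, time, status, x)
    halvings <- 0
    # If the step made the log-likelihood worse, retreat halfway toward beta
    while (new$loglik < cur$loglik && halvings < 30) {
      new_beta <- (beta + new_beta) / 2
      new <- cox_parts(new_beta, time, status, x)
      halvings <- halvings + 1
    }
    # Stop on a small relative change in the log partial likelihood
    converged <- abs(new$loglik - cur$loglik) <= eps * abs(new$loglik)
    beta <- new_beta
    cur <- new
    if (converged) break
  }
  list(beta = beta, loglik = cur$loglik, score = cur$score,
       iterations = iter, converged = converged)
}

# Toy data, made up for illustration
time   <- c(1, 2, 3, 4, 5, 6, 7, 8)
status <- c(1, 1, 1, 0, 1, 1, 0, 1)
x      <- c(2.1, 0.5, 1.7, 0.3, 1.2, -0.4, 0.8, -1.0)
fit <- newton_cox(time, status, x)
print(fit)
```

The sketch uses a shifted log-sum-exp and a two-pass variance, but otherwise makes no attempt to guard against the extreme-weight situation described above.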

I could possibly make coxph resistant to this data set, but at the cost 
of a major rewrite and significantly slower performance.  Can you 
convince me that this data set has any relevance?

Terry Therneau