[R] Coxph not converging with continuous variable
Terry Therneau
therneau at mayo.edu
Mon Sep 3 19:00:09 CEST 2012
On 09/03/2012 05:00 AM, r-help-request at r-project.org wrote:
> The coxph function in R is not working for me when I use a continuous
> predictor in the model. Specifically, it fails to converge, even when
> bumping up the number of max iterations or setting reasonable initial
> values. The estimated Hazard ratio from the model is incorrect (verified
> by an AFT model). I've isolated it to the "x1" variable in the example
> below, which is log-normally distributed. The x1 here has extreme
> values, but I've been able to reproduce the problem intermittently with
> less extreme values. It seemed odd that I couldn't find this question
> asked anywhere, so I'm wondering if I'm just not seeing a mistake I've
> made.
> ....
> Alex Keil
> UNC Chapel Hill
Congratulations -- it's been a long time since someone managed to break
the iteration code in coxph.
I used your example, but simplified it to n=1000 and a one-variable
model. The quantiles of your x1 variable are
> round(quantile(xx, c(0, 5:10/10)),2)
   0%   50%   60%   70%   80%   90%  100%
 0.06  2.67  3.75  5.74  8.93 15.04 98.38
For a coefficient of 1 (close to the real solution) you have one subject
with a risk of death that is 999 times the average risk (he should die
before his next heartbeat) and another with a relative risk of 1.99e-40
(an immortal). The key components of a Cox model iteration are, it
turns out, weighted means and variances. For this data 99.99929% of
the weight falls on a single observation, i.e., at beta=1 you have an
effective sample size of 1. The optimal coefficient is the one that best
predicts that single subject's death time.
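A minimal sketch of that weight concentration, under loud assumptions: it uses only the seven quantiles listed above as stand-in data (the full n=1000 sample is not reproduced here, so the 99.99929% figure will not come out exactly), and it measures "effective sample size" as 1/sum(w^2), which is one common definition and my choice for illustration.

```r
# Stand-in data: the seven quantiles of x1 quoted above, NOT the full sample.
x <- c(0.06, 2.67, 3.75, 5.74, 8.93, 15.04, 98.38)
beta <- 1

# Risk weights exp(beta * x), shifted by the maximum to avoid overflow,
# then normalized so they sum to 1.
eta <- beta * x
w <- exp(eta - max(eta))
w <- w / sum(w)

top_share <- max(w)             # share of the total weight on the largest x
ess <- 1 / sum(w^2)             # effective sample size (assumed definition)
wmean <- sum(w * x)             # weighted mean of x
wvar <- sum(w * (x - wmean)^2)  # weighted variance of x

print(c(top_share = top_share, ess = ess, wmean = wmean, wvar = wvar))
```

With essentially all of the weight on one point, the weighted mean sits on that point and the weighted variance is close to zero. In a Newton-Raphson update the weighted variances make up the information term that the score is divided by, so a near-zero value leaves the step size very poorly determined.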
Due to the computational round-off error that results, the routine
takes a step of size 1.7 from a starting estimate of 1.0 when it should
take a step of size about 0.05, then falls into step halving to
overcome the mistake. Rinse and repeat.
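For readers unfamiliar with the pattern, here is a hedged sketch of Newton-Raphson with step halving for a one-covariate Cox partial likelihood. It is not the coxph source: the function names cox_pieces and fit_cox1, the convergence rule, the cap of 20 halvings, and the tiny data set are all hypothetical, and it assumes no tied event times and a well-behaved (finite, non-NaN) log-likelihood.

```r
# Partial log-likelihood, score and information at a given beta
# (one covariate, no tied event times assumed).
cox_pieces <- function(beta, time, status, x) {
  loglik <- 0; score <- 0; info <- 0
  for (i in which(status == 1)) {
    idx <- which(time >= time[i])      # risk set at this event time
    eta <- beta * x[idx]
    shift <- max(eta)                  # shift to avoid overflow in exp()
    e <- exp(eta - shift)
    w <- e / sum(e)                    # normalized risk weights
    m <- sum(w * x[idx])               # weighted mean of x in the risk set
    loglik <- loglik + (beta * x[i] - shift) - log(sum(e))
    score <- score + (x[i] - m)
    info <- info + sum(w * (x[idx] - m)^2)  # weighted variance
  }
  list(loglik = loglik, score = score, info = info)
}

# Newton-Raphson; if a step lowers the log-likelihood, halve it and retry.
fit_cox1 <- function(time, status, x, beta = 0, maxit = 25, tol = 1e-9) {
  cur <- cox_pieces(beta, time, status, x)
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    step <- cur$score / cur$info
    new <- cox_pieces(beta + step, time, status, x)
    halvings <- 0
    while (new$loglik < cur$loglik && halvings < 20) {
      step <- step / 2
      new <- cox_pieces(beta + step, time, status, x)
      halvings <- halvings + 1
    }
    converged <- abs(new$loglik - cur$loglik) < tol * (abs(new$loglik) + tol)
    beta <- beta + step
    cur <- new
    if (converged) break
  }
  list(beta = beta, loglik = cur$loglik, iterations = iter,
       converged = converged)
}

# Hypothetical, well-behaved toy data (not the poster's data).
time   <- c(1, 2, 3, 4, 5, 6, 7, 8)
status <- c(1, 1, 0, 1, 1, 0, 1, 1)
x      <- c(2.1, 0.3, 1.5, 1.9, 0.2, 1.1, 0.8, 0.4)
fit <- fit_cox1(time, status, x)
print(fit)
```

On data like the toy example the halving loop rarely triggers. The failure described above arises when the computed step itself is badly wrong, so each iteration overshoots and has to be halved back.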
I could possibly make coxph resistant to this data set, but at the cost
of a major rewrite and significantly slower performance. Can you
convince me that this data set has any relevance?
Terry Therneau