[Rd] fitting problems in coxph.fit

Thu Dec 16 16:30:55 CET 2004

Dear Thomas & Dear List,

the fitting function `coxph.fit' called by `coxph' may fail to estimate
the regression coefficients when some values of the design matrix are very
large. For example

library(survival)

### load example data
load(url("http://www.imbe.med.uni-erlangen.de/~hothorn/coxph_fit.Rda"))
method <- "efron"

### copied from `coxph.fit'
coxfit <- .C("coxfit2", iter=as.integer(maxiter),
                   as.integer(n),
                   as.integer(nvar), stime,
                   sstat,
                   x= x[sorted,] ,
                   as.double(offset[sorted] - mean(offset)),
                   as.double(weights),
                   newstrat,
                   means= double(nvar),
                   coef= as.double(init),
                   u = double(nvar),
                   imat= double(nvar*nvar), loglik=double(2),
                   flag=integer(1),
                   double(2*n + 2*nvar*nvar + 3*nvar),
                   as.double(control$eps),
                   as.double(control$toler.chol),
                   sctest=as.double(method=="efron")
                 ,PACKAGE="survival")

produces

R> coxfit$coef
 [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

because (?)

R> summary(x[,1])
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
2.378e-01 8.758e+00 5.872e+01 1.640e+04 2.732e+02 4.000e+06

One the other hand

x[,1] <- x[,1]/max(x[,1])

coxfit <- .C("coxfit2", iter=as.integer(maxiter),
                   as.integer(n),
                   as.integer(nvar), stime,
                   sstat,
                   x= x[sorted,] ,
                   as.double(offset[sorted] - mean(offset)),
                   as.double(weights),
                   newstrat,
                   means= double(nvar),
                   coef= as.double(init),
                   u = double(nvar),
                   imat= double(nvar*nvar), loglik=double(2),
                   flag=integer(1),
                   double(2*n + 2*nvar*nvar + 3*nvar),
                   as.double(control$eps),
                   as.double(control$toler.chol),
                   sctest=as.double(method=="efron")
                 ,PACKAGE="survival")

looks much better

R> coxfit$coef
 [1]  0.25123203  0.00000000 -0.42595541 -0.04488913 -0.26061995
0.44426458
 [7] -0.38954286 -0.43081374  0.79573107  0.48234405  0.94636357
-0.25193465
[13]  1.15619712  0.32651765 -1.06731019 -0.24249939

I nailed the problem down to lines 261ff of `coxfit2.c' where

            zbeta = offset[person];
            for (i=0; i<nvar; i++)
                zbeta += newbeta[i]*covar[i][person];
            risk = exp(zbeta ) * weights[person];

and `zbeta' may become very large and thus `exp' returns `nan'. Of course
one could transform the independent variables prior to fitting but I'm not
sure if this is the intented solution.

Best,

Torsten