[Rd] fitting problems in coxph.fit
Torsten Hothorn
Torsten.Hothorn at rzmail.uni-erlangen.de
Thu Dec 16 16:30:55 CET 2004
Dear Thomas & Dear List,
the fitting function `coxph.fit' called by `coxph' may fail to estimate
the regression coefficients when some values of the design matrix are very
large. For example
library(survival)
### load example data
load(url("http://www.imbe.med.uni-erlangen.de/~hothorn/coxph_fit.Rda"))
method <- "efron"
### copied from `coxph.fit'
coxfit <- .C("coxfit2", iter=as.integer(maxiter),
as.integer(n),
as.integer(nvar), stime,
sstat,
x= x[sorted,] ,
as.double(offset[sorted] - mean(offset)),
as.double(weights),
newstrat,
means= double(nvar),
coef= as.double(init),
u = double(nvar),
imat= double(nvar*nvar), loglik=double(2),
flag=integer(1),
double(2*n + 2*nvar*nvar + 3*nvar),
as.double(control$eps),
as.double(control$toler.chol),
sctest=as.double(method=="efron")
,PACKAGE="survival")
produces
R> coxfit$coef
[1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
because (?)
R> summary(x[,1])
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.378e-01 8.758e+00 5.872e+01 1.640e+04 2.732e+02 4.000e+06
One the other hand
x[,1] <- x[,1]/max(x[,1])
coxfit <- .C("coxfit2", iter=as.integer(maxiter),
as.integer(n),
as.integer(nvar), stime,
sstat,
x= x[sorted,] ,
as.double(offset[sorted] - mean(offset)),
as.double(weights),
newstrat,
means= double(nvar),
coef= as.double(init),
u = double(nvar),
imat= double(nvar*nvar), loglik=double(2),
flag=integer(1),
double(2*n + 2*nvar*nvar + 3*nvar),
as.double(control$eps),
as.double(control$toler.chol),
sctest=as.double(method=="efron")
,PACKAGE="survival")
looks much better
R> coxfit$coef
[1] 0.25123203 0.00000000 -0.42595541 -0.04488913 -0.26061995
0.44426458
[7] -0.38954286 -0.43081374 0.79573107 0.48234405 0.94636357
-0.25193465
[13] 1.15619712 0.32651765 -1.06731019 -0.24249939
I nailed the problem down to lines 261ff of `coxfit2.c' where
zbeta = offset[person];
for (i=0; i<nvar; i++)
zbeta += newbeta[i]*covar[i][person];
risk = exp(zbeta ) * weights[person];
and `zbeta' may become very large and thus `exp' returns `nan'. Of course
one could transform the independent variables prior to fitting but I'm not
sure if this is the intented solution.
Best,
Torsten
More information about the R-devel
mailing list