[R] (bug?) in survfit in R-2-15.0
Terry Therneau
therneau at mayo.edu
Thu May 17 14:59:46 CEST 2012
The predict methods are designed to work with a dataframe. In the case
of survfit(cox model) the code has, for backwards compatability
purposes, the ability to use a vector as the "newdata" argument. This
feature is not documented in the help file, on purpose, and is expected
to "go away" one day. This backwards compatatibilty works in all my
test cases, but not in the one you present. This may accelerate my
plans for removal.
The best solution for you is to use a dataframe for both the coxph
fit and the predict call. Avoid using objects that are in the global
data or are attached. Below I reprise your example in safer code. The
coxph/survfit calls both work from a temporary data frame called
"dummy". Your use of the coxph/survfit pair to get a survival curve for
glmnet is clever, by the way. I like it.
library(glmnet)
library(survival)
load(system.file("doc","VignetteExample.rdata",package="glmnet"))
fit <- with(patient.data, glmnet(x[-1,], Surv(time[-1], status[-1]),
family="cox", alpha=.05, maxit=10000))
max.dev.index <- which.max(fit$dev.ratio)
optimal.beta <- fit$beta[, max.dev.index]
nonzero.coef <- (optimal.beta != 0)
dummy <- with(patient.data, data.frame(time=time, status=status,
x=x[,nonzero.coef]))
coxfit <- coxph(Surv(time, status) ~ ., data=dummy, subset= -1,
iter=0, init=optimal.beta[nonzero.coef])
sfit <- survfit(coxfit, newdata=dummy[1,])
Terry Therneau
---- begin included message -------
Dear all,
I am using glmnet + survival and due to the latest
release of glmnet 1.7.4 I was forced to use the latest version of R 2.15.0.
My previous version of R was 2.10.1. I changed glmnet version and R
version and when I started to get weird results I was not sure where the
bug was.
After putting glmnet back to previous version, I have
found that the bug or weird behaviour happens in survival survfit. My
script is
below in email and it is uses glmnet 1.7.3. I get two completely different
answers in different versions of R and in my opinion the older version of
survival package returns correct value.
in R 2-10-1 I get
?>
cat(dim(sfit$surv))
?
?>
cat(length(sfit$surv))
32
?>
?
in R-2-15-0 I get
cat(dim(sfit$surv))
62 99
?>
cat(length(sfit$surv))
6138
?>
?Kind regardsDK
library(survival)
library(glmnet)
load(system.file("doc","VignetteExample.rdata",package="glmnet"))
attach(patient.data)
trainX?????
<-x[-1,]
trainTime??
<-time[-1]
trainStatus <- status[-1]
fit <-
glmnet(trainX,Surv(trainTime,trainStatus),family="cox",alpha=0.5,maxit=10000)
max.dev.index????
<- which.max(fit$dev.ratio)
optimal.lambda <- fit$lambda[max.dev.index]
optimal.beta? <-
fit$beta[,max.dev.index] nonzero.coef <- abs(optimal.beta)>0 selectedBeta
<- optimal.beta[nonzero.coef]
selectedTrainX??
<- trainX[,nonzero.coef]
coxph.model<- coxph(Surv(trainTime,trainStatus)~
selectedTrainX,init=selectedBeta,iter=0)
selectedTestX <- x[1,nonzero.coef]
sfit<- survfit(coxph.model,newdata=selectedTestX)
cat("\ndim(sfit$surv)")
cat(dim(sfit$surv))
cat("\nlength(sfit$surv)")
cat(length(sfit$surv))
More information about the R-help
mailing list