[R] obtainl survival curves for single strata
Bond, Stephen
Stephen.Bond at cibc.com
Fri Feb 1 20:22:20 CET 2013
Dr. Therneau,
You are correct about the fit:
(cph.approve <- coxph(Surv(fundterm,resp)~I(CommitAmt/1e5)+res+CommitedRate+dayflag+mth+strata(termfac), data=dfa,
subset=(HedgeDate<"2012-11-15" & p1!="FixedOpen"), method="efron"))
However the output from survfit has individuals in each column and the strata are stacked so, in order to use that I have to subset the exact rows for the strata I need even though the strata is provided in the newdata argument.
Based on the help file it seems like the function should be able to use the strata info and return a single strata per subject. I am pasting my current code, which is not fast due to calls to reshape. If sth similar can be achieved by calling the compiled code it should run much faster allowing 100k subjects to be done in 2-3 min.
To use survfit as is, I would need to achieve subsetting in a for loop (going through columns), which is even slower than reshape. In my old code I processed subjects one by one in a for loop and that was much slower than grouping all subjects from the same strata together as in the code below.
surv.approve <- survfit(cph.approve)
b1 <- c(1,cumsum(surv.approve$strata)+1)
b2 <- cumsum(surv.approve$strata)
b1 <- b1[-length(b1)]
stratbins <- data.frame(termfac=as.integer(substring(names(b2),9,9)),start=b1,finish=b2)
> stratbins
termfac start finish
1 1 1 93
2 2 94 187
3 3 188 282
4 4 283 372
5 5 373 462
6 6 463 553
7 7 554 633
8 8 634 695
9 9 696 789
strats <- sort(unique(newapp$termfac))
for (jj in strats){
cat('strata ',jj,'\n')
block <- newapp[newapp$termfac==jj,]
surv <- surv.approve$surv[stratbins[stratbins$termfac==jj,"start"]:stratbins[stratbins$termfac==jj,"finish"]]
risk <- predict(cph.approve,new=block,type="risk",ref="sample")
newsurv <- outer(surv,risk,"^")
days <- as.Date(outer(surv.approve$time[stratbins[stratbins$termfac==jj,"start"]:stratbins[stratbins$termfac==jj,"finish"]],
block$HedgeDate,"+"))
fund <- t(t(newsurv)*block$CommitAmt)
fund <- rbind(block$CommitAmt,fund)
fund <- -diff(fund)
fund <- as.data.frame(t(fund))
fund$acct <- as.integer(rownames(fund))
ncols <- ncol(fund)-1
fundlong <- reshape(fund,dir="long",varying=list(colnames(fund)[1:ncols]),idvar="acct",timevar="daycnt")
fundlong <- fundlong[order(fundlong$acct,fundlong$daycnt),]
days <- matrix(days,nrow=nrow(days))
days <- t(days)
days <- cbind(fund[,"acct"],days)
days <- as.data.frame(days)
colnames(days)[1] <- "acct"
daylong <- reshape(days,dir="long",varying=list(colnames(days)[2:(ncols+1)]),idvar="acct",timevar="dt")
daylong <- daylong[order(daylong$acct,daylong$dt),]
daylong$V2 <- as.Date(daylong$V2,origin = "1970-01-01")
fundlong$dt <- as.character(daylong$V2)
sqlSave(ch1,fundlong,"tmpfund")
sqlQuery(ch1, "insert into RRfund (acct,dt,fund,daycnt) select acct,dt,V1,daycnt from tmpfund")
sqlQuery(ch1,"drop table tmpfund")
}
Output looks like:
acct dt fund daycnt
3623963 2012-11-16 00:00:00 472.99329489343 1
3623963 2012-11-17 00:00:00 5842.48228943771 2
3623963 2012-11-18 00:00:00 7807.17102672733 3
3623963 2012-11-19 00:00:00 7244.01769700345 4
3623963 2012-11-20 00:00:00 7073.83109592798 5
3623963 2012-11-21 00:00:00 8745.91515512884 6
3623963 2012-11-22 00:00:00 9473.87152806949 7
3623963 2012-11-23 00:00:00 12627.8890422916 8
3623963 2012-11-24 00:00:00 19598.5684713097 9
3623963 2012-11-25 00:00:00 56094.5812136802 10
3623963 2012-11-26 00:00:00 25690.439183149 11
3623963 2012-11-27 00:00:00 25420.9915256714 12
3623963 2012-11-28 00:00:00 30322.3750865434 13
3623963 2012-11-29 00:00:00 18651.1136846758 14
3623963 2012-11-30 00:00:00 20291.4528067292 15
Stephen
-----Original Message-----
From: Terry Therneau [mailto:therneau at mayo.edu]
Sent: Friday, February 01, 2013 10:11 AM
To: r-help at r-project.org; Bond, Stephen
Subject: Re: obtainl survival curves for single strata
Stephen,
I can almost but not quite get my arms around what you are asking. A bit more detail
would help. Like
cph.approve = what kind of object, generated by function __
But, let me make a guess:
cph is the result of coxph, and the model has both covariates and a strata
You want predictioned survival curves, more than 1, of the type "covariates = a, b,c,
strata=1" "covariates = d,e, f, strata=2", ... for arbitrary covariates and strata.
Now, the predicted survival curves for different strata are different lengths.
The survfit.coxph routine gets around this by being verbose: it expects you to give it
covariate sets, and returns
all of the strata for each covariate. This allows it to give a compact result. You can
always do:
newpred <- survfit(cox-model-fit, newdata=something)
newpred[5,17] # survival curve for the 5th strata, covariates from the 17th row of
newdata
But, you want to put the results into a matrix, for some pre-specifed set of time points.
Take it one step further.
mytimepoints <- seq(0, 5*365, by=30) # every 30 days
z <- summary(newpred[5,17], time=mytimepoints, extend=TRUE)$surv
The summary.survfit function's "time" argument was originally written for people who only
wanted to print certain time points, but it works as well for those who only want to
extract certain ones. It correctly handles the fact that the curve is a step function.
Terry Therneau
On 02/01/2013 05:00 AM, r-help-request at r-project.org wrote:
> What is the syntax to obtain survival curves for single strata on many subjects?
>
> I have a model based on Surv(time,response) object, so there is a single row per subject and no start,stop and no switching of strata.
>
> The newdata has many subjects and each subject has a strata and the survival based on the subject risk and the subject strata is needed.
>
> If I do
>
> newpred<- survfit(cph.approve,new=newapp,se=F)
>
> I get all strata for every subject.
>
> Attempting
>
>> > newpred<- survfit(cph.approve,new=newapp,id=CertId,se=F)
> Error in survfit.coxph(cph.approve, new = newapp, id = CertId, se = F) :
> The individual option is only valid for start-stop data
>> > newpred<- survfit(cph.approve,new=newapp,indi=T,se=F)
> Error in survfit.coxph(cph.approve, new = newapp, indi = T, se = F) :
> The individual option is only valid for start-stop data
>
> Please, advise if obtaining a single strata for a basic (time,response) model is possible? Due to differing lengths of the surv for different strata this will not go in a "wide" data.frame without padding.
>
> Thanks everybody and have a great day.
>
> Stephen B
>
More information about the R-help
mailing list