[R] Survfit when new data has only 1 row of data

Tue Nov 7 16:22:38 CET 2017

Dear R-help,

I am using R version 3.4.0 on Windows with survival 2.41-3.  I have fitted a Prentice, Williams and Peterson counting-process (PWP-CP) model to my data, as shown below.  This is essentially an extension of the Cox model for recurrent events, with the data in counting-process (start, stop] form.  My dataset, bdat5, can be found here: https://drive.google.com/open?id=1sQSBEe1uBzh_gYbcj4P5Kuephvalc3gh

library(survival)
cfitcp2 <- coxph(Surv(start, stop, status) ~ sex + rels + factor(treat) + log(age) +
                   log(tcrate3 + 0.01) + cluster(trialno) + strata(enum),
                 data = bdat5, model = TRUE, x = TRUE, y = TRUE)

I would now like to use the model to predict the probability of zero events by two years, which I believe is equivalent to the survival probability at 2 years.  This is so that I can compare the output with similar estimates obtained from negative binomial and zero-inflated negative binomial models for the same data (albeit in a different format).
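To illustrate the equivalence I am relying on, here is a minimal sketch under a deliberately simple assumption: events arrive as a homogeneous Poisson process with a constant rate. The rate `lambda` is a made-up value for illustration only and is not estimated from bdat5. Under that assumption, the probability of zero events by time t matches the survival probability of the time to first event at t.

```r
# Hypothetical values chosen only for illustration (not from bdat5).
lambda <- 0.002   # assumed constant event rate per day
t_days <- 730     # two years, in days

# Probability of observing zero events by t_days under a Poisson count model.
p_zero_events <- dpois(0, lambda * t_days)

# Survival probability at t_days for the (exponential) time to first event.
surv_first <- pexp(t_days, rate = lambda, lower.tail = FALSE)

# Both quantities equal exp(-lambda * t_days).
print(all.equal(p_zero_events, surv_first))
```

The Cox-type model does not assume a constant rate, so this only shows why the two quantities are comparable in principle, not how survfit computes them.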

To my mind, and based on what I've read, the best way to do this is to use survfit.  I want to make predictions for each individual, so I have tried this code:

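trialnos <- unique(bdat5$trialno)

# For each trial number, compute predicted survival curves from that
# individual's rows and store the sum of the survival estimates at `time`.
prob0 <- function(ids, dataset, model, time) {
  probs <- rep(0, length(ids))
  for (i in seq_along(ids)) {
    print(i)
    sdata <- subset(dataset, trialno == ids[i])
    sfit <- survfit(model, newdata = sdata)
    probs[i] <- sum(summary(sfit, times = time)$surv)
  }
  return(probs)
}
prob0ests <- prob0(trialnos, bdat5, cfitcp2, 730)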

When I do this for the first three trial numbers I get:
0.3001021 2993.4531767    0.3445589

The unusually large "probability" arises when there is only 1 row of data for the relevant trial number.  Can anyone therefore explain why there is a problem when "sdata" is only 1 row, and ideally provide a solution?
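One check that might help narrow this down is to look at how many values summary() returns before they are summed, since the sum can only be read as a single survival probability when exactly one value comes back. The sketch below is not my actual model: it uses the lung dataset shipped with survival as a stand-in, and the model formula and the object names (fit_demo, s_one, s_two) are hypothetical. The same inspection could be applied to sfit inside the loop.

```r
library(survival)

# Stand-in stratified Cox model on the built-in lung data (not bdat5).
fit_demo <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

# Predicted curves for a one-row and a two-row newdata, summarised at 365 days.
s_one <- summary(survfit(fit_demo, newdata = lung[1, ]), times = 365)
s_two <- summary(survfit(fit_demo, newdata = lung[1:2, ]), times = 365)

# Inspect how many survival values come back and how they are laid out,
# before any sum() is taken over them.
print(length(s_one$surv))
print(length(s_two$surv))
str(s_one$surv)
str(s_two$surv)
```

If the one-row case returns more values than expected, or values with a different layout from the multi-row case, that would suggest the sum is being taken over something other than one curve per row.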

Many thanks,
Laura

Dr Laura Bonnett
NIHR Post-Doctoral Fellow

Department of Biostatistics,
Waterhouse Building, Block F,
1-5 Brownlow Street,
University of Liverpool,
Liverpool,
L69 3GL

0151 795 9686
L.J.Bonnett at liverpool.ac.uk