[R] prediction based on conditional logistic regression, clogit

Therneau, Terry M., Ph.D. therneau at mayo.edu
Tue Jun 17 16:38:46 CEST 2014


I ususally scan the digest for "surv", so missed your question on the first round.
You caught predict with a case that I never thought of; I'll look into making it smarter.

As Peter D said, the clogit function simply sets up a special data set and then calls 
coxph, and is based on an identity that the likelihood for the conditional logistic is 
identical to the likelihood of a Cox model for a specially structured data set.  I 
vacillate on whether this identity is just a mathematical accident or the mark of some 
deep truth.  However it did allow us to use existing code rather than write new routine 
from scratch.

In this special data set all of the follow-up times are set to 1 (any constant value would 
do).  For normal Cox regression a predicted "expected number of events" depends on the 
length of follow-up for the subject, so the predict routine expects to find a follow-up 
time variable in your newdata.  In my code, "rep(1, 150L)" is being mistaken for the name 
of the time variable, and failure of the confused routine ensues in due course.

For the interim: in a conditional logistic the "expected number of events" is just 
exp(eta)/(1 + exp(eta)) where eta is the linear predictor.  There is no follow-up time to 
worry about and predict.coxph would have done way too much work, even if it hadn't gotton 
confused.  Use

	risk <- predict(fit, newdata=dat.test, type='risk')
	risk / (1+risk)

Terry Therneau

>
>> >Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example:
>> >
>> >
>>> >>dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
>>> >>dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
>>> >>fit<-clogit(status~x1+x2+strata(set),dat)
>>> >>predict(fit,newdata=dat.test,type='expected')
>> >Error in Surv(rep(1, 150L), status) :
>> >?  Time and status are different lengths
>> >
>> >Can anyone suggest what's wrong here?
>> >



More information about the R-help mailing list