# [R] prediction based on conditional logistic regression, clogit

Therneau, Terry M., Ph.D. therneau at mayo.edu
Tue Jun 17 16:38:46 CEST 2014

```I ususally scan the digest for "surv", so missed your question on the first round.
You caught predict with a case that I never thought of; I'll look into making it smarter.

As Peter D said, the clogit function simply sets up a special data set and then calls
coxph, and is based on an identity that the likelihood for the conditional logistic is
identical to the likelihood of a Cox model for a specially structured data set.  I
vacillate on whether this identity is just a mathematical accident or the mark of some
deep truth.  However it did allow us to use existing code rather than write new routine
from scratch.

In this special data set all of the follow-up times are set to 1 (any constant value would
do).  For normal Cox regression a predicted "expected number of events" depends on the
length of follow-up for the subject, so the predict routine expects to find a follow-up
time variable in your newdata.  In my code, "rep(1, 150L)" is being mistaken for the name
of the time variable, and failure of the confused routine ensues in due course.

For the interim: in a conditional logistic the "expected number of events" is just
exp(eta)/(1 + exp(eta)) where eta is the linear predictor.  There is no follow-up time to
worry about and predict.coxph would have done way too much work, even if it hadn't gotton
confused.  Use

risk <- predict(fit, newdata=dat.test, type='risk')
risk / (1+risk)

Terry Therneau

>
>> >Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example:
>> >
>> >
>>> >>dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
>>> >>dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
>>> >>fit<-clogit(status~x1+x2+strata(set),dat)
>>> >>predict(fit,newdata=dat.test,type='expected')
>> >Error in Surv(rep(1, 150L), status) :
>> >?  Time and status are different lengths
>> >
>> >Can anyone suggest what's wrong here?
>> >

```