[R] prediction based on conditional logistic regression clogit

peter dalgaard pdalgd at gmail.com
Mon Jun 16 11:22:42 CEST 2014

On 16 Jun 2014, at 05:22 , array chip <arrayprofile at yahoo.com> wrote:

> Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example:
>> dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
>> dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
>> fit<-clogit(status~x1+x2+strata(set),dat)
>> predict(fit,newdata=dat.test,type='expected')
> Error in Surv(rep(1, 150L), status) : 
>   Time and status are different lengths
> Can anyone suggest what's wrong here?

The direct cause is that clogit() works by using the fact that the likelihood is equivalent to a coxph() likelihood with stratification and all observation lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), status) and that goes belly-up if you apply the same formula to a data set of different length. 

However, notice that there is no such thing as predict.clogit(), so you are attempting predict.coxph() on a purely formal Cox model. It is unclear to what extent predicted values, in the sense of coxph() are compatible with predictions in conditional logit models.

I'm rusty on this, but I think what you want is something like

m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
pp <- exp(m %*% coef(fit))
pps <- ave(pp, dat.test$set, FUN=sum)

i.e. the conditional probability that an observation is a case given covariates and that there is on case in each set (in the data given, you have sets of three with one being a case, so all predicted probabilities are close to 0.33). For more general matched sets, I'm not really sure what one wants. Real experts are welcome to chime in.


> Thanks!
> John
> 	[[alternative HTML version deleted]]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

More information about the R-help mailing list