[R] clogit question

Thomas Lumley tlumley at u.washington.edu
Thu Mar 23 16:37:35 CET 2006

On Thu, 23 Mar 2006, Dan Chan wrote:

> Hi,
> I am playing with
>     clogit(case~spontaneous+induced+strata(stratum),data=infert)
> from the clogit help file.
> This line works.

Yes, this is one of the nice features of R (that the examples work).

> 1. But why doesn't strata(stratum) have a coefficient like spontaneous
> and induced?

Because that's the whole point of conditional logistic regression.  It is 
used in situations where the stratum coefficients can't be estimated 
reliably and where using ordinary maximum likelihood would give the wrong 
answer.  Under the name conditional logistic regression it is usually 
used in matched case-control studies. [Econometricians use the same 
estimator under a different name, usually fixed-effects logit.]
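
To see the difference in practice, here is a minimal sketch that fits the
help-file model both ways. It assumes the survival package is installed; the
object names fit_cond and fit_uncond are made up for illustration. The second
fit estimates one dummy coefficient per stratum by ordinary maximum
likelihood and is shown only for comparison, not as a recommended analysis.

```r
library(survival)

# Conditional fit: the stratum effects are conditioned out, not estimated.
fit_cond <- clogit(case ~ spontaneous + induced + strata(stratum),
                   data = infert)

# Unconditional fit with one dummy variable per stratum (comparison only).
fit_uncond <- glm(case ~ spontaneous + induced + factor(stratum),
                  family = binomial, data = infert)

# Slopes from each approach, side by side.
coef(fit_cond)
coef(fit_uncond)[c("spontaneous", "induced")]
```

The conditional fit reports only the two slopes, while the unconditional fit
also carries a long list of stratum coefficients, each estimated from a
handful of observations.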

> 2. When I remove strata(stratum) from the command, this function seems
> to keep running forever.  Why?

It is effectively putting all the observations in the same stratum. The 
computation required is exponential in the number of observations per 
stratum and thus will take, to a first approximation, forever.
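
A rough way to see the scale of the problem, as a sketch only: the
denominator of the exact conditional likelihood for a stratum has one term
for each way of choosing that stratum's cases from its members. The code
below counts those terms as if they were enumerated directly; it says
nothing about the algorithm clogit actually uses internally. The names
n_obs, n_cases and per_stratum are made up for illustration.

```r
# Number of observations and cases in the infert data (shipped with R).
n_obs   <- nrow(infert)
n_cases <- sum(infert$case)

# One big stratum: one term per way of picking n_cases subjects from n_obs.
choose(n_obs, n_cases)

# Matched sets: each stratum contributes only a handful of terms.
per_stratum <- with(infert, tapply(case, stratum,
                                   function(z) choose(length(z), sum(z))))
sum(per_stratum)
```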

> 3. I think the equation for clogit looks like
> P = 1/(1 + exp(-(a + bx + cy + ...)))
> In this example, I think the spontaneous is x, induced is y.  So, b is
> the coefficient for spontaneous and c is coefficient for induced.  Where
> can I find a?

You can't.

Conditional logistic regression gives only the odds ratios, exp(b) and 
exp(c).  Since the "infert" data come from a matched case-control study 
you couldn't get meaningful probabilities out of them anyway.
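
As a minimal sketch of what can be extracted, assuming the survival package
is installed (the object name fit is made up for illustration), the odds
ratios and their confidence intervals come from exponentiating the fitted
coefficients:

```r
library(survival)

fit <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)

# Odds ratios exp(b) and exp(c), with 95% confidence intervals.
exp(coef(fit))
exp(confint(fit))

# Check whether an intercept (the "a" in the question) was estimated.
"(Intercept)" %in% names(coef(fit))
```

summary(fit) prints the same odds ratios in its exp(coef) column.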

Conditional logistic regression is a specialised and unusual estimator. 
It is theoretically interesting for being a genuinely useful example of an 
estimator that is consistent in a problem where the MLE is inconsistent. 
Apart from that it is of interest mostly to epidemiologists.  If you 
really need to (or just want to) understand it you should read up on 
case-control studies.  If you are near a university with a medical school 
they will have
   Breslow, N. E. and N. E. Day (1980). Statistical Methods in Cancer
   Research: Vol. 1 - The Analysis of Case-Control Studies. Lyon, France,
   IARC Scientific Publications.
which I think is the best reference. There's probably stuff on the web, too.


Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
