# [R] clogit question

Thomas Lumley tlumley at u.washington.edu
Thu Mar 23 16:37:35 CET 2006

```
On Thu, 23 Mar 2006, Dan Chan wrote:

> Hi,
>
> I am playing with
>     clogit(case~spontaneous+induced+strata(stratum),data=infert)
> from clogit help file.
>
> This line works.

Yes, this is one of the nice features of R (that the examples work).

> 1. But why doesn't strata(stratum) have a coefficient like spontaneous
> and induced?

Because that's the whole point of conditional logistic regression.  It is
used in situations where the stratum coefficients can't be estimated
reliably and where using ordinary maximum likelihood would give the wrong
answer.  When it is called conditional logistic regression it is usually
used in matched case-control studies. [Econometricians use the same
estimator under a different name.]

> 2. When I remove strata(stratum) from the command, this function seems
> to keep running forever.  Why?

It is effectively putting all the observations in the same stratum. The
computation required is exponential in the number of observations per
stratum and thus will take, to a first approximation, forever.

> 3. I think the equation for clogit looks like
> P=1/(1+ exp(-1*(a+bx+cy+.....)))
> In this example, I think spontaneous is x and induced is y.  So, b is
> the coefficient for spontaneous and c is the coefficient for induced.  Where
> can I find a?

You can't.

Conditional logistic regression gives only the odds ratios, exp(b) and
exp(c).  Since the "infert" data come from a matched case-control study
you couldn't get meaningful probabilities out of them anyway.

Conditional logistic regression is a specialised and unusual estimator.
It is theoretically interesting for being a genuinely useful example of an
estimator that is consistent in a problem where the MLE is inconsistent.
Apart from that it is of interest mostly to epidemiologists.  If you
really need to (or just want to) understand it you should read up on
case-control studies.  If you are near a university with a medical school
they will have
Breslow, N. E. and N. E. Day (1980). Statistical Methods in Cancer
Research: Vol. 1 - The Analysis of Case-Control Studies. Lyon, France,
IARC Scientific Publications.
which I think is the best reference. There's probably stuff on the web,
too.

-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

```
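
As a rough sketch of why no intercept is reported, the conditional likelihood for a single stratum can be written out. The symbols a, b, c, x and y follow the question above; the stratum index s, the case indicator d, the stratum size n_s and the case count m_s are notation assumed here for illustration and do not appear in the original post.

```latex
% Logistic model within stratum s, with its own intercept a_s
% (d_{si} = 1 for a case, 0 for a control; notation assumed for this sketch)
\[
\operatorname{logit} P(d_{si} = 1) = a_s + b\,x_{si} + c\,y_{si}
\]

% Likelihood of the observed cases, conditional on the number of cases
% m_s = \sum_i d_{si} among the n_s observations in stratum s
\begin{align*}
L_s(b, c)
  &= \frac{\exp\Bigl(\sum_{i:\,d_{si}=1} (a_s + b\,x_{si} + c\,y_{si})\Bigr)}
          {\sum_{S \subseteq \{1,\dots,n_s\},\ |S| = m_s}
             \exp\Bigl(\sum_{j \in S} (a_s + b\,x_{sj} + c\,y_{sj})\Bigr)} \\
  &= \frac{\exp\Bigl(\sum_{i:\,d_{si}=1} (b\,x_{si} + c\,y_{si})\Bigr)}
          {\sum_{S \subseteq \{1,\dots,n_s\},\ |S| = m_s}
             \exp\Bigl(\sum_{j \in S} (b\,x_{sj} + c\,y_{sj})\Bigr)}
\end{align*}
```

The factor exp(m_s a_s) is common to the numerator and to every term of the denominator, so a_s cancels and only b and c remain to be estimated. Evaluated naively, the denominator sums over every way of choosing m_s cases from n_s observations: 3 terms for one case and two controls, but an enormous number if all observations are placed in one stratum, which is consistent with the answer to question 2.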