[R] warning associated with Logistic Regression

David Firth d.firth at warwick.ac.uk
Sun Jan 25 18:02:57 CET 2004


On Sunday, Jan 25, 2004, at 13:59 Europe/London, Guillem Chust wrote:

> Hi All,
>
> When I tried to do logistic regression (with high maximum number of
> iterations) I got the following warning message
>
> Warning message:
> fitted probabilities numerically 0 or 1 occurred in: (if
> (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
>
> As I checked from the Archive R-Help mails, it seems that this happens 
> when
> the dataset exhibits complete separation.

Yes, that's correct.

> However, p-values tend to 1

The reported p-values cannot be trusted: the asymptotic theory on which 
they are based is not valid in such circumstances.

> , and
> residual deviance tends to 0.

Yes, this happens under complete separation: the model fits the 
observed 0/1 data perfectly.

> My question then is:
> -Is the converged model correct?

Well, "converged" is not really the right word to use -- the iterative 
algorithm has diverged.  At least one of the coefficients has its MLE 
at infinity (or minus infinity).  In that sense what you see reported 
(i.e., large values of estimated log odds-ratios, which approximate 
infinity) is correct.  Still more correct would be estimates reported 
as Inf or -Inf, but the algorithm is not programmed to detect such 
divergence.
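For concreteness, here is a minimal self-contained illustration (the toy data are invented for the example): with a completely separated dataset, glm() produces exactly the warning you quote, a residual deviance of essentially zero, and a huge slope estimate standing in for +Inf.

```r
## Toy dataset with complete separation: y is 0 whenever x is negative
## and 1 whenever x is positive, so a steep enough logistic curve fits
## the observed 0/1 responses perfectly.
x <- c(-2, -1, -0.5, 0.5, 1, 2)
y <- c(0, 0, 0, 1, 1, 1)

fit <- glm(y ~ x, family = binomial)
## glm() warns: "fitted probabilities numerically 0 or 1 occurred".
## The slope estimate is a very large finite number (approximating +Inf),
## the residual deviance is essentially 0, and the Wald p-values are
## close to 1 because the reported standard errors are also enormous.
summary(fit)
```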

> or
> -Can I limit the number of iterations in order to avoid this warning?

Yes, probably, but this is not a sensible course of action.  The 
iterations are iterations of an algorithm to compute the MLE.  The MLE 
is not finite-valued, and the warning is a clue to that.

If you *really* want finite parameter estimates, the answer is not to 
use maximum likelihood as the method of estimation.  Various 
alternatives exist, mostly based on penalizing the likelihood [one such 
is in the brlr package, but there are others].  As a general principle, 
surely it is better to maximize a different criterion (e.g., a penalized 
likelihood with a purposefully chosen penalty function) than to stop the 
MLE algorithm prematurely and arbitrarily?
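To illustrate the idea (this is only a bare-bones sketch of one such penalty, the Jeffreys-prior "Firth" penalty, written from scratch here for exposition; it is not the brlr implementation itself), the modified score adds h_i * (1/2 - mu_i) to each residual, where the h_i are the hat-matrix leverages, and that keeps the estimates finite even under complete separation:

```r
## Sketch: penalized (bias-reduced) logistic regression via the
## modified score equations, solved by Fisher-scoring iterations.
firth_logit <- function(X, y, maxit = 50, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (it in seq_len(maxit)) {
    mu <- plogis(drop(X %*% beta))
    w  <- mu * (1 - mu)
    XtWX <- crossprod(X, X * w)                 # Fisher information X'WX
    h <- w * rowSums((X %*% solve(XtWX)) * X)   # hat-matrix diagonal
    ## modified score: X'(y - mu + h * (1/2 - mu))
    score <- crossprod(X, y - mu + h * (0.5 - mu))
    step  <- drop(solve(XtWX, score))
    beta  <- beta + step
    if (max(abs(step)) < tol) break
  }
  beta
}

## Completely separated data: glm() would diverge here, but the
## penalized estimates stay finite.
X <- cbind(1, c(-2, -1, -0.5, 0.5, 1, 2))
y <- c(0, 0, 0, 1, 1, 1)
firth_logit(X, y)
```

In practice one would of course use a maintained package rather than hand-rolled iterations, but the sketch shows why the penalized criterion has a finite maximizer where the likelihood does not.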

I hope this helps!

David

Professor David Firth
Dept of Statistics
University of Warwick
Coventry CV4 7AL
United Kingdom

Email: d.firth at warwick.ac.uk
Voice: +44 (0)247 657 2581
Fax:   +44 (0)247 652 4532



