[R] warning associated with Logistic Regression

Peter Dalgaard p.dalgaard at biostat.ku.dk
Sun Jan 25 18:24:23 CET 2004


David Firth <d.firth at warwick.ac.uk> writes:

> On Sunday, Jan 25, 2004, at 13:59 Europe/London, Guillem Chust wrote:
> 
> > Hi All,
> >
> > When I tried to do logistic regression (with high maximum number of
> > iterations) I got the following warning message
> >
> > Warning message:
> > fitted probabilities numerically 0 or 1 occurred in: (if
> > (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> >
> > From the R-help archives, it seems that this happens when the
> > dataset exhibits complete separation.
> 
> Yes, correct.

Sufficient but not necessary. The warning can also arise purely from
numerical roundoff when the effect is strong enough. (I have an example
with age and prevalent menarche: for nearly all women menarche occurs
between the ages of 10 and 18, so if you have a couple of 40-year-olds
in your data set, they'll get a fitted p of 1. It happens even more
easily if you throw in a cubic term.)
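
A minimal sketch of that situation in R, on made-up grouped data (the
counts, the data frame `d` and its column names are assumptions for
illustration, not the menarche data mentioned above). Both outcomes
occur at ages 11 to 15, so there is no separation, yet the two
40-year-olds should end up with a fitted probability within rounding
error of 1:

```r
# Hypothetical grouped data: 20 girls per age, plus two 40-year-olds;
# 'yes' is the number who have reached menarche.
d <- data.frame(
  age = c(10:17, 40),
  yes = c(0, 1, 4, 10, 16, 19, 20, 20, 2),
  n   = c(rep(20, 8), 2)
)

# Ages 11 to 15 contain both outcomes, so the data are not separated.
fit <- glm(cbind(yes, n - yes) ~ age, family = binomial, data = d)

fit$converged                  # did the IRLS iterations converge?
coef(summary(fit))             # estimates and standard errors
1 - fitted(fit)[d$age == 40]   # distance of the fitted p from 1
```

If the fit behaves as expected, the estimates stay finite and the
iterations converge; only the fitted value for the 40-year-olds is
numerically indistinguishable from 1.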
 
> > However, p-values tend to 1
> 
> The reported p-values cannot be trusted: the asymptotic theory on
> which they are based is not valid in such circumstances.
> 
> > , and
> > residual deviance tends to 0.

A residual deviance tending to 0, however, is a clear sign that the fit
has diverged, and in that case (but not necessarily otherwise) the
asymptotic theory is invalid.
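
A small R sketch of what divergence looks like, on hypothetical
completely separated toy data (`x`, `y` and the iteration caps 5, 10
and 15 are assumptions chosen for illustration). Refitting with a
larger cap on the number of iterations should push the slope further
out rather than let it settle, while the residual deviance shrinks
towards 0 and the Wald p-value drifts towards 1:

```r
# Hypothetical toy data: y is 0 for x <= 3 and 1 for x >= 4,
# i.e. complete separation.
x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)

# Refit with an increasing cap on the number of IRLS iterations.
# Warnings are suppressed only to keep the output short.
fits <- lapply(c(5, 10, 15), function(m)
  suppressWarnings(glm(y ~ x, family = binomial,
                       control = glm.control(maxit = m))))

# Iterations used, slope, residual deviance and Wald p-value per fit.
t(sapply(fits, function(f)
  c(iter     = f$iter,
    slope    = unname(coef(f)["x"]),
    deviance = deviance(f),
    p.value  = coef(summary(f))["x", "Pr(>|z|)"])))
```

The caps are set through `glm.control(maxit = )` so that each fit stops
early enough to show the trend; with the default settings only the end
state is visible.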

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
