[R] warning associated with Logistic Regression
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sun Jan 25 18:48:06 CET 2004
On 25 Jan 2004, Peter Dalgaard wrote:
> David Firth <d.firth at warwick.ac.uk> writes:
>
> > On Sunday, Jan 25, 2004, at 13:59 Europe/London, Guillem Chust wrote:
> >
> > > Hi All,
> > >
> > > When I tried to do logistic regression (with high maximum number of
> > > iterations) I got the following warning message
> > >
> > > Warning message:
> > > fitted probabilities numerically 0 or 1 occurred in: (if
> > > (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> > >
> > > From what I found in the R-help archive, it seems that this happens
> > > when the dataset exhibits complete separation.
> >
> > Yes, correct.
>
> Sufficient but not necessary. It can happen just by numerical roundoff
> if the effect is strong enough. (I have an example with age and
> prevalent menarche: for nearly all women this happens between the age
> of 10 and 18, so if you have a couple of 40-year olds in your data
> set, they'll get a fitted p of 1. Happens even more easily if you
> throw in a cubic term.)
It also happens with partial separation (when some but not all of the
fitted values go to 0/1). A common case is where only one case occurs for
some cell in an interaction of factors, and so can be fitted exactly.
Another example is a dataset of, say, 8,000 people that would show complete
separation but for one case recorded incorrectly -- then the MLE occurs at
large but finite parameter values, and cases dissimilar to the erroneous one
will have fitted probabilities very near (but not exactly) 0/1. The
asymptotic theory is valid but practically useless in such problems (the
Hauck-Donner effect), since for this purpose 8,000 is a small sample.
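
A minimal sketch in R of the single-case-cell situation, using made-up data
(the names d, f1, f2, y, fit_add and fit_int are hypothetical, chosen only for
illustration). The lone observation in the a2:b2 cell can be fitted exactly, so
its fitted probability is pushed towards 1 and the interaction estimate
towards infinity. Whether the "numerically 0 or 1" warning itself appears
depends on the convergence settings, so the sketch looks at the fitted value
and the Wald table rather than relying on the warning.

```r
# Made-up data: two factors, with exactly one observation in the a2:b2 cell.
d <- data.frame(
  f1 = factor(c(rep("a1", 20), rep("a2", 11))),
  f2 = factor(c(rep(c("b1", "b2"), each = 10), rep("b1", 10), "b2")),
  y  = c(rep(c(0, 1), 15), 1)   # last value is the lone a2:b2 case, y = 1
)

# Additive model and model with interaction; the latter fits the lone cell exactly.
fit_add <- glm(y ~ f1 + f2, family = binomial, data = d)
fit_int <- glm(y ~ f1 * f2, family = binomial, data = d)

# Fitted probability for the lone case: very near, but not exactly, 1.
p_lone <- unname(fitted(fit_int)[nrow(d)])

# Wald table row for the interaction: large estimate, enormous standard error.
wald <- summary(fit_int)$coefficients["f1a2:f2b2", ]

# Likelihood ratio test of the same term, computed from the deviances.
lr   <- deviance(fit_add) - deviance(fit_int)
p_lr <- pchisq(lr, df = 1, lower.tail = FALSE)

print(p_lone)
print(wald)
print(c(LR = lr, p_LR = p_lr))
```

The Wald z for the interaction comes out close to zero because its standard
error blows up, while the likelihood ratio test on the same term still gives a
usable answer; this is the practical face of the Hauck-Donner effect in a toy
setting.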
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595