[R] warning associated with Logistic Regression
Ravi Varadhan
rvaradha at jhsph.edu
Mon Jan 26 17:40:13 CET 2004
Hi All:
I am really fascinated by the content and the depth of discussion of
this thread. This really exemplifies what I have come to love and
enjoy about the R user group - that it is not JUST an answering service
for getting help on programming issues, but also a forum for some
critical and deep thinking on fundamental statistical issues.
Kudos to the group!
Best,
Ravi.
----- Original Message -----
From: David Firth <d.firth at warwick.ac.uk>
Date: Monday, January 26, 2004 5:28 am
Subject: Re: [R] warning associated with Logistic Regression
> On Sunday, Jan 25, 2004, at 18:06 Europe/London, (Ted Harding) wrote:
>
> > On 25-Jan-04 Guillem Chust wrote:
> >> Hi All,
> >>
> >> When I tried to do logistic regression (with a high maximum number
> >> of iterations), I got the following warning message:
> >>
> >> Warning message:
> >> fitted probabilities numerically 0 or 1 occurred in: (if
> >> (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> >>
> >> As I checked in the R-help archives, it seems that this happens
> >> when the dataset exhibits complete separation.
> >
> > This is so. Indeed, there is a sense in which you are experiencing
> > unusually good fortune, since for values of your predictors in one
> > region you are perfectly predicting the 0s in your response, and for
> > values in another region you are perfectly predicting the 1s. What
> > better could you hope for?
> >
> > However, you would respond that this is not realistic: your variables
> > are not (in real life) such that P(Y=1|X=x) is ever exactly 1 or
> > exactly 0, so this perfect prediction is not realistic.
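
For concreteness, here is a minimal illustrative R example (with made-up
data) in which a single predictor completely separates the response:

    ## toy data: y is 0 whenever x < 5 and 1 whenever x > 5
    x <- c(1, 2, 3, 4, 6, 7, 8, 9)
    y <- c(0, 0, 0, 0, 1, 1, 1, 1)
    fit <- glm(y ~ x, family = binomial)
    ## Warning: fitted probabilities numerically 0 or 1 occurred
    ## (the exact wording may vary with the R version)
    coef(fit)       # the slope estimate is huge, and grows with more iterations
    fitted(fit)     # fitted probabilities are numerically 0 or 1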
> >
> > In that case, you are somewhat stuck. The plain fact is that your
> > data (in particular the way the values of the X variables are
> > distributed) are not adequate to tell you what is happening.
> >
> > There may be manipulative tricks (like penalised regression) which
> > would inhibit the logistic regression from going all the way to a
> > perfect fit; but, then, how would you know how far to let it go
> > (because it will certainly go as far in that direction as you allow
> > it to).
> >
> > The key parameter in this situation is the dispersion parameter (sigma
> > in the usual notation). When you get a perfect fit in a "completely
> > separated" situation, this corresponds to sigma=0. If you don't like
> > this, then there must be reasons why you want sigma>0, and this may
> > imply that you have reasons for wanting sigma to be at least s0 (say),
> > or, if you are prepared to be Bayesian about it, you may be satisfied
> > that there is a prior distribution for sigma which would not allow
> > sigma=0, and would attach high probability to a range of sigma values
> > which you consider to be realistic.
> >
> > Unless you have a fairly firm idea of what sort of values sigma is
> > likely to have, then you are indeed stuck, because you have no reason
> > to prefer one positive value of sigma to a different positive value
> > of sigma. In that case you cannot really object if the logistic
> > regression tries to make it as small as possible!
>
> This seems arguable. Accepting that we are talking about point
> estimation (the desirability of which is of course open to question!!),
> then old-fashioned criteria like bias, variance and mean squared error
> can be used as a guide. For example, we might desire to use an
> estimation method for which the MSE of the estimated logistic
> regression coefficients (suitably standardized) is as small as
> possible; or some other such thing.
>
> The simplest case is estimation of log(pi/(1-pi)) given an observation
> r from binomial(n,pi). Suppose we find that r=n -- what then can we
> say about pi? Clearly not much if n is small, rather more if n is
> large. Better in terms of MSE than the MLE (whose MSE is infinite) is
> to use log(p/(1-p)), with p = (r+0.5)/(n+1). See for example Cox &
> Snell's book on binary data. This corresponds to penalizing the
> likelihood by the Jeffreys prior, a penalty function which has good
> frequentist properties also in the more general logistic regression
> context. References given in the brlr package give the theory and some
> empirical evidence. The logistf package, also on CRAN, is another
> implementation.
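
As a small numerical illustration of that adjustment (values chosen purely
for illustration): with r = n the MLE of log(pi/(1-pi)) is +Inf, whereas
the adjusted estimate log(p/(1-p)) with p = (r+0.5)/(n+1) stays finite and
grows only slowly with n:

    ## adjusted log-odds estimate for r successes out of n trials
    adjusted_logit <- function(r, n) {
      p <- (r + 0.5) / (n + 1)
      log(p / (1 - p))
    }
    adjusted_logit(5, 5)      # about 2.40
    adjusted_logit(20, 20)    # about 3.71
    adjusted_logit(100, 100)  # about 5.30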
>
> I do not mean to imply that the Jeffreys-prior penalty will be the
> right thing for all applications -- it will not. (eg if you really do
> have prior information, it would be better to use it.)
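
As a usage sketch (argument names and output format depend on the package
version), the logistf package can be applied to the completely separated
toy data shown earlier, giving finite estimates where glm's diverge:

    library(logistf)  # Firth-type (Jeffreys-prior penalized) logistic regression
    toy <- data.frame(x = c(1, 2, 3, 4, 6, 7, 8, 9),
                      y = c(0, 0, 0, 0, 1, 1, 1, 1))
    pfit <- logistf(y ~ x, data = toy)
    summary(pfit)     # finite coefficients and confidence intervals despite separation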
>
> In general I agree wholeheartedly that it is best to get more/better
> data!
>
> > In the absence of such reasons,
> (cut)
>
> All good wishes,
> David
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html