[R] warning associated with Logistic Regression
Ravi Varadhan
rvaradha at jhsph.edu
Mon Jan 26 17:40:13 CET 2004
Hi All:
I am really fascinated by the content and the depth of discussion of
this thread. This really exemplifies what I have come to love and
enjoy about the R user group - that it is not JUST an answering service
for getting help on programming issues, but also a forum for some
critical and deep thinking on fundamental statistical issues.
Kudos to the group!
Best,
Ravi.
----- Original Message -----
From: David Firth <d.firth at warwick.ac.uk>
Date: Monday, January 26, 2004 5:28 am
Subject: Re: [R] warning associated with Logistic Regression
> On Sunday, Jan 25, 2004, at 18:06 Europe/London, (Ted Harding) wrote:
>
> > On 25-Jan-04 Guillem Chust wrote:
> >> Hi All,
> >>
> >> When I tried to do logistic regression (with a high maximum number
> >> of iterations), I got the following warning message:
> >>
> >> Warning message:
> >> fitted probabilities numerically 0 or 1 occurred in: (if
> >> (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> >>
> >> As I checked in the R-help archives, it seems that this happens
> >> when the dataset exhibits complete separation.
> >
> > This is so. Indeed, there is a sense in which you are experiencing
> > unusually good fortune, since for values of your predictors in one
> > region you are perfectly predicting the 0s in your response, and for
> > values in another region you are perfectly predicting the 1s. What
> > better could you hope for?
> >
> > However, you would respond that this is not realistic: your variables
> > are not (in real life) such that P(Y=1|X=x) is ever exactly 1 or
> > exactly 0, so this perfect prediction is not realistic.
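
For concreteness, here is a minimal illustrative R example (with made-up
data) in which a single predictor completely separates the response:

    ## toy data: y is 0 whenever x < 5 and 1 whenever x > 5
    x <- c(1, 2, 3, 4, 6, 7, 8, 9)
    y <- c(0, 0, 0, 0, 1, 1, 1, 1)
    fit <- glm(y ~ x, family = binomial)
    ## Warning: fitted probabilities numerically 0 or 1 occurred
    ## (the exact wording may vary with the R version)
    coef(fit)       # the slope estimate is huge, and grows with more iterations
    fitted(fit)     # fitted probabilities are numerically 0 or 1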
> >
> > In that case, you are somewhat stuck. The plain fact is that your
> > data (in particular the way the values of the X variables are
> > distributed) are not adequate to tell you what is happening.
> >
> > There may be manipulative tricks (like penalised regression) which
> > would inhibit the logistic regression from going all the way to a
> > perfect fit; but, then, how would you know how far to let it go
> > (because it will certainly go as far in that direction as you allow
> > it to).
> >
> > The key parameter in this situation is the dispersion parameter (sigma
> > in the usual notation). When you get a perfect fit in a "completely
> > separated" situation, this corresponds to sigma=0. If you don't like
> > this, then there must be reasons why you want sigma>0, and this may
> > imply that you have reasons for wanting sigma to be at least s0 (say),
> > or, if you are prepared to be Bayesian about it, you may be satisfied
> > that there is a prior distribution for sigma which would not allow
> > sigma=0, and would attach high probability to a range of sigma values
> > which you consider to be realistic.
> >
> > Unless you have a fairly firm idea of what sort of values sigma is
> > likely to have, then you are indeed stuck, because you have no reason
> > to prefer one positive value of sigma to a different positive value
> > of sigma. In that case you cannot really object if the logistic
> > regression tries to make it as small as possible!
>
> This seems arguable. Accepting that we are talking about point
> estimation (the desirability of which is of course open to question!!),
> then old-fashioned criteria like bias, variance and mean squared error
> can be used as a guide. For example, we might desire to use an
> estimation method for which the MSE of the estimated logistic
> regression coefficients (suitably standardized) is as small as
> possible; or some other such thing.
>
> The simplest case is estimation of log(pi/(1-pi)) given an observation
> r from binomial(n,pi). Suppose we find that r=n -- what then can we
> say about pi? Clearly not much if n is small, rather more if n is
> large. Better in terms of MSE than the MLE (whose MSE is infinite) is
> to use log(p/(1-p)), with p = (r+0.5)/(n+1). See for example Cox &
> Snell's book on binary data. This corresponds to penalizing the
> likelihood by the Jeffreys prior, a penalty function which has good
> frequentist properties also in the more general logistic regression
> context. References given in the brlr package give the theory and some
> empirical evidence. The logistf package, also on CRAN, is another
> implementation.
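
As a small numerical illustration of that adjustment (values chosen purely
for illustration): with r = n the MLE of log(pi/(1-pi)) is +Inf, whereas
the adjusted estimate log(p/(1-p)) with p = (r+0.5)/(n+1) stays finite and
grows only slowly with n:

    ## adjusted log-odds estimate for r successes out of n trials
    adjusted_logit <- function(r, n) {
      p <- (r + 0.5) / (n + 1)
      log(p / (1 - p))
    }
    adjusted_logit(5, 5)      # about 2.40
    adjusted_logit(20, 20)    # about 3.71
    adjusted_logit(100, 100)  # about 5.30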
>
> I do not mean to imply that the Jeffreys-prior penalty will be the
> right thing for all applications -- it will not. (eg if you really do
> have prior information, it would be better to use it.)
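
As a usage sketch (argument names and output format depend on the package
version), the logistf package can be applied to the completely separated
toy data shown earlier, giving finite estimates where glm's diverge:

    library(logistf)  # Firth-type (Jeffreys-prior penalized) logistic regression
    toy <- data.frame(x = c(1, 2, 3, 4, 6, 7, 8, 9),
                      y = c(0, 0, 0, 0, 1, 1, 1, 1))
    pfit <- logistf(y ~ x, data = toy)
    summary(pfit)     # finite coefficients and confidence intervals despite separation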
>
> In general I agree wholeheartedly that it is best to get more/better
> data!
>
> > In the absence of such reasons,
> (cut)
>
> All good wishes,
> David
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html