[R] What does glm do when there are zero events in a logistic regression?

Wed Nov 23 16:38:22 CET 2005

Let me repeat what I said:

> Also, please do sign your messages indicating who you are and what your 
> background is.  In cases like this the best advice is to suggest asking 
> your supervisor (if you have one) or to read the literature (but what 
> specifically depends on your background).

You have still not signed your message, so I have no idea of your
background, nor does `I have read some books about logistic regression'
help me.  Two accounts of separation are:

@Book{Santner.Duffy.89,
   author       = "T. J. Santner and D. E. Duffy",
   title        = "The Statistical Analysis of Discrete Data",
   publisher    = "Springer-Verlag",
   address      = "New York",
   year         = "1989",
   ISBN         = "0-387-97018-5",
   comment      = "Reference from MASS",
}

@Book{Ripley.96,
   author       = "B. D. Ripley",
   title        = "Pattern Recognition and Neural Networks",
   publisher    = "Cambridge University Press",
   address      = "Cambridge",
   year         = "1996",
   ISBN         = "0-521-46086-7",
   comment      = "Reference from MASS",
}

On Wed, 23 Nov 2005, Sh.G. Sun wrote:

> Sorry for my stupid mistakes and thanks for your reply.
>
> I am working on a study of the occurrence of rare events. Although I collected
> thousands of observations, some groups have 0 events. I think it is
> too crude to drop those zero-event groups.
>
> I have read some books about logistic regression and searched the r-help
> mailing list, but I did not find enough information about "separation". Would you
> be so kind as to give me some suggestions on "separation" and the "better
> algorithms"?
>
> Thanks!
>
> Sh.G. Sun
>
>
> Prof Brian Ripley wrote:
>> On Tue, 22 Nov 2005, S. Sun wrote:
>> 
>>> I have a question about glm.
>> 
>> Not really: your question is about understanding logistic regressions.
>> 
>>> When the number of events in an observation is 0,
>>> the logit function on it is Inf. I wonder how glm solves it.
>> 
>> Note that logit(0)  = -Inf whereas logit(1) = Inf.
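In base R the logit and its inverse are `qlogis()` and `plogis()`, the quantile and distribution functions of the standard logistic distribution. A quick check of the limits above, and of the fact that any finite linear predictor gives a fitted probability strictly between 0 and 1 (the eta values are arbitrary illustrations):

```r
# qlogis() is the logit function, plogis() its inverse (the logistic CDF)
logit_zero <- qlogis(0)  # an empirical proportion of 0 maps to -Inf
logit_one  <- qlogis(1)  # an empirical proportion of 1 maps to +Inf
print(c(logit_zero, logit_one))

# Any finite linear predictor eta gives a fitted probability inside (0, 1)
eta <- c(-30, -10, 0, 10)
p <- plogis(eta)
print(p)
```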
>> 
>> It is the fitted probabilities which are passed to logit, not the empirical 
>> proportions.  Logistic regression is often applied to Bernoulli trials
>> with 0/1 proportions, with nothing to `solve'.
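The same point can be seen by expanding grouped counts into individual 0/1 (Bernoulli) responses: every response then has an empirical logit of -Inf or Inf, yet glm fits without difficulty and gives the same coefficients as the grouped binomial fit. A minimal sketch, assuming only groups B to E of the example in this thread are used, so that no group has zero events; the variable names are illustrative:

```r
# Grouped data for treatments B-E (no zero-event groups)
treat  <- factor(c("B", "C", "D", "E"))
events <- c(7, 10, 15, 17)
trials <- rep(50, 4)

# Binomial fit on the counts
fit_grouped <- glm(cbind(events, trials - events) ~ treat, family = binomial)

# Expand to one row per trial: each response is exactly 0 or 1
treat_long <- rep(treat, times = trials)
y_long <- unlist(lapply(seq_along(events), function(i)
  rep(c(1, 0), times = c(events[i], trials[i] - events[i]))))

# Bernoulli fit: the logit is applied to fitted probabilities, not to y
fit_bernoulli <- glm(y_long ~ treat_long, family = binomial)

print(cbind(grouped = coef(fit_grouped), bernoulli = coef(fit_bernoulli)))
```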
>> 
>> So the issue only arises if the MLE would give 0 (or 1) fitted values, and 
>> it cannot in a logistic regression.  You have here an example in which the 
>> MLE does not exist and the log-likelihood does not attain its maximum. Such 
>> situations are known as `separation' and it is well-known that there are 
>> better algorithms for such problems.
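Why the maximum is not attained can be seen from group A's own contribution to the binomial log-likelihood: with 0 events in 50 trials it is 50 * log(1 - p), which keeps increasing towards 0 as the linear predictor for A goes to -Inf, so no finite estimate maximizes it. A minimal sketch; the grid of eta values is arbitrary:

```r
# Group A: 0 events out of 50 trials.  Its log-likelihood contribution is
# 50 * log(1 - p), where p = plogis(eta) is the fitted probability.
eta <- c(-2, -5, -10, -20, -30)

# plogis(..., lower.tail = FALSE, log.p = TRUE) computes log(1 - p) accurately
# even when p is tiny
loglik_A <- 50 * plogis(eta, lower.tail = FALSE, log.p = TRUE)
print(data.frame(eta = eta, fitted_p = plogis(eta), loglik_A = loglik_A))
```

Each step towards -Inf increases the log-likelihood, but it stays strictly below its supremum of 0.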
>> 
>>> An example:
>>> Treat Events Trials
>>> A     0      50
>>> B     7      50
>>> C     10     50
>>> D     15     50
>>> E     17     50
>>> 
>>> Program:
>>> 
>>> treat <- factor(c("A", "B", "C", "D", "E"))
>>> events <- c(0, 7, 10, 15, 17)
>>> trials <- rep(50, 5)
>>> glm(cbind(events, trials-events)~treat, family=binomial)
>>> 
>>> What's wrong with it? And are there better ideas?
>> 
>> Nothing is `wrong with it'.  It finds fitted values which are very close to 
>> the observed values.  You have chosen an inappropriate model and an 
>> inappropriate parametrization (see ?relevel).
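A minimal sketch of the reparametrization suggested above, reusing the data from the example: `relevel()` makes B the baseline, so the divergence caused by the zero-event group is confined to a single coefficient instead of affecting the intercept and every contrast. Choosing B is an arbitrary illustration, and the exact size of the diverging coefficient depends on when glm's iterations stop, so no values are quoted here:

```r
treat  <- factor(c("A", "B", "C", "D", "E"))
events <- c(0, 7, 10, 15, 17)
trials <- rep(50, 5)

# Default parametrization: A (the zero-event group) is the baseline, so the
# intercept estimates logit(0) = -Inf and every contrast diverges with it
fit_A <- glm(cbind(events, trials - events) ~ treat, family = binomial)

# Reparametrize with B as the baseline: only the A contrast diverges
treat_B <- relevel(treat, ref = "B")
fit_B <- glm(cbind(events, trials - events) ~ treat_B, family = binomial)

print(coef(fit_A))
print(coef(fit_B))

# The fitted probabilities do not depend on the parametrization and are
# very close to the observed proportions events / trials
print(cbind(observed = events / trials, fitted = fitted(fit_B)))
```

The model itself is unchanged; only the meaning of the coefficients (and hence which of them is reported as huge, with a huge standard error) differs.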
>> 
>> I presume you did think something is wrong, but you did not tell us what.
>> Please do read the posting guide and try to provide us with enough 
>> information to help you.  Also, please do sign your messages indicating who 
>> you are and what your background is.  In cases like this the best advice is 
>> to suggest asking your supervisor (if you have one) or to read the 
>> literature (but what specifically depends on your background).
>> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595