[R] What does glm do when there are zero events in a logistic regression?
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Nov 23 16:38:22 CET 2005
Let me repeat what I said:
> Also, please do sign your messages indicating who you are and what your
> background is. In cases like this the best advice is to suggest asking
> your supervisor (if you have one) or to read the literature (but what
> specifically depends on your background).
You have still not signed your message so I have no idea of your
background, nor does `I have read some books about logistic regression'
help me. Two accounts are
@Book{Santner.Duffy.89,
author = "T. J. Santner and D. E. Duffy",
title = "The Statistical Analysis of Discrete Data",
publisher = "Springer-Verlag",
address = "New York",
year = "1989",
ISBN = "0-387-97018-5",
comment = "Reference from MASS",
}
@Book{Ripley.96,
author = "B. D. Ripley",
title = "Pattern Recognition and Neural Networks",
publisher = "Cambridge University Press",
address = "Cambridge",
year = "1996",
ISBN = "0-521-46086-7",
comment = "Reference from MASS",
}
On Wed, 23 Nov 2005, Sh.G. Sun wrote:
> Sorry for my stupid mistakes and thanks for your reply.
>
> I just have a study on the occurrence of rare events. Although I collected
> thousands of observations, there are some groups with 0 events. I think it is
> too crude to drop those 0-events groups.
>
> I have read some books about logistic regression and searched the r-help
> mailing list, but I did not find enough information about "separation".
> Would you be so kind as to give me some suggestions on "separation" and the
> "better algorithms"?
>
> Thanks!
>
> Sh.G. Sun
>
>
> Prof Brian Ripley wrote:
>> On Tue, 22 Nov 2005, S. Sun wrote:
>>
>>> I have a question about the glm.
>>
>> Not really: your question is about understanding logistic regressions.
>>
>>> When the number of events for an observation is 0,
>>> the logit function of it is Inf. I wonder how glm solves it.
>>
>> Note that logit(0) = -Inf whereas logit(1) = Inf.
>>
>> It is the fitted probabilities which are passed to logit, not the empirical
>> proportions. Logistic regression is often applied to Bernoulli trials
>> with 0/1 proportions, with nothing to `solve'.
>>
>> So the issue only arises if the MLE would give 0 (or 1) fitted values, and
>> it cannot in a logistic regression. You have here an example in which the
>> MLE does not exist and the log-likelihood does not attain its maximum. Such
>> situations are known as `separation' and it is well-known that there are
>> better algorithms for such problems.
>>
>>> An example:
>>> Treat  Events  Trials
>>> A           0      50
>>> B           7      50
>>> C          10      50
>>> D          15      50
>>> E          17      50
>>>
>>> Program:
>>>
>>> treat <- factor(c("A", "B", "C", "D", "E"))
>>> events <- c(0, 7, 10, 15, 17)
>>> trials <- rep(50, 5)
>>> glm(cbind(events, trials-events)~treat, family=binomial)
>>>
>>> What's wrong with it? And are there better ideas?
>>
>> Nothing is `wrong with it'. It finds fitted values which are very close to
>> the observed values. You have chosen an inappropriate model and an
>> inappropriate parametrization (see ?relevel).
>>
>> I presume you did think something is wrong, but you did not tell us what.
>> Please do read the posting guide and try to provide us with enough
>> information to help you. Also, please do sign your messages indicating who
>> you are and what your background is. In cases like this the best advice is
>> to suggest asking your supervisor (if you have one) or to read the
>> literature (but what specifically depends on your background).
>>
>
>
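The points in the quoted exchange can be illustrated with a minimal sketch in R, using the data from the example above. It is an illustration only and not part of the original messages: the names treat_B, fit_A and fit_B are introduced here, and B is an arbitrary choice of reference level for relevel().

```r
# Data from the quoted example
treat  <- factor(c("A", "B", "C", "D", "E"))
events <- c(0, 7, 10, 15, 17)
trials <- rep(50, 5)

# qlogis() is the logit function in base R: logit(0) = -Inf, logit(1) = Inf
qlogis(c(0, 1))

# Default parametrization: A (the group with 0 events) is the reference level,
# so the intercept is the logit of a fitted probability that is nearly 0
fit_A <- glm(cbind(events, trials - events) ~ treat, family = binomial)

# Same model with B as the reference level (see ?relevel)
treat_B <- relevel(treat, ref = "B")
fit_B <- glm(cbind(events, trials - events) ~ treat_B, family = binomial)

# Estimates and standard errors under each parametrization
coef(summary(fit_A))
coef(summary(fit_B))

# Fitted probabilities compared with the observed proportions
cbind(observed = events / trials, fitted = fitted(fit_B))
```

With A as the reference level, the intercept and every treatment contrast are large in magnitude and have very large standard errors. With B as the reference level, only the contrast for A behaves this way. The fitted probabilities are essentially the same in both fits and lie very close to the observed proportions. The reported estimate for A is not a finite MLE; its value depends on where the iterations stop.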
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595