[R] Is it possible to use glm() with 30 observations?

David Firth d.firth at warwick.ac.uk
Sat Jul 2 11:01:06 CEST 2005


On 2 Jul 2005, at 06:01, Spencer Graves wrote:

> 	  The issue is not 30 observations but whether it is possible to
> perfectly separate the two possible outcomes.  Consider the following:
>
> tst.glm <- data.frame(x=1:3, y=c(0, 1, 0))
> glm(y~x, family=binomial, data=tst.glm)
>
> tst2.glm <- data.frame(x=1:1000,
>                       y=rep(0:1, each=500))
> glm(y~x, family=binomial, data=tst2.glm)
>
> 	  The algorithm fits y~x to tst.glm without complaining,
> but issues warnings for tst2.glm.  This is called the Hauck-Donner
> effect, and RSiteSearch("Hauck-Donner") just now produced 8 hits.  For
> more information, look for "Hauck-Donner" in the index of Venables, W.
> N. and Ripley, B. D. (2002) _Modern Applied Statistics with S._ New
> York: Springer.

Not exactly.  The phenomenon that causes the warning for tst2.glm above 
is more commonly known as "complete separation".  For some comments on 
its implications you might look at another work by B D Ripley, the 1996 
book "Pattern Recognition and Neural Networks".  There are some further 
references in the help files of the "brlr" package on CRAN.
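
A minimal R sketch of the point, using two made-up datasets of 30 observations each (hypothetical data, not the original poster's). When the outcomes overlap, glm() fits without complaint. When x separates them completely, the fitted probabilities are pushed to 0 and 1 and the warning from the original question appears. The helper names fit_with_warnings() and separated_1d() are invented for illustration, and the separation check only applies to a single numeric predictor.

```r
x <- 1:30

# Hypothetical data: outcomes interleave over x = 11..20, so no cut-point separates them.
y_overlap <- c(rep(0, 10), rep(c(0, 1), 5), rep(1, 10))

# Hypothetical data: completely separated, y = 1 exactly when x > 15.
y_sep <- as.numeric(x > 15)

# Fit a logistic regression and collect any warnings instead of printing them.
fit_with_warnings <- function(y, x) {
  warns <- character(0)
  fit <- withCallingHandlers(
    glm(y ~ x, family = binomial),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(fit = fit, warnings = warns)
}

# With one numeric predictor, complete separation means every x among the 0s
# lies below every x among the 1s, or the other way round.
separated_1d <- function(y, x) {
  max(x[y == 0]) < min(x[y == 1]) || max(x[y == 1]) < min(x[y == 0])
}

ov <- fit_with_warnings(y_overlap, x)
sp <- fit_with_warnings(y_sep, x)

separated_1d(y_overlap, x)  # FALSE
separated_1d(y_sep, x)      # TRUE
ov$warnings                 # no warnings for the overlapping data
sp$warnings                 # includes "fitted probabilities numerically 0 or 1 occurred"
range(fitted(sp$fit))       # essentially 0 and 1
```

With several predictors, separation can also occur along a linear combination of them, so a one-variable check like this is not sufficient in general.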

The problem noted by Hauck and Donner (1977, JASA) is slightly related, 
but not the same.  See the aforementioned book by Venables and Ripley, 
for example.  The glm function does not routinely warn us about the 
"Hauck-Donner effect", as far as I know.
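
A small sketch of the difference, on hypothetical grouped data invented for illustration. A control group with 20 successes out of 40 is compared with a treatment group with k successes out of 40, for k = 21, ..., 39 (k = 40 would be complete separation). As k grows, the estimated log odds ratio and the likelihood-ratio statistic (null minus residual deviance) keep increasing, while the Wald z value from summary() peaks and then falls. That non-monotone Wald statistic is the behaviour Hauck and Donner described, and glm() fits all of these models without any warning.

```r
n <- 40        # hypothetical group size
ks <- 21:39    # hypothetical treatment-group success counts

res <- t(sapply(ks, function(k) {
  dat <- data.frame(
    group = factor(rep(c("control", "treatment"), each = n)),
    y = c(rep(1:0, c(20, n - 20)),   # control: 20 successes out of 40
          rep(1:0, c(k, n - k)))     # treatment: k successes out of 40
  )
  fit <- glm(y ~ group, family = binomial, data = dat)
  coefs <- summary(fit)$coefficients
  c(k        = k,
    estimate = coefs["grouptreatment", "Estimate"],
    wald_z   = coefs["grouptreatment", "z value"],
    lrt      = fit$null.deviance - fit$deviance)   # likelihood-ratio chi-square, 1 df
}))

print(round(res, 3))

# Position of the largest Wald z: it falls short of the last row, even though
# the estimate and the likelihood-ratio statistic are still rising there.
which.max(res[, "wald_z"])
```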

The original poster did not say what the purpose of the logistic 
regression was, so it is hard to advise.  Depending on the purpose, the 
separation that was detected may or may not be a problem.
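
If finite coefficient estimates are needed despite separation, the brlr package mentioned above fits bias-reduced logistic regression, which maximizes the likelihood penalized by Jeffreys' invariant prior. The sketch below is a hand-rolled base-R version of that idea under stated assumptions, not brlr's code: the function name firth_logit() is hypothetical, it maximizes the penalized log-likelihood with optim() using the modified score as the gradient, and it provides no standard errors. The toy data are made up.

```r
# Hypothetical helper (illustrative sketch, not the brlr implementation).
# Maximizes the Jeffreys-prior penalized log-likelihood
#   l*(beta) = sum(y * log(p) + (1 - y) * log(1 - p)) + 0.5 * log det(X'WX),
# with W = diag(p * (1 - p)). Its gradient is the modified score
#   U*(beta) = X' (y - p + h * (1/2 - p)),
# where h is the diagonal of the weighted hat matrix W^{1/2} X (X'WX)^{-1} X' W^{1/2}.
firth_logit <- function(X, y) {
  pen_loglik <- function(beta) {
    eta <- drop(X %*% beta)
    p <- plogis(eta)
    loglik <- sum(y * plogis(eta, log.p = TRUE) +
                  (1 - y) * plogis(-eta, log.p = TRUE))
    info <- crossprod(X, (p * (1 - p)) * X)        # Fisher information X'WX
    loglik + 0.5 * as.numeric(determinant(info, logarithm = TRUE)$modulus)
  }
  mod_score <- function(beta) {
    p <- plogis(drop(X %*% beta))
    w <- p * (1 - p)
    info_inv <- solve(crossprod(X, w * X))         # (X'WX)^{-1}
    h <- w * rowSums((X %*% info_inv) * X)         # hat-matrix diagonal
    drop(crossprod(X, y - p + h * (0.5 - p)))
  }
  # fnscale = -1 turns optim's minimizer into a maximizer.
  opt <- optim(rep(0, ncol(X)), pen_loglik, mod_score, method = "BFGS",
               control = list(fnscale = -1, maxit = 1000))
  list(coefficients = setNames(opt$par, colnames(X)),
       converged = opt$convergence == 0)
}

# Hypothetical completely separated toy data: y = 1 exactly when x > 15.
x <- 1:30
y <- as.numeric(x > 15)
X <- cbind("(Intercept)" = 1, x = x)

fit_br <- firth_logit(X, y)
fit_br$converged
fit_br$coefficients
```

For separated data like these, ordinary glm() lets the slope drift toward infinity, whereas the penalty keeps the estimate finite. Whether that is what is wanted still depends on what the regression is for.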

Regards,
David

> (If you don't already have this book, I recommend you
> give serious consideration to purchasing a copy.  It is excellent on
> many issues relating to statistical analysis and R.)
>
> 	  Spencer Graves
>
> Kerry Bush wrote:
>
>> I have a very simple problem. When using glm to fit
>> binary logistic regression model, sometimes I receive
>> the following warning:
>>
>> Warning messages:
>> 1: fitted probabilities numerically 0 or 1 occurred
>> in: glm.fit(x = X, y = Y, weights = weights, start =
>> start, etastart = etastart,
>> 2: fitted probabilities numerically 0 or 1 occurred
>> in: glm.fit(x = X, y = Y, weights = weights, start =
>> start, etastart = etastart,
>>
>> What does this output tell me? Since I only have 30
>> observations, I assume this is a small sample problem.
>> Is it possible to fit this model in R with only 30
>> observations? Could any expert provide suggestions to
>> avoid the warning?
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! 
>> http://www.R-project.org/posting-guide.html
>
> -- 
> Spencer Graves, PhD
> Senior Development Engineer
> PDF Solutions, Inc.
> 333 West San Carlos Street Suite 700
> San Jose, CA 95110, USA
>
> spencer.graves at pdf.com
> www.pdf.com <http://www.pdf.com>
> Tel:  408-938-4420
> Fax: 408-280-7915
>