[R] Is it possible to use glm() with 30 observations?

Sat Jul 2 18:00:19 CEST 2005

	  I agree with Ted:  "in model-fitting terms, it is a
resounding success!"  With any data set having at least one point with a 
binomial yield of 0 or 100%, you can get this phenomenon by adding 
series of random numbers sequentially to a model.  Eventually, you will 
add enough variables that some linear combination of the "predictors" 
will provide perfect prediction for that case.  In such cases, the usual 
asymptotic normality of the parameter estimates breaks down, but you can 
still test using 2*log(likelihood ratio) being approximately chi-square 
with anova(fit1, fit2), as explained in Venables and Ripley and many 
other sources, e.g., ?anova.glm:

 > tst2.glm <- data.frame(x=1:1000,
+                      y=rep(0:1, each=500))
 > fit0 <- glm(y~1, family=binomial, data=tst2.glm)
 > fit1 <- glm(y~x, family=binomial, data=tst2.glm)
Warning messages:
1: algorithm did not converge in: glm.fit(x = X, y = Y, weights = 
weights, start = start, etastart = etastart,
2: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y 
= Y, weights = weights, start = start, etastart = etastart,
 > anova(fit0, fit1, test="Chisq")
Analysis of Deviance Table

Model 1: y ~ 1
Model 2: y ~ x
   Resid. Df Resid. Dev  Df Deviance  P(>|Chi|)
1       999     1386.3
2       998  6.764e-05   1   1386.3 1.999e-303
 >
	  This effect exposes a limit to the traditional advice for 
experimental design to test only at two levels for each experimental 
factor, and put them as far apart as possible without jeopardizing 
linearity.  Here, you want to estimate a priori the range over which the 
probability of "success" goes from, say, 20% to 80%, then double that 
range and make all sets of test conditions unique and spaced roughly 
evenly over that range.  Designing experiments with binary response is 
an extremely difficult problem.  This crude procedure should get you 
something moderately close to an optimal design.

	  Best Wishes,
	  spencer graves

(Ted Harding) wrote:
> On 02-Jul-05 Kerry Bush wrote:
> 
>>I have a very simple problem. When using glm to fit
>>binary logistic regression model, sometimes I receive
>>the following warning:
>>
>>Warning messages:
>>1: fitted probabilities numerically 0 or 1 occurred
>>in: glm.fit(x = X, y = Y, weights = weights, start =
>>start, etastart = etastart,  
>>2: fitted probabilities numerically 0 or 1 occurred
>>in: glm.fit(x = X, y = Y, weights = weights, start =
>>start, etastart = etastart,  
>>
>>What does this output tell me? Since I only have 30
>>observations, i assume this is a small sample problem.
> 
> 
> It isn't. Spencer Graves has shown clearly with two examples
> that you can get a fit with no warnings with 3 observations,
> and a fit with warnings for 1000 observations. As he says,
> it arises when you get "perfect separation" with respect to
> the linear model.
> 
> However, it may be worth expanding Spencer's explanation.
> With a single explanatory variable x (as in Spencer's examples),
> "perfect separation" occurs when y = 0 for all x <= some x0,
> and y = 1 for all x > x0.
> 
> One of the parameters in the linear model is the "scale parameter"
> (i.e. the recipsorcal of the "slope"). If you express the model
> in the form
> 
>   logit(P(Y=1;x)) = (x - mu)/sigma
> 
> then sigma is the scale parameter in question.
> 
> As sigma -> 0, P(Y=1;x) -> 0 for x < mu, and -> 1 for x > mu.
> 
> Therefore, for any value of mu between x0 (at and below which
> all y=0 in your data) and x1 (the next larger value of x, at
> and above which all y=1), letting sigma -> 0 gives a fit
> which perfectly predicts your y-values: it predicts P(Y=1) = 0,
> i.e. P(Y=0) = 1, for x < mu, and predicts P(Y=1) = 1 for x > mu;
> and this is exactly what you have observed in the data.
> 
> So it's not a disaster -- in model-fitting terms, it is a
> resounding success!
> 
> However, in real life one does not expect to be dealing with
> a situation where the outcomes are so perfectly predictable,
> and therefore one views such a result with due mistrust.
> One attributes the "perfect separation" not to perfect
> predictability, but to the possibility that, by chance,
> all the "y=0" occur at lower values of x, and all the "y=1"
> at higher values of x.
> 
> 
>>Is it possible to fit this model in R with only 30
>>observations? Could any expert provide suggestions to
>>avoid the warning?
> 
> 
> Yes! As Spencer showed, it is possible with 3 -- but of
> course it depends on the outcomes y.
> 
> As to a suggestion to "avoid the warning" -- what you really
> need to avoid is data where the x-values are so sparse in
> the neighbourhood of the "P(Y=1;x) = 0.5" area that it becomes
> likely that you will get y-values showing perfect separation.
> 
> What that means in practice is that, over the range of x-values
> such that P(Y=1;x) rises from (say) 0.2 to 0.8 (chosen for
> illustration), you should have several x values in your data.
> Then the phenomonon of "perfect separation" becomes unlikely.
> 
> But what that means in real life is that you need enough data,
> over the relevant range of x-values, to enable you to obtain
> (with high probability) a non-zero estimate of sigma (i.e. an
> estimate of slope 1/sigma which is not infinite) -- i.e. that
> you have enough data, and in the right places, to estimate the
> rate at which the probability increases from low to high values.
> 
> (Theoretically, it is possible that you get "perfect separation"
> even with well-distributed x-values and sigma > 0; but with
> well-distributed x-values the chance that this would occur is so
> small that it can be neglected).
> 
> So, to come back to your particular case, the really meaningful
> suggestion for "avoiding the warning" is that you need better
> data. If your study is such that you have to take the x-values
> as they come (as opposed to a designed experiement where you
> can decide what they are to be), then this suggestion boils
> down to "get more data".
> 
> What that would mean depends on having information about the
> smallest value of sigma (largest value of slope) that is
> *plausible* in your context. Your data are not particularly
> useful, since they positively encourage adopting sigma=0.
> So objective information about this could only come from
> independent knowledge.
> 
> However, as a rule of thumb, in such a situation I would try
> to get more data until I had, say, 10 x-values roughly evenly
> distributed between the largest for which y=0 and the smallest
> for which y=1. If that didn't work first time, then repeat
> using the extended data as starting-point.
> 
> Or simply sample more data until the phenomenonof perfect
> separation was well avoided, and the S.D. of the x-coefficient
> was distinctly smaller than the value of the x-coefficient.
> 
> Hoping this helps,
> Ted.
> 
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 02-Jul-05                                       Time: 10:45:04
> ------------------------------ XFMail ------------------------------
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves at pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915