# [R] glm.fit: "fitted probabilities numerically 0 or 1 occurr

(Ted Harding) Ted.Harding at manchester.ac.uk
Tue Mar 11 11:08:09 CET 2008

```
On 11-Mar-08 08:58:55, Werner Wernersen wrote:
> Hi,
>
> could anyone explain to me what this warning message
> exactly means and what the consequences are?
> Is it due to the fact that there are very extreme
> observations / outliers included or what is the reason
> for it?
>
> Thanks so much,
>   Werner

What it means is exactly what it says. How it arises will
probably be some variant of the following kind of data
(I'm guessing that your application of glm() was to data
with 0/1 responses, as in a logistic regression):

X = 0.0  0.5  1.0  1.5  2.0  2.5  3.0  ...
Y = 0    0    0    1    1    1    1    ...

i.e. all the 0's occur on one side of a value (say 1.25)
of X, and all the 1's occur on the other side.

If you take a model (e.g. logistic):

P(Y=1 | X) = exp((X-a)*b)/(1 + exp((X-a)*b))

then, for any finite values of a and b, the formula will
give a value >0 for P(Y=1 | X) where X < 1.25 (i.e. where
Y=0), so P(Y=0 | X) < 1 there; and a value <1 for P(Y=1 | X)
where X > 1.25 (i.e. where Y=1). So no finite (a, b) fits
the observed outcomes exactly.

However, if you take say a=1.25 (a value which separates the
0's from the 1's), and then let b -> +infinity, you will
find that

P(Y=0 | X) -> 1, P(Y=1 | X) -> 0, for X < 1.25
P(Y=0 | X) -> 0, P(Y=1 | X) -> 1, for X > 1.25

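so the limit as b -> infinity perfectly predicts the observed
outcomes. But no finite b attains this limit: the likelihood
keeps increasing as b grows, so the maximum-likelihood
estimate of b does not exist (it "is" infinite).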

Note, too, that the value of a is then indeterminate: any
value strictly between the largest X among the Y=0
observations and the smallest X among the Y=1 observations
(1.0 and 1.5 in the example above) gives the same limit.

This situation cannot arise with data where the largest X for
which Y=0 is greater than the smallest X for which Y=1, e.g.

X = 0.0  0.5  1.0  1.5  2.0  2.5  3.0  ...
Y = 0    0    1    0    1    1    1    ...

The above is a very simple example of what is called
"linear separation". It arises more generally when there are
several covariates X1, X2, ... , Xk and there is a linear
function

L = a1*X1 + a2*X2 + ... + ak*Xk

for which (with the data as observed) there is a value L0
such that

Y = 0 for all the data such that L < L0
Y = 1 for all the data such that L > L0

In particular, if ever the number of covariates (k) is greater
than (n-2), where n is the number of cases in your data, then
you have (k+1) or fewer points in k dimensions. Provided those
points are in general position (e.g. for k=2, not all on one
line), there will be a (k-1)-dimensional hyperplane L = L0
(with L as above) which separates the (X1,...,Xk)-points where
Y=0 from the (X1,...,Xk)-points where Y=1: however you assign
the labels "Y=0" and "Y=1" to (k+1) or fewer such points, they
will be linearly separable.

Even if k < n-1, so that separation is not guaranteed, there
is still a positive probability that you get data which are
linearly separated; and then the same situation arises. For a
given n, this probability increases as the number of
covariates (k) increases.

What the warning message is telling you is that a perfect
fit is possible (in the limit) within the parametrisation of
the model: a fitted probability P(Y=1) that is numerically 0
occurs for cases where the observed Y = 0, and a fitted
probability P(Y=1) that is numerically 1 occurs for cases
where the observed Y = 1.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 11-Mar-08                                       Time: 10:08:04
------------------------------ XFMail ------------------------------

```
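Ted's one-covariate example can be checked numerically. The sketch below is in Python (the thread contains no R code to extend). It evaluates the log-likelihood of his model P(Y=1 | X) = exp((X-a)*b)/(1 + exp((X-a)*b)) at a = 1.25 for increasing b, on the seven data points printed in the message with the trailing "..." dropped. All function and variable names (`softplus`, `log_lik`, `separating_interval`, ...) are hypothetical. It illustrates the argument only and is not what glm() does internally.

```python
import math

# Ted's illustrative data, truncated to the seven values shown (the "..." is dropped).
X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
Y_SEPARATED = [0, 0, 0, 1, 1, 1, 1]
Y_OVERLAPPING = [0, 0, 1, 0, 1, 1, 1]


def softplus(t):
    """log(1 + exp(t)), computed without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def p_one(x, a, b):
    """P(Y=1 | X=x) = exp((x-a)*b) / (1 + exp((x-a)*b))."""
    return math.exp(-softplus(-(x - a) * b))


def log_lik(xs, ys, a, b):
    """Log-likelihood of 0/1 responses under the logistic model above."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = (x - a) * b
        # log P(Y=1) = -softplus(-z) and log P(Y=0) = -softplus(z)
        total -= softplus(-z) if y == 1 else softplus(z)
    return total


def separating_interval(xs, ys):
    """(largest X with Y=0, smallest X with Y=1) if all 0's lie below all 1's, else None.

    Only the "0's on the left" case is checked; the mirror case corresponds to b -> -infinity.
    """
    top0 = max(x for x, y in zip(xs, ys) if y == 0)
    low1 = min(x for x, y in zip(xs, ys) if y == 1)
    return (top0, low1) if top0 < low1 else None


if __name__ == "__main__":
    print(separating_interval(X, Y_SEPARATED))    # (1.0, 1.5)
    print(separating_interval(X, Y_OVERLAPPING))  # None
    for b in (1, 10, 100):
        # separated data: rises towards 0 (likelihood -> 1) but never reaches it
        # overlapping data: eventually falls as b grows
        print(b, log_lik(X, Y_SEPARATED, 1.25, b), log_lik(X, Y_OVERLAPPING, 1.25, b))
```

For the separated data the log-likelihood climbs towards 0 without reaching it, and a = 1.1 or a = 1.4 do just as well once b is large. For the overlapping data it eventually falls, because the two out-of-order cases are predicted ever more confidently wrong.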
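glm() fits a logistic regression by iteratively reweighted least squares, which for the logit link is the same as Newton-Raphson. The following is a bare-bones Newton-Raphson loop in Python for one covariate. It is not a copy of R's glm.fit: it uses the parametrisation c + b*X (Ted's (X-a)*b with c = -a*b), starts from c = b = 0, and runs a fixed number of iterations with no convergence test. These are all assumptions made for illustration, and the names are hypothetical.

```python
import math

# Ted's illustrative data, truncated to the seven values shown (the "..." is dropped).
X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
Y_SEPARATED = [0, 0, 0, 1, 1, 1, 1]
Y_OVERLAPPING = [0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    """1 / (1 + exp(-z)), arranged so that exp() never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def newton_logistic(xs, ys, iterations):
    """Newton-Raphson for P(Y=1 | x) = sigmoid(c + b*x); returns (c, b) after each step.

    Starts at c = b = 0 and runs a fixed number of steps with no convergence
    test (both are illustrative choices, not what glm.fit does).
    """
    c, b = 0.0, 0.0
    path = []
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix
        for x, y in zip(xs, ys):
            z = c + b * x
            resid = sigmoid(-z) if y == 1 else -sigmoid(z)  # y - p, without cancellation
            w = sigmoid(z) * sigmoid(-z)                    # p * (1 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ step = [g0, g1]
        c += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
        path.append((c, b))
    return path


if __name__ == "__main__":
    for label, ys in (("separated", Y_SEPARATED), ("overlapping", Y_OVERLAPPING)):
        path = newton_logistic(X, ys, 20)
        for step in (5, 10, 20):
            c, b = path[step - 1]
            print(f"{label:11s} step {step:2d}: b = {b:8.3f}, boundary -c/b = {-c / b:.4f}")
```

On the separated data the slope b keeps growing with every iteration while the boundary -c/b stays between 1.0 and 1.5, so the fitted probabilities are pushed towards 0 and 1. On the overlapping data the iterates settle down after a few steps.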
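The claim about k > n-2 can be checked in the smallest interesting case: k = 2 covariates and n = 3 cases. Assuming the three points are not collinear, one can solve for a linear function c + a1*X1 + a2*X2 that equals +1 at the Y=1 points and -1 at the Y=0 points. Taking L = a1*X1 + a2*X2 and L0 = -c then separates them. The Python below does this exactly with fractions (Cramer's rule) for all eight labellings; the function names are hypothetical. It also shows why general position matters. For three collinear points the system is singular. Moreover, the labelling 0, 1, 0 along the line cannot be separated at all, since a linear function is monotone along a line.

```python
from fractions import Fraction
from itertools import product


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def separating_function(points, labels):
    """Find (c, a1, a2) with c + a1*X1 + a2*X2 = +1 at Y=1 points and -1 at Y=0 points.

    Returns None when the system is singular (the three points are collinear).
    """
    rows = [[Fraction(1), Fraction(x1), Fraction(x2)] for x1, x2 in points]
    rhs = [Fraction(1 if y == 1 else -1) for y in labels]
    d = det3(rows)
    if d == 0:
        return None
    coeffs = []
    for col in range(3):  # Cramer's rule: replace column `col` by the right-hand side
        m = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(rows)]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


def separates(points, labels, coeffs):
    """Check Y=0 where L < L0 and Y=1 where L > L0, with L = a1*X1 + a2*X2, L0 = -c."""
    c, a1, a2 = coeffs
    L0 = -c
    for (x1, x2), y in zip(points, labels):
        L = a1 * x1 + a2 * x2
        if (y == 0 and not L < L0) or (y == 1 and not L > L0):
            return False
    return True


def interpolation_separates_all(points):
    """True if the construction above separates all 2**3 labellings of the points."""
    for labels in product((0, 1), repeat=3):
        coeffs = separating_function(points, labels)
        if coeffs is None or not separates(points, labels, coeffs):
            return False
    return True


# n = 3 cases and k = 2 covariates, so k > n - 2.
GENERAL = [(0, 0), (1, 0), (0, 1)]      # not collinear
COLLINEAR = [(0, 0), (1, 1), (2, 2)]    # all on the line X2 = X1

if __name__ == "__main__":
    print(interpolation_separates_all(GENERAL))        # True
    print(separating_function(COLLINEAR, (0, 1, 0)))   # None (singular system)
```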