[R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

Patrick Breheny patrick.breheny at uky.edu
Fri Dec 2 15:08:18 CET 2011


On 12/01/2011 08:00 PM, Ben quant wrote:
> The data I am using is the last file called l_yx.RData at this link (the
> second file contains the plots from earlier):
> http://scientia.crescat.net/static/ben/

The logistic regression model you are fitting assumes a linear 
relationship between x and the log odds of y; that does not seem to be 
the case for your data.  To illustrate:

x <- l_yx[,"x"]
y <- l_yx[,"y"]
# Split the range of x into four consecutive regions
ind1 <- x <= .002
ind2 <- (x > .002 & x <= .0065)
ind3 <- (x > .0065 & x <= .13)
ind4 <- x > .13

 > summary(glm(y[ind1]~x[ind1],family=binomial))
...
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.79174    0.02633 -106.03   <2e-16 ***
x[ind1]     354.98852   22.78190   15.58   <2e-16 ***

 > summary(glm(y[ind2]~x[ind2],family=binomial))
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.15805    0.02966 -72.766   <2e-16 ***
x[ind2]     -59.92934    6.51650  -9.197   <2e-16 ***

 > summary(glm(y[ind3]~x[ind3],family=binomial))
...
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.367206   0.007781 -304.22   <2e-16 ***
x[ind3]     18.104314   0.346562   52.24   <2e-16 ***

 > summary(glm(y[ind4]~x[ind4],family=binomial))
...
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.31511    0.08549 -15.383   <2e-16 ***
x[ind4]      0.06261    0.08784   0.713    0.476
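
The four regional fits above can also be run in one pass.  What follows
is a minimal sketch, not the code used for the output shown: it runs on
simulated stand-in data (the sample sizes, distributions, and true
coefficients are assumptions for illustration, not l_yx), and only the
cut points are taken from the regions defined above.

```r
# Simulated stand-in for l_yx (assumption: the real data is not used here)
set.seed(1)
x <- c(rexp(4950, rate = 100),      # bulk of the data at small x
       runif(50, 0.13, 15))         # a few points in a long upper tail
y <- rbinom(length(x), 1, plogis(-2 + 0.1 * x))

# Same cut points as the four regions; cut() uses (a, b] intervals by default
breaks <- c(-Inf, 0.002, 0.0065, 0.13, Inf)
region <- cut(x, breaks = breaks, labels = paste0("region", 1:4))

# Fit a separate logistic regression in each region and keep the slope
slopes <- sapply(split(data.frame(x, y), region), function(d) {
  # Skip regions that are too small or contain only one outcome
  if (nrow(d) < 10 || length(unique(d$y)) < 2) return(NA_real_)
  coef(glm(y ~ x, family = binomial, data = d))[["x"]]
})

print(table(region))   # number of observations per region
print(slopes)          # one slope on the log-odds scale per region
```

Replacing the simulated x and y with l_yx[,"x"] and l_yx[,"y"] should
give one slope per region for the real data, provided each region
contains both outcomes.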

To summarize, the relationship between x and the log odds of y appears
to vary dramatically in both magnitude and direction depending on which
interval of x's range we're looking at.  Forcing a single straight line
through this complicated pattern is what leads to the fitted
probabilities near 0 and 1 that you are observing (note that only 0.1%
of the data falls in region 4 above, even though region 4 accounts for
99.1% of the range of x).
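
The parenthetical note above compares two shares: the fraction of
observations that fall in region 4 and the fraction of the range of x
that region 4 spans.  A rough sketch of that arithmetic, on simulated
stand-in values (the mixture below is an assumption chosen so that 0.1%
of the points exceed .13; it is not l_yx):

```r
# Hypothetical stand-in for x (assumption: the real l_yx data is not used)
set.seed(1)
x <- c(rexp(9990, rate = 200),   # nearly all points are very small
       runif(10, 0.13, 15))      # 10 of 10000 points lie above .13

in_region4 <- x > 0.13

# Share of observations that fall in region 4
share_of_data <- mean(in_region4)

# Share of the total range of x that region 4 covers
share_of_range <- (max(x) - 0.13) / (max(x) - min(x))

cat("share of data in region 4: ", format(100 * share_of_data), "%\n")
cat("share of range in region 4:", format(100 * share_of_range), "%\n")
```

The same two lines, applied to l_yx[,"x"], would be one way to check
the 0.1% and 99.1% figures quoted above.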

-- 
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky


