# [R] problems with understanding behaviour of glm

Peter Holzer holzer at stat.math.ethz.ch
Thu Jan 13 17:56:27 CET 2000

Dear R users,

I don't understand what happens in glm in the following example (note that
in S-Plus this example finishes with an almost perfect fit, but also with 49
warnings):

> fit.small <- glm(SKR.ein.aus ~ ., family = binomial, data = daten, maxit=100)
Error in (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  : inner loop 2; can't correct step size
1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
2: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
3: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
4: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
5: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
6: Step size truncated due to divergence in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
7: Step size truncated: out of bounds. in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

> fit.small <- glm(SKR.ein.aus ~ ., family = binomial, data = daten, maxit=10)
Warning messages:
1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
2: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
3: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
4: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
5: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
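
Warnings of this kind typically accompany a (nearly) perfectly separable response. The following is only a minimal sketch on a hypothetical toy data set (`toy`, `fit.toy` and `msgs` are illustrative names, not part of the original example); it collects whatever warnings glm emits for perfectly separated data, without assuming that `daten` itself is separated:

```r
# Hypothetical toy data: y is 0 for x <= 5 and 1 for x >= 6,
# so x separates the two classes perfectly.
toy <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))

# Collect the warnings raised during fitting instead of printing them.
msgs <- character(0)
fit.toy <- withCallingHandlers(
  glm(y ~ x, family = binomial, data = toy),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)                    # warnings emitted by the fit
print(range(fitted(fit.toy)))  # fitted probabilities pushed towards 0 and 1
```

In current versions of R a fit like this normally reports non-convergence and fitted probabilities numerically equal to 0 or 1, because the likelihood keeps increasing as the slope grows without bound.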

> cbind(daten$SKR.ein.aus, round(fit.small$fitted,2), fit.small$residuals, fit.small$weights)
[,1] [,2]           [,3]          [,4]
117    0 0.20 -1.2277296e+00 1.6099564e-01
118    1 1.00  1.2428952e+18 1.4712258e-42
119    1 1.00 -1.2317261e+01 2.1354563e-06
120    1 1.00  2.6068644e+19 3.3443478e-45
121    1 1.00  4.4764781e+21 1.1341627e-49
122    1 1.00  2.4878180e+10 3.6720713e-27
123    1 1.00  2.1799904e+12 4.7823256e-31
124    1 1.00  4.5198826e+33 1.1124846e-73
125    1 1.00 -3.5677823e+01 1.6804826e-15
126    1 1.00 -3.4034938e+01 3.0461116e-14
127    1 1.00  8.1565287e+21 3.4161550e-50
128    1 1.00 -3.8083599e+01 1.4619518e-15
129    1 1.00 -1.6188633e+01 1.1443040e-07
130    1 0.66  1.4948301e+00 2.2939566e-01
131    1 0.40  2.5545963e+00 2.3628997e-01
132    1 1.00  8.2667708e+10 3.3256498e-28
133    0 0.00  8.2885334e+00 7.5863272e-04
134    0 0.62 -2.6994067e+00 2.2881359e-01
135    1 1.00 -3.4141496e+01 1.2481102e-14
136    1 1.00 -1.6016066e+01 7.4169082e-07
137    0 0.00  1.0315917e+00 1.6787319e-02
138    0 0.16 -1.4051577e+00 1.1015032e-01

It seems strange to me that, for example, the second observation is fitted
almost perfectly (the fitted value is actually 0.99999773) and yet has such
a large residual. Obviously this is somehow compensated by the extremely low
weight, but I don't understand what actually happens. Are the problems due
to a possible perfect fit, as obtained in S-Plus?
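
The compensation between residual and weight can be checked by hand for observation 118. This is a sketch under an assumption: that `fit.small$residuals` holds the working residual (y - mu) / (dmu/deta) and `fit.small$weights` the working weight (dmu/deta)^2 / V(mu), as in the usual IRLS formulation. If so, weight * residual^2 equals (y - mu)^2 / V(mu), which for y = 1 and V(mu) = mu * (1 - mu) reduces to (1 - mu) / mu. The names `pearson_sq` and `mu_implied` are illustrative only:

```r
# Values copied from the row for observation 118 in the table above.
y <- 1
r <- 1.2428952e+18   # fit.small$residuals (assumed: working residual)
w <- 1.4712258e-42   # fit.small$weights   (assumed: working weight)

# Under the assumed definitions, w * r^2 = (y - mu)^2 / V(mu);
# for y = 1 and a binomial variance this is (1 - mu) / mu.
pearson_sq <- w * r^2

# Solve (1 - mu) / mu = pearson_sq for mu.
mu_implied <- 1 / (1 + pearson_sq)

print(pearson_sq)
print(mu_implied, digits = 10)
```

The weighted squared residual comes out at roughly 2.3e-06, and the implied fitted value agrees with the 0.99999773 quoted above, so the huge residual and the tiny weight together describe an observation that contributes almost nothing to the weighted least-squares step.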

Thanks in advance for the assistance.

Peter

PS: The data are as follows:

"daten" <-
structure(list(TAGNR = c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12,
13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24), FAC.A1 = c(4.945936343,
0.757121179, 3.339849734, 0.446432497, -0.224455588, 0.786350506,
0.466671197, -3.39436178, 0.8120849, 1.167138695, -1.049067864,
0.628856055, 1.003535654, 1.970140467, 1.554729204, -1.35666524,
1.165706939, 0.872892807, -1.031981412, 0.187765613, 1.115037792,
0.800090304), FAC.A2 = c(1.051475524, 0.854196439, 1.622823744,
-0.429025728, -2.763990226, -0.676443186, -1.805533411, 3.366224497,
-0.290777149, -0.232696276, -1.508451163, -1.557580777, -1.124976767,
0.804105173, -3.015786974, -0.979273314, -3.188779666, -2.008428463,
-2.737199362, -2.623163019, -3.630011703, -3.843566947), MLDR = c(9687,
9617, 9633, 9666, 9618, 9528, 9555, 9499, 9661, 9738, 9749, 9738,
9649, 9696, 9604, 9580, 9508, 9605, 9674, 9726, 9703, 9705),
SKR.ein.aus = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 0, 1, 1, 0, 0)), .Names = c("TAGNR", "FAC.A1", "FAC.A2",
"MLDR", "SKR.ein.aus"), class = "data.frame", row.names = c("117",
"118", "119", "120", "121", "122", "123", "124", "125", "126",
"127", "128", "129", "130", "131", "132", "133", "134", "135",
"136", "137", "138"))
____________________________________________________________

Peter Holzer                      phone: + 41 1 632 46 34
Seminar fuer Statistik, LEO C14   fax:   + 41 1 632 12 28
(Leonhardstr. 27)                  <holzer at stat.math.ethz.ch>
ETH (Federal Inst. Technology)
8092 Zurich                       http://www.stat.math.ethz.ch/~holzer/