[R] problems with understanding behaviour of glm
Peter Holzer
holzer at stat.math.ethz.ch
Thu Jan 13 17:56:27 CET 2000
Dear R users,
I don't understand what happens in glm in the following example (note that
in S-Plus this example finishes with an almost perfect fit, but also with
49 warnings):
> fit.small <- glm(SKR.ein.aus ~ ., family = binomial, data = daten, maxit=100)
Error in (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y, : inner loop 2; can't correct step size
In addition: Warning messages:
1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
2: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
3: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
4: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
5: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
6: Step size truncated due to divergence in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
7: Step size truncated: out of bounds. in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> fit.small <- glm(SKR.ein.aus ~ ., family = binomial, data = daten, maxit=10)
Warning messages:
1: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
2: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
3: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
4: fitted probabilities of 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
5: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> cbind(daten$SKR.ein.aus, round(fit.small$fitted,2), fit.small$residuals, fit.small$weights)
    [,1] [,2]           [,3]          [,4]
117    0 0.20 -1.2277296e+00 1.6099564e-01
118    1 1.00  1.2428952e+18 1.4712258e-42
119    1 1.00 -1.2317261e+01 2.1354563e-06
120    1 1.00  2.6068644e+19 3.3443478e-45
121    1 1.00  4.4764781e+21 1.1341627e-49
122    1 1.00  2.4878180e+10 3.6720713e-27
123    1 1.00  2.1799904e+12 4.7823256e-31
124    1 1.00  4.5198826e+33 1.1124846e-73
125    1 1.00 -3.5677823e+01 1.6804826e-15
126    1 1.00 -3.4034938e+01 3.0461116e-14
127    1 1.00  8.1565287e+21 3.4161550e-50
128    1 1.00 -3.8083599e+01 1.4619518e-15
129    1 1.00 -1.6188633e+01 1.1443040e-07
130    1 0.66  1.4948301e+00 2.2939566e-01
131    1 0.40  2.5545963e+00 2.3628997e-01
132    1 1.00  8.2667708e+10 3.3256498e-28
133    0 0.00  8.2885334e+00 7.5863272e-04
134    0 0.62 -2.6994067e+00 2.2881359e-01
135    1 1.00 -3.4141496e+01 1.2481102e-14
136    1 1.00 -1.6016066e+01 7.4169082e-07
137    0 0.00  1.0315917e+00 1.6787319e-02
138    0 0.16 -1.4051577e+00 1.1015032e-01
It seems strange to me that, e.g., the second observation is fitted almost
perfectly (the fitted value is actually 0.99999773) and yet has such a large
residual. Presumably this is compensated by the extremely low weight, but I
don't understand what actually happens. Are these problems caused by the
(almost) perfect fit that S-Plus reports for this example?
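
For reference, the relation between fitted values, weights and residuals asked about here can be written down under one assumption: that `fit.small$weights` and `fit.small$residuals` hold the working weights and working residuals of the iteratively reweighted least squares (IRLS) fit with the logit link. That is an assumption about this version of glm, not something the output above confirms. With unit prior weights, the textbook quantities are:

```latex
\begin{align*}
\eta_i &= x_i^\top \beta, \qquad
\mu_i = \frac{1}{1 + e^{-\eta_i}}, \qquad
\frac{d\mu_i}{d\eta_i} = \mu_i (1 - \mu_i), \\
w_i &= \frac{(d\mu_i / d\eta_i)^2}{\mu_i (1 - \mu_i)} = \mu_i (1 - \mu_i), \\
r_i &= \frac{y_i - \mu_i}{d\mu_i / d\eta_i}
     = \frac{y_i - \mu_i}{\mu_i (1 - \mu_i)}, \\
w_i \, r_i^2 &= \frac{(y_i - \mu_i)^2}{\mu_i (1 - \mu_i)} .
\end{align*}
```

Under these formulas a fitted value close to 0 or 1 drives the weight toward zero: for a fitted value of 0.99999773 the weight would be about 2.27e-06. The residual divides by the same small quantity, so any rounding error in the numerator can be amplified strongly. This does not by itself account for the sign and size of every value printed above, which may also reflect iterations that had not converged.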
Thanks in advance for the assistance.
Peter
PS: The data are as follows:
"daten" <-
structure(list(TAGNR = c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12,
13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24), FAC.A1 = c(4.945936343,
0.757121179, 3.339849734, 0.446432497, -0.224455588, 0.786350506,
0.466671197, -3.39436178, 0.8120849, 1.167138695, -1.049067864,
0.628856055, 1.003535654, 1.970140467, 1.554729204, -1.35666524,
1.165706939, 0.872892807, -1.031981412, 0.187765613, 1.115037792,
0.800090304), FAC.A2 = c(1.051475524, 0.854196439, 1.622823744,
-0.429025728, -2.763990226, -0.676443186, -1.805533411, 3.366224497,
-0.290777149, -0.232696276, -1.508451163, -1.557580777, -1.124976767,
0.804105173, -3.015786974, -0.979273314, -3.188779666, -2.008428463,
-2.737199362, -2.623163019, -3.630011703, -3.843566947), MLDR = c(9687,
9617, 9633, 9666, 9618, 9528, 9555, 9499, 9661, 9738, 9749, 9738,
9649, 9696, 9604, 9580, 9508, 9605, 9674, 9726, 9703, 9705),
SKR.ein.aus = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 0, 1, 1, 0, 0)), .Names = c("TAGNR", "FAC.A1", "FAC.A2",
"MLDR", "SKR.ein.aus"), class = "data.frame", row.names = c("117",
"118", "119", "120", "121", "122", "123", "124", "125", "126",
"127", "128", "129", "130", "131", "132", "133", "134", "135",
"136", "137", "138"))
____________________________________________________________
Peter Holzer phone: + 41 1 632 46 34
Seminar fuer Statistik, LEO C14 fax: + 41 1 632 12 28
(Leonhardstr. 27) <holzer at stat.math.ethz.ch>
ETH (Federal Inst. Technology)
8092 Zurich http://www.stat.math.ethz.ch/~holzer/
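
The question above, whether a (near-)perfect fit can produce these warnings and near-zero weights, can be explored on a small example. The following is a minimal sketch, assuming toy data (hypothetical, not the data posted above) in which `x` separates `y` perfectly, and assuming a current version of R in which such a fit ends with warnings rather than an error:

```r
# Toy data (hypothetical, NOT the poster's data): every x < 0 has y = 0
# and every x > 0 has y = 1, so the two classes are perfectly separated.
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# Warnings about fitted probabilities of 0 or 1 (and possibly about
# non-convergence) are expected here; they are suppressed only to keep
# the printed output short.
fit <- suppressWarnings(glm(y ~ x, family = binomial))

# Compare the response with the fitted probabilities, the residuals and
# the weights stored in the fit object.
print(cbind(y,
            fitted   = fit$fitted.values,
            residual = fit$residuals,
            weight   = fit$weights))

# The slope keeps growing from iteration to iteration because the
# likelihood has no finite maximum for separated data.
print(coef(fit))
```

Running the same `cbind()` call on the posted data, as in the transcript above, gives a direct comparison: fitted values pinned at 0 or 1 together with very small weights would be consistent with (quasi-)separation, though this sketch does not prove that this is what happens with `daten`.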