[R] problems with understanding behaviour of glm

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jan 13 18:57:13 CET 2000


> From: Peter Holzer <holzer at stat.math.ethz.ch>
> Date: Thu, 13 Jan 2000 17:56:27 +0100 (MET)
> 
> I don't understand, what happens in glm in the following example (note that
> in S-Plus this example finishes with an almost perfect fit, but also 49
> warnings):

Yes, this is known as linear (complete) separation, and it means your
model is not at all appropriate (and IWLS does not fit it well).
There is a linear combination of the predictors that gives all
positive observations a positive sign and all negative observations a
negative sign. Then no MLE exists, but the likelihood has a supremum
corresponding to giving all observations fitted probabilities of zero or one.
Most books do not cover this, but Santer & Duffy (1989), for one, do.
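
As a minimal sketch of complete separation, the following uses made-up toy
data (not the poster's `daten`; the names `x`, `y` and `loglik` are
assumptions for illustration). It evaluates the binomial log-likelihood
along the separating direction to show that it keeps increasing, and then
looks at what glm() returns on such data.

```r
# Toy data (hypothetical, not the poster's 'daten'): y is 1 exactly when x > 0,
# so the sign of x separates the two classes perfectly.
x <- c(-3, -2, -1, 1, 2, 3)
y <- c(0, 0, 0, 1, 1, 1)

# Binomial log-likelihood along the direction eta = b * x.
# log(1 - plogis(b * x)) is written as plogis(-b * x, log.p = TRUE).
loglik <- function(b) {
  sum(y * plogis(b * x, log.p = TRUE) +
      (1 - y) * plogis(-b * x, log.p = TRUE))
}

# The log-likelihood keeps increasing towards its supremum of 0 as b grows,
# so no finite maximiser exists.
ll <- sapply(c(1, 5, 25, 125), loglik)
print(ll)

# glm() still returns an object, typically with warnings about fitted
# probabilities numerically 0 or 1; suppressWarnings() only hides them here.
fit <- suppressWarnings(glm(y ~ x, family = binomial))
print(round(fitted(fit), 6))
print(coef(fit))
```

The reported coefficients in such a fit depend on when the iterations
stop, so they should not be read as estimates of anything.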

> > fit.small <- glm(SKR.ein.aus ~ ., family = binomial, data = daten, maxit=100)
> Error in (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,  : 
> inner loop 2; can't correct step size

That's not what I get, but I guess the problem is the same: computations
on very large numbers are generating NaNs.  S-PLUS is much more careful
than R, and I will try to add some care to R.
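
To illustrate the general mechanism only (this is not a reconstruction of
what glm.fit did internally at the time), here is a small sketch of how a
very large linear predictor can turn into a NaN when the inverse logit is
computed naively. The values in `eta` are arbitrary assumptions.

```r
eta <- c(-800, 0, 800)          # linear predictors of extreme size

# Naive inverse logit: exp(800) overflows to Inf, and Inf / Inf is NaN.
naive <- exp(eta) / (1 + exp(eta))

# plogis() computes the same function without the overflow.
stable <- plogis(eta)

print(naive)
print(stable)
```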

> cbind(daten$SKR.ein.aus, round(fit.small$fitted,2), fit.small$residuals, fit.small$weights)
[...]

> It is somehow strange to me that e.g. the second observation fits almost
> perfectly (actually it is 0.99999773), but that it has such a high
> residual. Obviously this is somehow compensated by the extremely low
> weight, but I don't understand actually what happens. Are the problems due
> to the possible perfect fit as it results in S-Plus?

Yes. Actually, that is not a high residual, that is a large _working_
residual, and it is large precisely because it has been divided by the
weight.  Please do not assume that the $residuals and $fitted components
are the residuals and fitted values, but use the extractor functions
provided. The help page for glm does not describe (in R) what the
components mean.  Look up what the IWLS algorithm does in the 
binomial(logit) case: it is quite simple to describe.
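
A hedged sketch of IWLS for the binomial(logit) case follows, on made-up,
non-separated toy data so that the MLE exists. The names `X`, `beta`, `w`,
`r` and `z`, the zero starting value and the fixed 25 iterations are
assumptions for illustration, not glm()'s actual internals. The last lines
check that the `$residuals` component matches (y - mu) / (mu * (1 - mu)),
that is, the working residual obtained by dividing by the weight.

```r
# Hypothetical, non-separated toy data so that the MLE exists.
x <- c(-2, -1, -1, 0, 0, 1, 1, 2)
y <- c( 0,  0,  1, 0, 1, 0, 1, 1)
X <- cbind(1, x)                     # design matrix with intercept

beta <- c(0, 0)                      # simple starting value (an assumption)
for (i in 1:25) {
  eta <- drop(X %*% beta)            # linear predictor
  mu  <- plogis(eta)                 # fitted probabilities
  w   <- mu * (1 - mu)               # working weights
  r   <- (y - mu) / w                # working residuals: divided by the weight
  z   <- eta + r                     # working response
  # Weighted least squares of z on X with weights w.
  beta <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
}

fit <- glm(y ~ x, family = binomial)
print(rbind(iwls = beta, glm = coef(fit)))

# Compare the raw component with the extractor functions.
mu_hat <- fitted(fit)
w_hat  <- mu_hat * (1 - mu_hat)
print(all.equal(unname(fit$residuals), unname((y - mu_hat) / w_hat)))
print(all.equal(fit$residuals, residuals(fit, type = "working")))
print(residuals(fit, type = "response"))   # y - mu, on the probability scale
```

When a fitted probability is close to 0 or 1, `w` is close to zero, so
even a tiny y - mu can give a very large working residual.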

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



