[R] What is the most useful way to detect nonlinearity in logistic regression?

Sun Dec 5 03:50:22 CET 2004

Patrick Foley <patfoley at csus.edu> writes:

> It is easy to spot response nonlinearity in normal linear models using
> plot(something.lm).
> However plot(something.glm) produces artifactual peculiarities since
> the diagnostic residuals are constrained by the fact that y can only
> take values 0 or 1.
> What do R users find most useful in checking the linearity assumption
> of logistic regression (i.e. log-odds = a + bx)?

Well, there are basically three approaches:

        - grouping
        - higher-order terms
        - smoothed residuals
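For the higher-order route, here is a minimal sketch. It assumes simulated data with a deliberately quadratic log-odds, and the names fit_lin and fit_quad are purely illustrative. It adds a squared term and tests it against the linear fit with a deviance (likelihood-ratio) test:

```r
set.seed(42)
n <- 1000
x <- runif(n, -2, 2)
# Hypothetical truth: log-odds quadratic in x
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x + 0.8 * x^2))

fit_lin  <- glm(y ~ x,          family = binomial)
fit_quad <- glm(y ~ x + I(x^2), family = binomial)

# Likelihood-ratio (deviance) test for the added quadratic term, 1 df
lrt <- anova(fit_lin, fit_quad, test = "Chisq")
print(lrt)
p_quad <- lrt[2, "Pr(>Chi)"]
```

poly(x, 2) would do the same job with orthogonal polynomials. A small p-value says the linear logit is inadequate, but it does not say that a quadratic is the right shape.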

A simple technique is to include a variable _both_ as a continuous
term and as a factor obtained by cutting it into groups
(as in ~ age + cut(age, seq(30, 70, 10))). The model that you are
fitting is a bit weird, but it gives you a clean test for omitting the
grouped term: compare it with the purely linear model by a
likelihood-ratio test. A somewhat nicer variant of the same theme is to
fit a linear spline (or a higher-order one, for that matter) with
selected knots.
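Here is a rough sketch of both variants on simulated data. The hinge-shaped truth, the 10-year groups and the knots at 40, 50 and 60 are arbitrary choices made for illustration. bs() comes from the splines package that ships with R. Both models contain the linear fit as a special case, so anova() on each pair gives a likelihood-ratio test of linearity:

```r
library(splines)  # ships with base R; provides bs()

set.seed(1)
n   <- 5000
age <- runif(n, 30, 70)
# Hypothetical truth: risk flat until age 50, then rising on the logit scale
y   <- rbinom(n, size = 1, prob = plogis(-1 + 0.15 * pmax(age - 50, 0)))

fit_lin <- glm(y ~ age, family = binomial)

# (a) Continuous term plus the same variable cut into 10-year groups;
#     include.lowest = TRUE keeps an age of exactly 30 from becoming NA
grp     <- cut(age, seq(30, 70, 10), include.lowest = TRUE)
fit_grp <- glm(y ~ age + grp, family = binomial)
p_grp   <- anova(fit_lin, fit_grp, test = "Chisq")[2, "Pr(>Chi)"]

# (b) Linear spline (degree = 1) with hand-picked knots
fit_spl <- glm(y ~ bs(age, knots = c(40, 50, 60), degree = 1),
               family = binomial)
p_spl   <- anova(fit_lin, fit_spl, test = "Chisq")[2, "Pr(>Chi)"]

c(grouped = p_grp, spline = p_spl)
```

The spline version gives a continuous fitted curve, which can be plotted on the logit scale with predict(fit_spl). The grouped version gives a step function plus a common slope, which is harder to interpret.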

Re. the smoothed residuals, you do need to be careful about the
smoother. Some of the "robust" ones will do precisely the wrong thing
in this context: you really are interested in the mean, not some
trimmed mean. With 0/1 responses, the residuals from the less common
outcome look like outliers, so downweighting them can easily amount to
throwing away all your cases... Here's an idea, using a plain
(non-robust) loess smoother:

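set.seed(1)                                   # for reproducibility
x <- runif(500)
y <- rbinom(500, size = 1, prob = plogis(x))  # truly linear on the logit scale
fit <- glm(y ~ x, family = binomial)
# Non-robust (gaussian) loess smooth of the deviance residuals against x
xx <- predict(loess(resid(fit) ~ x), se = TRUE)
# Smoothed residuals vs. an approximate +/- 2 SE envelope around zero
matplot(x, cbind(xx$fit, 2 * xx$se.fit, -2 * xx$se.fit), pch = 20)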
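Here is the same check applied to data where the log-odds really is curved, using a hypothetical quadratic truth. In this case the smoothed residuals should leave the +/- 2 SE band around zero. family = "gaussian" is the loess default and is written out only for emphasis. family = "symmetric" would switch to the robust (biweight) fitting warned about above:

```r
set.seed(2)
n <- 1000
x <- runif(n, -2, 2)
# Hypothetical truth: log-odds quadratic in x, so a linear logit is misspecified
y <- rbinom(n, size = 1, prob = plogis(x^2 - 1))

fit <- glm(y ~ x, family = binomial)
# Non-robust loess smooth of the deviance residuals (gaussian is the default)
sm  <- predict(loess(resid(fit) ~ x, family = "gaussian"), se = TRUE)

# Points where the smooth lies outside an approximate +/- 2 SE envelope around zero
outside <- abs(sm$fit) > 2 * sm$se.fit
mean(outside)  # share of points where the smooth is clearly off zero
```

The band is pointwise, not simultaneous, so an occasional excursion under a correct model is expected. A systematic U shape like the one this simulation produces is the signal to look for.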

Not sure my money isn't still on the splines, though.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907