[R] model selection using step

Prof Brian D Ripley ripley at stats.ox.ac.uk
Fri Feb 25 09:17:46 CET 2000

On Thu, 24 Feb 2000, Jun Yan wrote:

> I am trying to do a model selection using function step. There are 21
> independent variables. I first started from a model with all variables,
> the step function does not go anywhere, returning the full model to me. 
> Q1. Does this have to do with missing values?

Probably. As R uses na.action=na.omit by default, each fit silently drops
the rows with missing values in its own variables, so it compares
incomparable models all too easily.

(This needs sorting out: ?lm says na.omit is the default and ?glm says
na.fail is the default, but it does seem to be na.omit for both.
And anova.lm merrily compares models on different numbers of observations.)
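
A minimal sketch of the problem, using the built-in airquality data rather than the poster's insure data (the data set and the names fit_full, fit_small, n_full, n_small are only for illustration). With na.omit in effect, two nested fits can end up on different numbers of rows, which is worth checking before comparing them by AIC:

```r
# Illustration only: airquality ships with R and has NAs in Ozone and Solar.R.
# na.omit is passed explicitly so the example does not depend on options().
fit_full  <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality,
                na.action = na.omit)
fit_small <- lm(Ozone ~ Wind + Temp, data = airquality,
                na.action = na.omit)

# Number of observations actually used by each fit.
n_full  <- nobs(fit_full)
n_small <- nobs(fit_small)

# Dropping Solar.R brings back the rows that were missing only in Solar.R,
# so the smaller model is fitted to more data and the AICs are not comparable.
comparable <- (n_full == n_small)
print(c(full = n_full, small = n_small))
```

If comparable is FALSE, the two fits were not made on the same rows and any AIC difference between them should not be trusted.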

> Q2. I refit the model after removing all rows including missing values,
> but the result still does not change. What am I missing here?

Try using the trace argument of step. It is probably the case that dropping a
single variable does not lead to a better fit as assessed by AIC. That could be
genuine, or it could be a problem with non-comparable datasets, or it could
be a bug.
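
As a rough sketch of what to look at (again on airquality with complete cases only, not the poster's data; df, full, first_stage and sel are illustrative names): drop1 gives the single-term deletion table that step considers at its first stage, and trace = 1 makes step print the table at every stage. If no row of the table has a smaller AIC than the <none> row, step will return the starting model unchanged.

```r
# Illustration only: complete cases of a built-in data set.
df   <- na.omit(airquality)
full <- lm(Ozone ~ ., data = df, na.action = na.fail)

# AIC for the current model (<none>) and for dropping each term in turn.
first_stage <- drop1(full)
print(first_stage)

# trace = 1 prints the candidate table at each stage of the search.
sel <- step(full, scope = list(upper = ~ ., lower = ~ 1), trace = 1)

# The selection path is stored on the returned fit.
print(sel$anova)
```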

> body.m <- lm(BI.PPrem ~ ., data=insure[,c(1, 19:39)])
> na <- as.numeric(attr(attr(body.m$model, "na.action"), "names"))

(Is that correct? It depends on the row names being numeric ....)
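
To illustrate the concern, here is a small sketch with a made-up data frame (d, fit, omitted, by_name, by_pos and d_complete are all hypothetical names). The na.action object records the positions of the dropped rows as its values and the row names as its names, so converting the names to numbers only works when the row names happen to be the row numbers:

```r
# Hypothetical data with non-numeric row names and two incomplete rows.
d <- data.frame(y = c(1.2, NA, 3.1, 4.8, 5.0),
                x = c(1, 2, NA, 4, 5),
                row.names = c("a", "b", "c", "d", "e"))

fit <- lm(y ~ x, data = d, na.action = na.omit)

# Rows dropped by na.omit: values are positions, names are row names.
omitted <- na.action(fit)

# Going via the names fails here, because "b" and "c" are not numbers.
by_name <- suppressWarnings(as.numeric(names(omitted)))

# Using the stored positions does not depend on the row names.
by_pos <- as.integer(omitted)

d_complete <- d[-by_pos, ]
print(d_complete)
```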

> body.m <- lm(BI.PPrem ~ ., data=insure[-na,c(1, 19:39)])
> body.step <- step(body.m, scope=list(upper = ~., lower= ~1))

You would do better to use na.omit and something like

thisdf <- na.omit(insure[,c(1, 19:39)])
body.m <- lm(BI.PPrem ~ ., data=thisdf, na.action=na.fail)
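
The point of na.action=na.fail is that it acts as a guard: the fit stops with an error if any missing values are left, rather than quietly dropping rows. A minimal sketch on the built-in airquality data (raw, guard and fit are illustrative names, not part of the original problem):

```r
# Illustration only: airquality contains missing values.
raw <- airquality

# With na.fail, fitting on data that still has NAs is an error;
# the message is captured here instead of stopping the script.
guard <- tryCatch(
  lm(Ozone ~ ., data = raw, na.action = na.fail),
  error = function(e) conditionMessage(e)
)
print(guard)

# After na.omit on the whole data frame the same call succeeds, and every
# model fitted from thisdf uses exactly the same rows.
thisdf <- na.omit(raw)
fit <- lm(Ozone ~ ., data = thisdf, na.action = na.fail)
print(nobs(fit) == nrow(thisdf))
```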

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

