[R] data level for stepwise

Thomas Lumley tlumley at u.washington.edu
Tue Dec 3 22:21:03 CET 2002


On Tue, 3 Dec 2002, Constantine Frangakis wrote:

> This may be of interest to R users.
>
> The command step() for stepwise regression, which asks for
> an object like lm(formula, data=mydata), apparently is looking for
> ``mydata'' in the global environment, not the environment at which
> step() is called.

Not quite.  It's looking in the environment associated with the model
formula (which will typically be the environment where the model was
created, and often the global environment).
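
As a small illustration (not part of the original message), the sketch
below shows that a formula records the environment in which it was created.
The names make_formula, local_data, fm_top and fm_local are made up for
this example.

```r
# Remember the environment this script is running in.
here <- environment()

# Hypothetical helper: builds a formula inside a function body.
make_formula <- function() {
    local_data <- data.frame(y = 1:3, x = c(2, 4, 7))  # exists only in this call
    y ~ x
}

fm_top   <- y ~ x            # created at the current level
fm_local <- make_formula()   # created inside the function

# The top-level formula points back at the current environment ...
print(identical(environment(fm_top), here))     # TRUE

# ... while the other one points at the function's call frame,
# which is where `local_data' lives.
print(identical(environment(fm_local), here))   # FALSE
print(exists("local_data", envir = environment(fm_local), inherits = FALSE))  # TRUE
```

Code that looks up variables through the formula's environment will
therefore see whatever was visible where the formula was written, not
necessarily what is visible where that code is later called.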

>			That is, when step is called
> from inside another function in which the data that step() calls has also
> been updated inside that function, step() does not use the most recently
> updated data, but instead looks outside the function. (This problem does
> not happen for the lm function). Although the problem can be solved by
> using the assign function, to avoid potential bugs it would be useful to
> know which functions like step() do this.

There is some discussion of this under "Nonstandard evaluation rules" on
http://developer.r-project.org, but it doesn't cover step(), which I'll
need to add.

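You can work around this by calling update() first, e.g. with
 data(trees)
 model <- lm(Volume ~ Height + Girth, data = trees)
 f <- function(i)
 {
    trees <- trees[-i, ]      # local copy; `model' was fitted to the original
    step(model)
 }
 g <- function(i)
 {
    trees <- trees[-i, ]      # local copy
    model <- update(model)    # refit here, so the local `trees' is picked up
    step(model)
 }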

the argument to f() makes no difference, as the original `trees' data
frame is used, but the argument to g() is effective, as the local data
frame is used.
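
One way to check that update() really picks up the local data frame is to
count the rows of the refitted model frame. This is only a sketch;
refit_without is a hypothetical name, and the check looks at update()
alone rather than at step().

```r
data(trees)
model <- lm(Volume ~ Height + Girth, data = trees)

# Hypothetical helper: drop rows i locally, then refit the stored call.
refit_without <- function(i) {
    trees <- trees[-i, ]   # local copy with rows i removed
    update(model)          # re-evaluates the lm() call from this frame
}

n_full    <- nrow(model.frame(model))               # rows used by the original fit
n_reduced <- nrow(model.frame(refit_without(1:5)))  # rows used after the refit

print(n_full)      # 31
print(n_reduced)   # 26
```

With no other arguments, update() simply re-evaluates the stored call in
the caller's frame, so `trees' resolves to the local copy first.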


	-thomas



