[R] Strange scope problem

Angelo Canty canty at math.mcmaster.ca
Thu Oct 16 01:50:47 CEST 2003


Hi,

I have come across the following problem which seems to be a scoping
issue but I'm at a loss to see why this is so or to find a good
workaround.

Suppose I have a function to get a prediction after model selection
using the step function.

step.pred <- function(dat, x0) {
  fit.model <- step(lm(y~., data=dat), trace=F)
  predict(fit.model, x0, se.fit=T)
}

This function works sometimes, for example:

set.seed(1)
X.1 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20))
y.1 <- 5+as.matrix(X.1[,1:2])%*%matrix(c(1,1))+rnorm(20)
Xy.1 <- data.frame(X.1,y=y.1)
x0.1 <- data.frame(x1=-1,x2=-1, x3=-1)
step.pred(Xy.1, x0.1)

$fit
[1] 3.359540

$se.fit
[1] 0.523629

$df
[1] 16

$residual.scale
[1] 1.093526

but most often it fails with an error, as in

set.seed(2)
X.2 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20))
y.2 <- 5+as.matrix(X.2[,1:2])%*%matrix(c(1,1))+rnorm(20)
Xy.2 <- data.frame(X.2,y=y.2)
x0.2 <- data.frame(x1=-1,x2=-1, x3=-1)
step.pred(Xy.2, x0.2)
Error in model.frame.default(formula = y ~ x1 + x2, data = dat,
drop.unused.levels = TRUE) : 
        Object "dat" not found

The difference seems to be that for the first dataset, step retains
all three variables whereas for the second it drops one of them.

> step(lm(y~.,data=Xy.1), trace=F)

Call:
lm(formula = y ~ x1 + x2 + x3, data = Xy.1)

Coefficients:
(Intercept)           x1           x2           x3  
     4.8347       0.8937       1.0331      -0.4516  

> step(lm(y~.,data=Xy.2), trace=F)

Call:
lm(formula = y ~ x1 + x2, data = Xy.2)

Coefficients:
(Intercept)           x1           x2  
     5.0802       0.9763       1.1369  
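
One thing that can be checked is what the fitted object records about
its data. A fitted lm object keeps the unevaluated call, so the data
argument is stored as the name dat and not as the data frame itself.
Any later re-evaluation of that call (which the error message above
suggests is happening once a variable is dropped) would then have to
find dat by name. A small sketch of this, where the function f and the
toy data are made up purely for illustration:

```r
# f is a hypothetical wrapper; its argument name `dat` mirrors step.pred.
f <- function(dat) lm(y ~ ., data = dat)

# Toy data, not taken from the examples above.
toy <- data.frame(x1 = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
fit <- f(toy)

# The stored call refers to the symbol `dat`, which only exists
# inside the frame of f while f is running.
print(fit$call)
print(is.name(fit$call$data))
```

This does not by itself explain why the lookup fails in one case and
not the other, only where the name dat in the error message comes from.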


One possible workaround is to explicitly assign the local variable
dat in .GlobalEnv, as in

step.pred1 <- function(dat, x0) {
  assign("dat",dat, envir=.GlobalEnv)
  fit.model <- step(lm(y~., data=dat), trace=F)
  predict(fit.model, x0, se.fit=T)
}

I don't like this method since it would overwrite anything else called
dat in .GlobalEnv.  I realize that I could give it an obscure name but
the potential for damage still remains.  Am I missing something obvious
here?  If not, is it possible to work around this problem in such a way
that .GlobalEnv does not need to be touched?

In S-Plus I would use 
assign("dat",dat, frame=1)
which works, but the frame argument is not available (AFAIK) in R.
Is there something similar that I can use?
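
For illustration, one candidate that would leave .GlobalEnv alone is to
build the lm call with do.call, so that the data frame itself (and not
the name dat) is stored in the call. The name step.pred2 is
hypothetical, and this sketch has not been checked on R 1.6.1, so it
should be read as an idea to try and not as a confirmed fix:

```r
# Hypothetical variant of step.pred. do.call() evaluates its arguments
# before constructing the call, so fit.full$call$data holds the data
# frame itself and no lookup of the name `dat` is needed later.
step.pred2 <- function(dat, x0) {
  fit.full  <- do.call("lm", list(formula = y ~ ., data = dat))
  fit.model <- step(fit.full, trace = FALSE)
  predict(fit.model, x0, se.fit = TRUE)
}
```

It would be called in the same way, e.g. step.pred2(Xy.2, x0.2). One
cost of this design is that the stored call now contains the whole data
frame, so printing the fitted model is much more verbose.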

I am using R 1.6.1 for Unix on a Sun Workstation. I know that I need
to upgrade but our sysadmin doesn't regard it as a priority!

Thanks for any help you can give for this.
Angelo

------------------------------------------------------------------
|   Angelo J. Canty                Email: cantya at mcmaster.ca     |
|   Mathematics and Statistics     Phone: (905) 525-9140 x 27079 |
|   McMaster University            Fax  : (905) 522-0935         |
|   1280 Main St. W.                                             |
|   Hamilton ON L8S 4K1                                          |



