[Rd] scoping/non-standard evaluation issue

John Fox jfox at mcmaster.ca
Tue Jan 4 22:35:35 CET 2011


Dear r-devel list members,

On a couple of occasions I've encountered the issue illustrated by the
following examples:

--------- snip -----------

> mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed + 
+         Armed.Forces + Population + Year, data=longley)

> mod.2 <- update(mod.1, . ~ . - Year + Year)

> all.equal(mod.1, mod.2)
[1] TRUE
> 
> f <- function(mod){
+     subs <- 1:10
+     update(mod, subset=subs)
+     }
    
> f(mod.1)

Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00  

> f(mod.2)
Error in eval(expr, envir, enclos) : object 'subs' not found

--------- snip -----------

I *almost* understand what's going on: clearly mod.1 and mod.2, or the
formulas therein, are associated with different environments, but I don't
quite see why.
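
If I am reading update.default() and model.frame() correctly (this is an
interpretation, not something the session above demonstrates), the difference
lies in how each model's call stores its formula. model.frame() evaluates the
subset argument first in the data and then in the environment of the formula.
mod.1's call holds the unevaluated `~` expression, so when update()
re-evaluates the call inside f(), a fresh formula is created whose
environment is f()'s frame, where subs lives. mod.2's call instead holds a
formula object that update() had already evaluated, and that object keeps its
original environment, which does not contain subs. The sketch below assumes
nothing beyond base R and the longley data and simply makes the difference
visible:

```r
## Hypothetical diagnostic, not part of the original session.
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
            Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)

## mod.1's call stores the formula as an unevaluated `~` expression,
## so it is rebuilt (with a new environment) each time the call is evaluated.
class(mod.1$call$formula)   # "call"

## mod.2's call stores an already-evaluated formula object whose
## environment was fixed when update() created it.
class(mod.2$call$formula)   # "formula"
environment(mod.2$call$formula)
```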

Anyway, here are two "solutions" that work, though in my view neither is
desirable:

--------- snip -----------

> f1 <- function(mod){
+     assign(".subs", 1:10, envir=.GlobalEnv)
+     on.exit(remove(".subs", envir=.GlobalEnv))
+     update(mod, subset=.subs)
+     }

> f1(mod.1)

Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00  

> f1(mod.2)

Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00  

> f2 <- function(mod){
+     attach(NULL)
+     on.exit(detach())
+     assign(".subs", 1:10, pos=2)
+     update(mod, subset=.subs)
+     }

> f2(mod.1)

Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00  

> f2(mod.2)

Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00  

--------- snip -----------

The problem with f1() is that it overwrites, and on exit deletes, any
existing variable named .subs in the global environment; the problem with
f2() is that .subs can be masked by a variable of the same name in the
global environment, which precedes position 2 on the search path.
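
For comparison, here is a sketch of a third variant. The name f3() is
hypothetical, and it has only been reasoned through for lm(), not checked
against other model classes. Rather than passing the symbol subs, it uses
base R's bquote() to splice the value of subs into the update() call, so
model.frame() never has to look the name up in any environment, and nothing
is written to the global environment or the search path:

```r
## Hypothetical f3(): inline the *value* of subs into the call instead of
## the symbol, so no later environment lookup of `subs` is needed.
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
            Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)

f3 <- function(mod){
    subs <- 1:10
    ## bquote() replaces .(subs) with the integer vector itself, producing
    ## update(mod, subset = <values>); eval() then runs it in f3()'s frame.
    eval(bquote(update(mod, subset = .(subs))))
}

f3(mod.1)
f3(mod.2)   # no lookup of `subs` is needed, so the formula environment is irrelevant
```

One trade-off of this design is that the subset values themselves, not the
name subs, end up in the fitted model's call, which could make the printed
call unwieldy for long index vectors.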

Is there a better approach?

Thanks,
 John

--------------------------------
John Fox
Senator William McMaster 
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox


