[Rd] scoping/non-standard evaluation issue
John Fox
jfox at mcmaster.ca
Wed Jan 5 17:36:11 CET 2011
Dear Peter,
You hit the nail on the head: I didn't (and don't) understand why mod.1
works -- which I attributed to my imperfect understanding of non-standard
evaluation. Even if there's a bug allowing mod.1 to work, I wonder about the
consequences of fixing it. That might break a lot of code. It would seem
desirable, though, for mod.1 and mod.2 to behave the same.
Best,
John
>
> > Dear Gabor,
> >
> > I used str() to look at the two objects but missed the difference that
> > you found. What I didn't quite understand was why one model worked but
> > not the other when both were defined at the command prompt in the global
> > environment.
>
> I kind of suspect that the bug is that mod.1 works... I.e., I can vaguely
> make out the contours of why mod.2 is not supposed to work and if that is
> true, neither should mod.1. However, if so, something clearly needs more
> work. Possibly, some of the people who worked on implementing formula
> environments may want to chime in? (It's been a while, though.)
>
> >> On Tue, Jan 4, 2011 at 4:35 PM, John Fox <jfox at mcmaster.ca> wrote:
> >>> Dear r-devel list members,
> >>>
> >>> On a couple of occasions I've encountered the issue illustrated by the
> >>> following examples:
> >>>
> >>> --------- snip -----------
> >>>
> >>>> mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
> >>> + Armed.Forces + Population + Year, data=longley)
> >>>
> >>>> mod.2 <- update(mod.1, . ~ . - Year + Year)
> >>>
> >>>> all.equal(mod.1, mod.2)
> >>> [1] TRUE
> >>>>
> >>>> f <- function(mod){
> >>> + subs <- 1:10
> >>> + update(mod, subset=subs)
> >>> + }
> >>>
> >>>> f(mod.1)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>> Population + Year, data = longley, subset = subs)
> >>>
> >>> Coefficients:
> >>> (Intercept) GNP.deflator GNP Unemployed Armed.Forces
> >>> 3.641e+03 8.394e-03 6.909e-02 -3.971e-03 -8.595e-03
> >>> Population Year
> >>> 1.164e+00 -1.911e+00
> >>>
> >>>> f(mod.2)
> >>> Error in eval(expr, envir, enclos) : object 'subs' not found
> >>>
> >>> --------- snip -----------
> >>>
> >>> I *almost* understand what's going on -- that is, clearly mod.1 and
> >>> mod.2, or the formulas therein, are associated with different
> >>> environments, but I don't quite see why.
> >>>
> >>> Anyway, here are two "solutions" that work, but neither is in my view
> >>> desirable:
> >>>
> >>> --------- snip -----------
> >>>
> >>>> f1 <- function(mod){
> >>> + assign(".subs", 1:10, envir=.GlobalEnv)
> >>> + on.exit(remove(".subs", envir=.GlobalEnv))
> >>> + update(mod, subset=.subs)
> >>> + }
> >>>
> >>>> f1(mod.1)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>> Population + Year, data = longley, subset = .subs)
> >>>
> >>> Coefficients:
> >>> (Intercept) GNP.deflator GNP Unemployed Armed.Forces
> >>> 3.641e+03 8.394e-03 6.909e-02 -3.971e-03 -8.595e-03
> >>> Population Year
> >>> 1.164e+00 -1.911e+00
> >>>
> >>>> f1(mod.2)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>> Population + Year, data = longley, subset = .subs)
> >>>
> >>> Coefficients:
> >>> (Intercept) GNP.deflator GNP Unemployed Armed.Forces
> >>> 3.641e+03 8.394e-03 6.909e-02 -3.971e-03 -8.595e-03
> >>> Population Year
> >>> 1.164e+00 -1.911e+00
> >>>
> >>>> f2 <- function(mod){
> >>> + env <- new.env(parent=.GlobalEnv)
> >>> + attach(NULL)
> >>> + on.exit(detach())
> >>> + assign(".subs", 1:10, pos=2)
> >>> + update(mod, subset=.subs)
> >>> + }
> >>>
> >>>> f2(mod.1)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>> Population + Year, data = longley, subset = .subs)
> >>>
> >>> Coefficients:
> >>> (Intercept) GNP.deflator GNP Unemployed Armed.Forces
> >>> 3.641e+03 8.394e-03 6.909e-02 -3.971e-03 -8.595e-03
> >>> Population Year
> >>> 1.164e+00 -1.911e+00
> >>>
> >>>> f2(mod.2)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>> Population + Year, data = longley, subset = .subs)
> >>>
> >>> Coefficients:
> >>> (Intercept) GNP.deflator GNP Unemployed Armed.Forces
> >>> 3.641e+03 8.394e-03 6.909e-02 -3.971e-03 -8.595e-03
> >>> Population Year
> >>> 1.164e+00 -1.911e+00
> >>>
> >>> --------- snip -----------
> >>>
> >>> The problem with f1() is that it will clobber a variable named .subs
> >>> in the global environment; the problem with f2() is that .subs can be
> >>> masked by a variable in the global environment.
> >>>
> >>> Is there a better approach?
> >>>
> >>
> >> I think there is something wrong with R here since the formula in the
> >> call component of mod.1 has a "call" class whereas the corresponding
> >> call component of mod.2 has "formula" class:
> >>
> >>> class(mod.1$call[[2]])
> >> [1] "call"
> >>> class(mod.2$call[[2]])
> >> [1] "formula"
> >>
> >> If we reset call[[2]] to have "call" class then it works:
> >>
> >>> mod.2a <- mod.2
> >>> mod.2a$call[[2]] <- as.call(as.list(mod.2a$call[[2]]))
> >>> f(mod.2a)
> >>
> >> Call:
> >> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >> Population + Year, data = longley, subset = subs)
> >>
> >> Coefficients:
> >> (Intercept) GNP.deflator GNP Unemployed Armed.Forces
> >> 3.641e+03 8.394e-03 6.909e-02 -3.971e-03 -8.595e-03
> >> Population Year
> >> 1.164e+00 -1.911e+00
> >>
> >>
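
One possible way to sidestep the environment lookup discussed in this thread, offered only as a sketch and not as anything proposed by the posters: splice the value of the subset vector into the call with bquote(), so that no symbol has to be found in the formula's environment. The function name f3 and the objects fit.1 and fit.2 are hypothetical; the sketch assumes update() dispatches to update.default() for "lm" objects and that the built-in longley data are available.

```r
# Sketch only: f3(), fit.1 and fit.2 are hypothetical names, not from the thread.
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
              Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)

# As noted in the thread, the stored formula differs between the two fits:
# an unevaluated call in mod.1, a formula object (which carries its own
# environment) in mod.2.
class(mod.1$call[[2]])
class(mod.2$call[[2]])

f3 <- function(mod) {
  subs <- 1:10
  # bquote() replaces .(subs) with the value of subs, so the call passed
  # to update() contains the integer vector itself rather than the symbol
  # `subs`, and nothing has to be looked up in the formula's environment.
  eval(bquote(update(mod, subset = .(subs))))
}

fit.1 <- f3(mod.1)
fit.2 <- f3(mod.2)
all.equal(coef(fit.1), coef(fit.2))
```

A side effect of this design is that the stored call of the refitted model holds the literal subset values instead of a variable name, which may or may not be acceptable depending on how the call is used later.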
