[Rd] scoping/non-standard evaluation issue

John Fox jfox at mcmaster.ca
Wed Jan 5 17:36:11 CET 2011


Dear Peter,

You hit the nail on the head: I didn't (and don't) understand why mod.1
works -- which I attributed to my imperfect understanding of non-standard
evaluation. Even if there's a bug allowing mod.1 to work, I wonder about the
consequences of fixing it. That might break a lot of code. It would seem
desirable, though, for mod.1 and mod.2 to behave the same.
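
In the meantime, one workaround that seems to sidestep the lookup altogether is to inline the value of the subset into the call before update() sees it, so that nothing has to be found by name later. This is only a sketch: f3() is a hypothetical name, and it assumes that having the literal index vector stored in the model's call is acceptable.

```r
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
            Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)

# Hypothetical helper (not from the earlier messages).
f3 <- function(mod) {
    subs <- 1:10
    # bquote() replaces .(subs) with the integer vector itself, so the
    # call passed on to lm() contains the values rather than the name 'subs'.
    # eval() runs the resulting update() call in this function's frame,
    # which is where 'mod' is defined.
    eval(bquote(update(mod, subset = .(subs))))
}

fit.1 <- f3(mod.1)
fit.2 <- f3(mod.2)

# Both refits should now use the same 10 rows and agree with each other.
all.equal(coef(fit.1), coef(fit.2))
```

The cost is that the refitted model's call carries the data for the subset rather than a symbol, which may matter if the call is printed or updated again.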

Best,
 John
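
P.S. For anyone who wants to look at the difference directly, here is a small sketch that extends Gabor's class() check by also inspecting the environment attached to the formula stored in each call. The names f1.stored and f2.stored are just illustrative. As far as I can tell, the formula in mod.1's call is still an unevaluated call with no environment of its own, while the one in mod.2's call is a formula object that already carries an environment, which would be consistent with the two being evaluated differently inside f().

```r
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
            Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)

# Element 2 of each stored call is the 'formula' argument.
f1.stored <- mod.1$call[[2]]
f2.stored <- mod.2$call[[2]]

class(f1.stored)   # "call" in Gabor's output
class(f2.stored)   # "formula" in Gabor's output

# An unevaluated ~ call has no ".Environment" attribute yet ...
is.null(attr(f1.stored, ".Environment"))

# ... whereas the formula object produced by update() already has one.
environment(f2.stored)
```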


> -----Original Message-----
> From: peter dalgaard [mailto:pdalgd at gmail.com]
> Sent: January-05-11 10:51 AM
> To: John Fox
> Cc: 'Gabor Grothendieck'; 'Sanford Weisberg'; r-devel at r-project.org
> Subject: Re: [Rd] scoping/non-standard evaluation issue
> 
> 
> On Jan 5, 2011, at 14:44 , John Fox wrote:
> 
> > Dear Gabor,
> >
> > I used str() to look at the two objects but missed the difference that you
> > found. What I didn't quite understand was why one model worked but not the
> > other when both were defined at the command prompt in the global
> > environment.
> 
> I kind of suspect that the bug is that mod.1 works... I.e., I can vaguely
> make out the contours of why mod.2 is not supposed to work and if that is
> true, neither should mod.1. However, if so, something clearly needs more
> work. Possibly, some of the people who worked on implementing formula
> environments may want to chime in? (It's been a while, though.)
> 
> >
> > Thanks,
> > John
> >
> > --------------------------------
> > John Fox
> > Senator William McMaster
> >  Professor of Social Statistics
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario, Canada
> > web: socserv.mcmaster.ca/jfox
> >
> >
> >> -----Original Message-----
> >> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org]
> >> On Behalf Of Gabor Grothendieck
> >> Sent: January-04-11 6:56 PM
> >> To: John Fox
> >> Cc: Sanford Weisberg; r-devel at r-project.org
> >> Subject: Re: [Rd] scoping/non-standard evaluation issue
> >>
> >> On Tue, Jan 4, 2011 at 4:35 PM, John Fox <jfox at mcmaster.ca> wrote:
> >>> Dear r-devel list members,
> >>>
> >>> On a couple of occasions I've encountered the issue illustrated by the
> >>> following examples:
> >>>
> >>> --------- snip -----------
> >>>
> >>>> mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
> >>> +         Armed.Forces + Population + Year, data=longley)
> >>>
> >>>> mod.2 <- update(mod.1, . ~ . - Year + Year)
> >>>
> >>>> all.equal(mod.1, mod.2)
> >>> [1] TRUE
> >>>>
> >>>> f <- function(mod){
> >>> +     subs <- 1:10
> >>> +     update(mod, subset=subs)
> >>> +     }
> >>>
> >>>> f(mod.1)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>>    Population + Year, data = longley, subset = subs)
> >>>
> >>> Coefficients:
> >>>  (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces
> >>>   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03
> >>>  Population          Year
> >>>   1.164e+00    -1.911e+00
> >>>
> >>>> f(mod.2)
> >>> Error in eval(expr, envir, enclos) : object 'subs' not found
> >>>
> >>> --------- snip -----------
> >>>
> >>> I *almost* understand what's going on -- that is, clearly mod.1 and mod.2,
> >>> or the formulas therein, are associated with different environments, but I
> >>> don't quite see why.
> >>>
> >>> Anyway, here are two "solutions" that work, but neither is in my view
> >>> desirable:
> >>>
> >>> --------- snip -----------
> >>>
> >>>> f1 <- function(mod){
> >>> +     assign(".subs", 1:10, envir=.GlobalEnv)
> >>> +     on.exit(remove(".subs", envir=.GlobalEnv))
> >>> +     update(mod, subset=.subs)
> >>> +     }
> >>>
> >>>> f1(mod.1)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>>    Population + Year, data = longley, subset = .subs)
> >>>
> >>> Coefficients:
> >>>  (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces
> >>>   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03
> >>>  Population          Year
> >>>   1.164e+00    -1.911e+00
> >>>
> >>>> f1(mod.2)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>>    Population + Year, data = longley, subset = .subs)
> >>>
> >>> Coefficients:
> >>>  (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces
> >>>   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03
> >>>  Population          Year
> >>>   1.164e+00    -1.911e+00
> >>>
> >>>> f2 <- function(mod){
> >>> +     env <- new.env(parent=.GlobalEnv)
> >>> +     attach(NULL)
> >>> +     on.exit(detach())
> >>> +     assign(".subs", 1:10, pos=2)
> >>> +     update(mod, subset=.subs)
> >>> +     }
> >>>
> >>>> f2(mod.1)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>>    Population + Year, data = longley, subset = .subs)
> >>>
> >>> Coefficients:
> >>>  (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces
> >>>   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03
> >>>  Population          Year
> >>>   1.164e+00    -1.911e+00
> >>>
> >>>> f2(mod.2)
> >>>
> >>> Call:
> >>> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>>    Population + Year, data = longley, subset = .subs)
> >>>
> >>> Coefficients:
> >>>  (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces
> >>>   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03
> >>>  Population          Year
> >>>   1.164e+00    -1.911e+00
> >>>
> >>> --------- snip -----------
> >>>
> >>> The problem with f1() is that it will clobber a variable named .subs in
> >>> the global environment; the problem with f2() is that .subs can be masked
> >>> by a variable in the global environment.
> >>>
> >>> Is there a better approach?
> >>>
> >>
> >> I think there is something wrong with R here since the formula in the
> >> call component of mod.1 has a "call" class whereas the corresponding
> >> call component of mod.2 has "formula" class:
> >>
> >>> class(mod.1$call[[2]])
> >> [1] "call"
> >>> class(mod.2$call[[2]])
> >> [1] "formula"
> >>
> >> If we reset call[[2]] to have "call" class then it works:
> >>
> >>> mod.2a <- mod.2
> >>> mod.2a$call[[2]] <- as.call(as.list(mod.2a$call[[2]]))
> >>> f(mod.2a)
> >>
> >> Call:
> >> lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
> >>    Population + Year, data = longley, subset = subs)
> >>
> >> Coefficients:
> >>  (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces
> >>    3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03
> >>   Population          Year
> >>    1.164e+00    -1.911e+00
> >>
> >>
> >> --
> >> Statistics & Software Consulting
> >> GKX Group, GKX Associates Inc.
> >> tel: 1-877-GKX-GROUP
> >> email: ggrothendieck at gmail.com
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> 
> --
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


