[Rd] evaluation in transform versus within

Duncan Murdoch murdoch.duncan at gmail.com
Wed Apr 1 21:18:39 CEST 2015


On 01/04/2015 2:33 PM, Joris Meys wrote:
> Thank you for the insights. I understood as much from the code, but I 
> can't really see how this can cause a problem when using with() or 
> within() within a package or a function. The environments behave like 
> I would expect, as does the evaluation of the arguments. The second 
> argument is supposed to be an expression, so I would expect that 
> expression to be evaluated in the data frame first.

I don't know the context within which you were told that they are 
problematic, but one issue is that they make typo detection harder, since 
the code analysis won't see typos inside the expression.

For example:

df <- data.frame(col1 = 1)
global <- 3

with(df, col1 + global)  # fine
with(df, col1 + Global)  # typo, but the code analysis reports nothing

whereas

df$col1 + global  # fine
df$col1 + Global  # reported: "no visible binding for global variable 'Global'"

and of course you'll get in a real mess later with the with() code if 
you add a column named "global" to your data frame.
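
Both points can be checked with a short sketch. It assumes the codetools
package is available and that checkUsage() with skipWith = TRUE roughly
mirrors the usage check run on package code. The helper notes() and the
functions f_with() and f_dollar() are made up for illustration, and the
value 100 is arbitrary.

```r
library(codetools)  # assumption: installed (a recommended package)

df <- data.frame(col1 = 1)
global <- 3

# Hypothetical functions containing the same typo ('Global' for 'global').
f_with   <- function(df) with(df, col1 + Global)
f_dollar <- function(df) df$col1 + Global

# Hypothetical helper: collect checkUsage() messages instead of printing them.
notes <- function(f, ...) {
  out <- character()
  checkUsage(f, report = function(msg) out <<- c(out, msg), ...)
  out
}

typo_dollar <- notes(f_dollar)                 # should mention 'Global'
typo_with   <- notes(f_with, skipWith = TRUE)  # with() body is not analysed

# Masking: once df has a column called 'global', with() silently prefers it.
before <- with(df, col1 + global)  # uses the variable global (3)
df$global <- 100
after  <- with(df, col1 + global)  # uses the column df$global (100)
```

Here before is 4 and after is 101, although the with() call itself never
changed.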

Duncan Murdoch

>
> I believed the warning in subset() and transform() refers to the 
> consequences of using the dotted argument and the evaluation thereof 
> inside the function, but I might have misunderstood this. I've always 
> considered within() the programming equivalent of the convenience 
> function transform().
>
> Sorry for using the r-devel list, but I reckoned this could have 
> consequences for package developers like me. More explicitly: if 
> within() poses the same risk as transform() (which I'm still not sure 
> of), a warning on the help page of within() would be suited imho.  I 
> will use the r-help list in the future.
>
> Kind regards
> Joris
>
> On Wed, Apr 1, 2015 at 7:55 PM, Duncan Murdoch 
> <murdoch.duncan at gmail.com> wrote:
>
>     On 01/04/2015 1:35 PM, Gabriel Becker wrote:
>
>         Joris,
>
>
>         The second argument to evalq is envir, so that line says,
>         roughly, "call environment() to generate me a new environment
>         within the environment defined by data".
>
>
>     I think that's not quite right.  environment() returns the current
>     environment, it doesn't create a new one.  It is evalq() that
>     created a new environment from data, and environment() just
>     returns it.
>
>     Here's what happens.  I've put the code first, the description of
>     what happens on the line below.
>
>         parent <- parent.frame()
>
>     Get the environment from which within.data.frame was called.
>
>         e <- evalq(environment(), data, parent)
>
>     Create a new environment containing the columns of data, with the
>     parent being the environment where we were called.
>     Return it and store it in e.
>
>         eval(substitute(expr), e)
>
>     Evaluate the expression in this new environment.
>
>         l <- as.list(e)
>
>     Convert it to a list.
>
>         l <- l[!vapply(l, is.null, NA, USE.NAMES = FALSE)]
>
>     Delete NULL entries from the list.
>
>         nD <- length(del <- setdiff(names(data), (nl <- names(l))))
>
>     Find out if any columns were deleted.
>
>         data[nl] <- l
>
>     Set the columns of data to the values from the list.
>
>         if (nD)
>             data[del] <- if (nD == 1)
>                 NULL
>             else vector("list", nD)
>         data
>
>     Delete the columns from data which were deleted from the list.
>
>
>
>         Note that that is only generating e, the environment that expr
>         will be evaluated within in the next line (the call to eval).
>         This means that expr is evaluated in an environment which is
>         inside the environment defined by data, so you get non-standard
>         evaluation in that symbols defined in data will be available to
>         expr earlier in symbol lookup than those in the environment
>         that within() was called from.
>
>
>     This again sounds like there are two environments created, when
>     really there's just one, but the last part is correct.
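
A minimal sketch of the single-environment point, reusing the first lines of
the code walked through above. The function peek(), the variable offset and
the column values are hypothetical and only serve the illustration.

```r
df <- data.frame(x = 1:3)
offset <- 10   # lives in the calling environment, not in df

# Hypothetical helper that repeats the first three steps of the walkthrough
# and then reports what the environment e looks like.
peek <- function(data, expr) {
  parent <- parent.frame()
  e <- evalq(environment(), data, parent)  # one new environment built from data
  eval(substitute(expr), e)                # expr is evaluated inside it
  list(vars = sort(ls(e)),
       same_parent = identical(parent.env(e), parent),
       z = e$z)
}

res <- peek(df, z <- x + offset)
# res$vars        : "x" "z"   (the column plus the newly assigned variable)
# res$same_parent : TRUE      (the enclosure of e is the caller's environment)
# res$z           : 11 12 13  (x found in e, offset found in the caller)
```

The assignment lands in e next to the columns, and offset is only found
because lookup continues from e into the calling environment.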
>
>     Duncan Murdoch
>
>
>
>         This is easy to confirm from the behavior of these functions:
>
>         > df = data.frame(x = 1:10, y = rnorm(10))
>         > x = "I'm a character"
>         > mean(x)
>         [1] NA
>         Warning message:
>         In mean.default(x) : argument is not numeric or logical: returning NA
>         > within(df, mean.x <- mean(x))
>              x            y mean.x
>         1   1  0.396758869    5.5
>         2   2  0.945679050    5.5
>         3   3  1.980039723    5.5
>         4   4 -0.187059706    5.5
>         5   5  0.008220067    5.5
>         6   6  0.451175885    5.5
>         7   7 -0.262064017    5.5
>         8   8 -0.652301191    5.5
>         9   9  0.673609455    5.5
>         10 10 -0.075590905    5.5
>         > with(df, mean(x))
>         [1] 5.5
>
>         P.S. this is probably an r-help question.
>
>         Best,
>         ~G
>
>
>
>
>         On Wed, Apr 1, 2015 at 10:21 AM, Joris Meys
>         <jorismeys at gmail.com> wrote:
>
>         > Dear list members,
>         >
>         > I'm a bit confused about the evaluation of expressions using
>         > with() or within() versus subset() and transform(). I always
>         > teach my students to use with() and within() because of the
>         > warning mentioned in the help pages of subset() and
>         > transform(). Both functions use nonstandard evaluation and are
>         > to be used only interactively.
>         >
>         > I've never seen that warning on the help page of with() and
>         > within(), so I assumed both functions can safely be used in
>         > functions and packages. I've now been told that both functions
>         > pose the same risk as subset() and transform().
>         >
>         > Looking at the source code I've noticed the extra step:
>         >
>         > e <- evalq(environment(), data, parent)
>         >
>         > which, at least according to my understanding, should ensure
>         > that the functions follow the standard evaluation rules. Could
>         > somebody with more knowledge than I have shed a bit of light
>         > on this issue?
>         >
>         > Thank you
>         > Joris
>         >
>         > --
>         > Joris Meys
>         > Statistical consultant
>         >
>         > Ghent University
>         > Faculty of Bioscience Engineering
>         > Department of Mathematical Modelling, Statistics and
>         > Bio-Informatics
>         >
>         > tel : +32 (0)9 264 61 79
>         > Joris.Meys at Ugent.be
>         > -------------------------------
>         > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
>
>
>
>
>
>
> -- 
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel :  +32 (0)9 264 61 79
> Joris.Meys at Ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


