[Rd] evaluation in transform versus within
Joris Meys
jorismeys at gmail.com
Wed Apr 1 22:38:12 CEST 2015
Thanks all, I see where I misunderstood the issue. I would still like to
suggest adding a warning to the help pages of with() and within(), similar
to the one already on the help pages of subset() and transform().
Cheers
Joris
On Wed, Apr 1, 2015 at 9:18 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> On 01/04/2015 2:33 PM, Joris Meys wrote:
>
>> Thank you for the insights. I understood as much from the code, but I
>> can't really see how this can cause a problem when using with() or within()
>> within a package or a function. The environments behave like I would
>> expect, as does the evaluation of the arguments. The second argument is
>> supposed to be an expression, so I would expect that expression to be
>> evaluated in the data frame first.
>>
>
> I don't know the context within which you were told that they are
> problematic, but one issue is that it makes typo detection harder, since
> the code analysis won't see typos.
>
> For example:
>
> df <- data.frame(col1 = 1)
> global <- 3
>
> with(df, col1 + global) # fine
> with(df, col1 + Global) # typo, but still no warning
>
> whereas
>
> df$col1 + global # fine
> df$col1 + Global # "no visible binding for global variable 'Global'"
>
> and of course you'll get in a real mess later with the with() code if you
> add a column named "global" to your dataframe.
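
The shadowing problem described above can be reproduced in a few lines. This is only a minimal sketch reusing the df and global names from the example; the value 10 for the new column is an arbitrary choice made for illustration.

```r
df <- data.frame(col1 = 1)
global <- 3

# 'global' is not a column of df, so lookup falls through to the
# environment that with() was called from
before <- with(df, col1 + global)

# once a column of that name exists, it is found first and silently
# shadows the variable of the same name
df$global <- 10
after <- with(df, col1 + global)

print(c(before = before, after = after))  # before is 4, after is 11
```

The call to with() is textually identical in both cases; only the contents of the data frame changed.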
>
> Duncan Murdoch
>
>
>> I believed the warning in subset() and transform() refers to the
>> consequences of using the dotted argument and the evaluation thereof inside
>> the function, but I might have misunderstood this. I've always considered
>> within() the programming equivalent of the convenience function transform().
>>
>> Sorry for using the r-devel list, but I reckoned this could have
>> consequences for package developers like me. More explicitly: if within()
>> poses the same risk as transform() (which I'm still not sure of), a warning
>> on the help page of within() would be appropriate, in my opinion. I will
>> use the r-help list in the future.
>>
>> Kind regards
>> Joris
>>
>> On Wed, Apr 1, 2015 at 7:55 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>
>> On 01/04/2015 1:35 PM, Gabriel Becker wrote:
>>
>> Joris,
>>
>>
>> The second argument to evalq is envir, so that line says, roughly, "call
>> environment() to generate me a new environment within the environment
>> defined by data".
>>
>>
>> I think that's not quite right. environment() returns the current
>> environment, it doesn't create a new one. It is evalq() that
>> created a new environment from data, and environment() just
>> returns it.
>>
>> Here's what happens. I've put the code first, the description of
>> what happens on the line below.
>>
>> parent <- parent.frame()
>>
>> Get the environment from which within.data.frame was called.
>>
>> e <- evalq(environment(), data, parent)
>>
>> Create a new environment containing the columns of data, with the
>> parent being the environment where we were called.
>> Return it and store it in e.
>>
>> eval(substitute(expr), e)
>>
>> Evaluate the expression in this new environment.
>>
>> l <- as.list(e)
>>
>> Convert it to a list.
>>
>> l <- l[!vapply(l, is.null, NA, USE.NAMES = FALSE)]
>>
>> Delete NULL entries from the list.
>>
>> nD <- length(del <- setdiff(names(data), (nl <- names(l))))
>>
>> Find out if any columns were deleted.
>>
>> data[nl] <- l
>>
>> Set the columns of data to the values from the list.
>>
>> if (nD)
>>     data[del] <- if (nD == 1)
>>         NULL
>>     else vector("list", nD)
>> data
>>
>> Delete the columns from data which were deleted from the list.
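
The steps walked through above can be collected into a single function. This is a minimal sketch assembled from the lines quoted in this thread; within_sketch is a hypothetical name, not part of base R, and the real within.data.frame may differ in other versions of R.

```r
# hypothetical re-assembly of the quoted steps, for illustration only
within_sketch <- function(data, expr) {
  # environment from which within_sketch() was called
  parent <- parent.frame()
  # one new environment holding the columns of data, enclosed by parent
  e <- evalq(environment(), data, parent)
  # evaluate the unevaluated expression in that environment
  eval(substitute(expr), e)
  # collect the variables of e and drop NULL entries
  l <- as.list(e)
  l <- l[!vapply(l, is.null, NA, USE.NAMES = FALSE)]
  # columns present in data but no longer present in e
  nD <- length(del <- setdiff(names(data), (nl <- names(l))))
  # write the (possibly modified or new) values back into data
  data[nl] <- l
  # remove the columns that were deleted inside expr
  if (nD)
    data[del] <- if (nD == 1) NULL else vector("list", nD)
  data
}

df <- data.frame(x = 1:3, y = c(10, 20, 30))
res <- within_sketch(df, z <- x * 2)
print(res)
```

Because expr is evaluated in e, the symbol x in the call resolves to the column of df before anything in the calling environment is considered.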
>>
>>
>>
>> Note that that is only generating e, the environment that expr will be
>> evaluated within in the next line (the call to eval). This means that
>> expr is evaluated in an environment which is inside the environment
>> defined by data, so you get non-standard evaluation in that symbols
>> defined in data will be available to expr earlier in symbol lookup than
>> those in the environment that within() was called from.
>>
>>
>> This again sounds like there are two environments created, when
>> really there's just one, but the last part is correct.
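
The point that only one environment is created can be checked directly. This is a small sketch, assuming a toy data frame; here the variable parent stands in for the result of parent.frame() inside within().

```r
data <- data.frame(x = 1:3)
parent <- environment()  # stand-in for parent.frame() inside within()

# eval() turns the data frame into a single new environment whose
# enclosure is 'parent'; environment() merely returns that environment
e <- evalq(environment(), data, parent)

print(ls(e))                            # the columns of data
print(identical(parent.env(e), parent)) # e sits directly below parent
```

No intermediate environment sits between e and the caller: the enclosure of e is the calling environment itself.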
>>
>> Duncan Murdoch
>>
>>
>>
>> This is easy to confirm from the behavior of these functions:
>>
>> > df = data.frame(x = 1:10, y = rnorm(10))
>> > x = "I'm a character"
>> > mean(x)
>> [1] NA
>> Warning message:
>> In mean.default(x) : argument is not numeric or logical: returning NA
>> > within(df, mean.x <- mean(x))
>>     x            y mean.x
>> 1   1  0.396758869    5.5
>> 2   2  0.945679050    5.5
>> 3   3  1.980039723    5.5
>> 4   4 -0.187059706    5.5
>> 5   5  0.008220067    5.5
>> 6   6  0.451175885    5.5
>> 7   7 -0.262064017    5.5
>> 8   8 -0.652301191    5.5
>> 9   9  0.673609455    5.5
>> 10 10 -0.075590905    5.5
>> > with(df, mean(x))
>> [1] 5.5
>>
>> P.S. this is probably an r-help question.
>>
>> Best,
>> ~G
>>
>>
>>
>>
>> On Wed, Apr 1, 2015 at 10:21 AM, Joris Meys <jorismeys at gmail.com> wrote:
>>
>> > Dear list members,
>> >
>> > I'm a bit confused about the evaluation of expressions using with() or
>> > within() versus subset() and transform(). I always teach my students to
>> > use with() and within() because of the warning mentioned in the help
>> > pages of subset() and transform(). Both of those functions use
>> > nonstandard evaluation and are to be used only interactively.
>> >
>> > I've never seen that warning on the help page of with() and within(),
>> > so I assumed both functions can safely be used in functions and
>> > packages. I've now been told that both functions pose the same risk as
>> > subset() and transform().
>> >
>> > Looking at the source code I've noticed the extra step:
>> >
>> > e <- evalq(environment(), data, parent)
>> >
>> > which, at least according to my understanding, should ensure that the
>> > functions follow the standard evaluation rules. Could somebody with
>> > more knowledge than I have shed a bit of light on this issue?
>> >
>> > Thank you
>> > Joris
>> >
>> > --
>> > Joris Meys
>> > Statistical consultant
>> >
>> > Ghent University
>> > Faculty of Bioscience Engineering
>> > Department of Mathematical Modelling, Statistics and Bio-Informatics
>> >
>> > tel : +32 (0)9 264 61 79
>> > Joris.Meys at Ugent.be
>> > -------------------------------
>> > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-devel at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>>
>
>