[Rd] evaluation in transform versus within

Joris Meys jorismeys at gmail.com
Wed Apr 1 20:33:43 CEST 2015


Thank you for the insights. I understood as much from the code, but I can't
really see how this can cause a problem when using with() or within()
inside a package or a function. The environments behave like I would
expect, as does the evaluation of the arguments. The second argument is
supposed to be an expression, so I would expect that expression to be
evaluated in the data frame first.
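
A minimal sketch of what I mean, assuming a hypothetical helper scale_by()
(not part of any package): inside a function, with() finds x in the data
frame and k in the function's own environment.

```r
# Hypothetical helper, for illustration only.
scale_by <- function(data, k) {
  # x is looked up in data first; k is not a column, so it is
  # found in the environment with() was called from (this function).
  with(data, x * k)
}

df <- data.frame(x = 1:3)
res <- scale_by(df, 10)
```

Here res holds the column x multiplied by the argument k.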

I believed the warning in subset() and transform() refers to the
consequences of using the dots argument (...) and the evaluation thereof inside
the function, but I might have misunderstood this. I've always considered
within() the programming equivalent of the convenience function
transform().
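
As a rough illustration of that equivalence (a sketch on a made-up data
frame), both calls below add the same column. The only difference shown is
that transform() takes name = value arguments while within() takes an
ordinary expression.

```r
df <- data.frame(x = 1:3)

a <- transform(df, y = x * 2)   # new column passed as a named argument
b <- within(df, y <- x * 2)     # new column created by assignment in expr
```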

Sorry for using the r-devel list, but I reckoned this could have
consequences for package developers like me. More explicitly: if within()
poses the same risk as transform() (which I'm still not sure of), a warning
on the help page of within() would be appropriate, imho. I will use the r-help
list in the future.

Kind regards
Joris

On Wed, Apr 1, 2015 at 7:55 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:

> On 01/04/2015 1:35 PM, Gabriel Becker wrote:
>
>> Joris,
>>
>>
>> The second argument to evalq is envir, so that line says, roughly, "call
>> environment() to generate me a new environment within the environment
>> defined by data".
>>
>
> I think that's not quite right.  environment() returns the current
> environment; it doesn't create a new one.  It is evalq() that creates a new
> environment from data, and environment() just returns it.
>
> Here's what happens.  I've put the code first, the description of what
> happens on the line below.
>
>     parent <- parent.frame()
>
> Get the environment from which within.data.frame was called.
>
>     e <- evalq(environment(), data, parent)
>
> Create a new environment containing the columns of data, with the parent
> being the environment where we were called.
> Return it and store it in e.
>
>     eval(substitute(expr), e)
>
> Evaluate the expression in this new environment.
>
>     l <- as.list(e)
>
> Convert it to a list.
>
>     l <- l[!vapply(l, is.null, NA, USE.NAMES = FALSE)]
>
> Delete NULL entries from the list.
>
>     nD <- length(del <- setdiff(names(data), (nl <- names(l))))
>
> Find out if any columns were deleted.
>
>     data[nl] <- l
>
> Set the columns of data to the values from the list.
>
>     if (nD)
>         data[del] <- if (nD == 1)
>             NULL
>         else vector("list", nD)
>     data
>
> Delete the columns from data which were deleted from the list.
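
The steps described above can be traced on a small, made-up data frame.
This is only a sketch: environment() stands in for the parent.frame() call
that within.data.frame makes, and the expression is passed with quote()
instead of substitute(expr).

```r
data <- data.frame(x = 1:3, y = c(2, 4, 6))
parent <- environment()   # stand-in for parent.frame()

# One new environment holding the columns, enclosed by parent.
e <- evalq(environment(), data, parent)
cols_in_e <- sort(ls(e))
same_parent <- identical(parent.env(e), parent)

# Evaluate an expression in e: add z, delete y.
eval(quote({
  z <- x + y
  rm(y)
}), e)

# Convert to a list and drop NULL entries.
l <- as.list(e)
l <- l[!vapply(l, is.null, NA, USE.NAMES = FALSE)]
nl <- names(l)
del <- setdiff(names(data), nl)   # columns removed inside e

# Write the surviving values back and drop the deleted columns.
data[nl] <- l
data[del] <- NULL
```

After this, data has the columns x and z, and del names the removed
column y.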
>
>
>
>> Note that this only generates e, the environment that expr will be
>> evaluated in on the next line (the call to eval). This means that expr
>> is evaluated in an environment which is inside the environment defined by
>> data, so you get non-standard evaluation in that symbols defined in data
>> will be available to expr earlier in symbol lookup than those in the
>> environment that within() was called from.
>>
>
> This again sounds like there are two environments created, when really
> there's just one, but the last part is correct.
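
The lookup order can be seen inside a function. In this sketch (center()
is a hypothetical helper, not from any package), the argument m is
silently shadowed as soon as the data frame has a column of the same name.

```r
# Hypothetical helper, for illustration only.
center <- function(data, m) {
  # m is meant to be the argument, but a column named m takes precedence
  # because the columns of data are searched before the calling frame.
  within(data, x <- x - m)
}

d1 <- center(data.frame(x = c(1, 2, 3)), m = 1)                 # uses argument m
d2 <- center(data.frame(x = c(1, 2, 3), m = c(0, 0, 0)), m = 1) # uses column m
```

In d1 the argument is subtracted; in d2 the column m is used instead, so x
is unchanged.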
>
> Duncan Murdoch
>
>
>
>> This is easy to confirm from the behavior of these functions:
>>
>> > df = data.frame(x = 1:10, y = rnorm(10))
>> > x = "I'm a character"
>> > mean(x)
>> [1] NA
>> Warning message:
>> In mean.default(x) : argument is not numeric or logical: returning NA
>> > within(df, mean.x <- mean(x))
>>      x            y mean.x
>> 1   1  0.396758869    5.5
>> 2   2  0.945679050    5.5
>> 3   3  1.980039723    5.5
>> 4   4 -0.187059706    5.5
>> 5   5  0.008220067    5.5
>> 6   6  0.451175885    5.5
>> 7   7 -0.262064017    5.5
>> 8   8 -0.652301191    5.5
>> 9   9  0.673609455    5.5
>> 10 10 -0.075590905    5.5
>> > with(df, mean(x))
>> [1] 5.5
>>
>> P.S. this is probably an r-help question.
>>
>> Best,
>> ~G
>>
>>
>>
>>
>> On Wed, Apr 1, 2015 at 10:21 AM, Joris Meys <jorismeys at gmail.com> wrote:
>>
>> > Dear list members,
>> >
>> > I'm a bit confused about the evaluation of expressions using with() or
>> > within() versus subset() and transform(). I always teach my students to use
>> > with() and within() because of the warning mentioned in the help pages of
>> > subset() and transform(). Both functions use nonstandard evaluation and are
>> > to be used only interactively.
>> >
>> > I've never seen that warning on the help page of with() and within(), so I
>> > assumed both functions can safely be used in functions and packages. I've
>> > now been told that both functions pose the same risk as subset() and
>> > transform().
>> >
>> > Looking at the source code I've noticed the extra step:
>> >
>> > e <- evalq(environment(), data, parent)
>> >
>> > which, at least according to my understanding, should ensure that the
>> > functions follow the standard evaluation rules. Could somebody with more
>> > knowledge than I have shed a bit of light on this issue?
>> >
>> > Thank you
>> > Joris
>> >
>> > --
>> > Joris Meys
>> > Statistical consultant
>> >
>> > Ghent University
>> > Faculty of Bioscience Engineering
>> > Department of Mathematical Modelling, Statistics and Bio-Informatics
>> >
>> > tel :  +32 (0)9 264 61 79
>> > Joris.Meys at Ugent.be
>> > -------------------------------
>> > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>> >
>>
>>
>>
>>
>

