[Rd] [External] Re: should base R have a piping operator ?

Mon Oct 7 17:04:49 CEST 2019

On Mon, 7 Oct 2019, Lionel Henry wrote:

> Hi Gabe,
>
>> There is another way the pipe could go into base R that could not be
>> done in package space and has the potential to mitigate some pretty
>> serious downsides to the pipes relating to debugging
>
> I assume you're thinking about the large stack trace of the magrittr
> pipe? You don't need a parser transformation to solve this problem
> though, the pipe could be implemented as a regular function with a
> very limited impact on the stack. And if implemented as a SPECIALSXP,
> it would be completely invisible. We've been planning to rewrite %>%
> to fix the performance and the stack print, it's just low priority.
>
> About the semantics of local evaluation that were proposed in this
> thread, I think that wouldn't be right. A native pipe should be
> consistent with other control flow constructs like `if` and `for` and
> evaluate in the current environment. In that case, the `.` binding, if
> any, would be restored to its original value in `on.exit()` (or through
> unwind-protection if implemented in C).
>

Sorry to be blunt but adding/removing a variable from a caller's
environment is a horrible design idea. Think about what happens if an
argument in a pipe stage contains a pipe. (Not completely
unreasonable, e.g. for a left_join). We already have such a design
lurking in (at least) one place in base code and it keeps biting. It's
pretty high on my list to be expunged.

If a variable is to be used it needs to be in its own
scope/environment.  There is another option, which is to rewrite the
pipe as a nested call and evaluate that in the parent frame. Not likely
to be much worse for debugging and might even be better.  Some
tinkering with these ideas is at

https://gitlab.com/luke-tierney/pipes

All that said, there is nothing that can be done with pipes that can't
be done without them. They may be the most visible aspect of the
tidyverse but they are also the least essential. I don't find them
useful, mostly because they make debugging harder and add to the
cognitive load of figuring out what is actually going on in the
evaluation process. So I don't use them in my work or my teaching (I
do mention them in teaching so students can understand them when they
see them). Many people clearly like them, and that's fine. But they
are not in any way, shape, or form essential.

I can't speak for all of R core on this, but this is how I look at the
question of inclusion in base: R core developer time is a (very)
scarce resource. Any part of that resource that is used to incorporate
and maintain in base something that can be implemented reasonably well
in a package is then not available for improving and maintaining parts
of R that have to be in base. There would need to be extremely strong
reasons for reallocating resources in this way and I just don't see
how that case can be made here.

It is certainly possible that thinking about pipes might suggest tome
useful low level primitives to add that would have to live in base and
might be useful in other contexts. Those might be worth considering.
[Some kind of 'exec()' or aving an 'exec()' or 'tailcall()' primitive
to reuse a call frame for example.]

Best,

luke

> Best,
> Lionel
>
>
>> On 6 Oct 2019, at 01:50, Gabriel Becker <gabembecker using gmail.com> wrote:
>>
>> Hi all,
>>
>> I think there's some nuance here that makes makes me agree partially with
>> each "side".
>>
>> The pipe is inarguably extremely popular. Many probably think of it as a
>> core feature of R, along with the tidyverse that (as was pointed out)
>> largely surrounds it and drives its popularity. Whether its a good or bad
>> thing that they think that doesn't change the fact that by my estimation
>> that Ant is correct that they do. BUT, I don't agree with him that that, by
>> itself, is a reason to put it in base R in the form that it exists now. For
>> the current form, there aren't really any major downsides that I see to
>> having people just use the package version.
>>
>> Sure it may be a little weird, but it doesn't ever really stop the
>> people from using it or present a significant barrier. Another major point
>> is that many (most?) base R functions are not necessarily tooled to be
>> endomorphic, which in my personal opinion is *largely* the only place that
>> the pipes are really compelling.
>>
>> That was for pipes as the exist in package space, though. There is another
>> way the pipe could go into base R that could not be done in package space
>> and has the potential to mitigate some pretty serious downsides to the
>> pipes relating to debugging, which would be to implement them in the parser.
>>
>> If
>>
>> iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
>> filter(mean_sl > 5)
>>
>>
>> were *parsed* as, for example, into
>>
>> local({
>>            . = group_by(iris, Species)
>>
>>            ._tmp2 = summarize(., mean_sl = mean(Sepal.Length))
>>
>>            filter(., mean_sl > 5)
>>       })
>>
>>
>>
>>
>> Then debuggiing (once you knew that) would be much easier but behavaior
>> would be the same as it is now. There could even be some sort of
>> step-through-pipe debugger at that point added as well for additional
>> convenience.
>>
>> There is some minor precedent for that type of transformative parsing:
>>
>>> expr = parse(text = "5 -> x")
>>
>>> expr
>>
>> expression(5 -> x)
>>
>>> expr[[1]]
>>
>> x <- 5
>>
>>
>> Though thats a much more minor transformation.
>>
>> All of that said, I believe Jim Hester (cc'ed) suggested something along
>> these lines at the RSummit a couple of years ago, and thus far R-core has
>> not shown much appetite for changing things in the parser.
>>
>> Without that changing, I'd have to say that my vote, for whatever its
>> worth, comes down on the side of pipes being fine in packages. A summary of
>> my reasoning being that it only makes sense for them to go into R itself if
>> doing so fixes an issue that cna't be fixed with them in package space.
>>
>> Best,
>> ~G
>>
>>
>>
>> On Sun, Oct 6, 2019 at 5:26 AM Ant F <antoine.fabri using gmail.com> wrote:
>>
>>> Yes but this exageration precisely misses the point.
>>>
>>> Concerning your examples:
>>>
>>> * I love fread but I think it makes a lot of subjective choices that are
>>> best associated with a package. I think it
>>> changed a lot with time and can still change, and we have great developers
>>> willing to maintain it and be reactive
>>> regarding feature requests or bug reports
>>>
>>> *.group_by() adds a class that works only (or mostly) with tidyverse verbs,
>>> that's very easy to dismiss it as an inclusion in base R.
>>>
>>> * summarize is an alternative to aggregate, that would be very confusing to
>>> have both
>>>
>>> Now to be fair to your argument we could think of other functions such as
>>> data.table::rleid() which I believe base R misses deeply,
>>> and there is nothing wrong with packaged functions making their way to base
>>> R.
>>>
>>> Maybe there's an existing list of criteria for inclusion, in base R but if
>>> not I can make one up for the sake of this discussion :) :
>>> * 1) the functionality should not already exist
>>> * 2) the function should be general enough
>>> * 3) the function should have a large amount of potential of users
>>> * 4) the function should be robust, and not require extensive maintenance
>>> * 5) the function should be stable, we shouldn't expect new features ever 2
>>> months
>>> * 6) the function should have an intuitive interface in the context of the
>>> rest ot base R
>>>
>>> I guess 1 and 6 could be held against my proposal, because :
>>> (1) everything can be done without pipes
>>> (6) They are somewhat surprising (though with explicit dots not that much,
>>> and not more surprising than say `bquote()`)
>>>
>>> In my opinion the + offset the -.
>>>
>>> I wouldn't advise taking magrittr's pipe (providing the license allows so)
>>> for instance, because it makes a lot of design choices and has a complex
>>> behavior, what I propose is 2 lines of code very unlikely to evolve or
>>> require maintenance.
>>>
>>> Antoine
>>>
>>> PS: I just receive the digest once a day so If you don't "reply all" I can
>>> only react later.
>>>
>>> Le sam. 5 oct. 2019 à 19:54, Hugh Marera <hugh.marera using gmail.com> a écrit :
>>>
>>>> I exaggerated the comparison for effect. However, it is not very
>>> difficult
>>>> to find functions in dplyr or data.table or indeed other packages that
>>> one
>>>> may wish to be in base R. Examples, for me, could include
>>>> data.table::fread, dplyr::group_by & dplyr::summari[sZ]e combo, etc.
>>> Also,
>>>> the "popularity" of magrittr::`%>%` is mostly attributable to the
>>> tidyverse
>>>> (an advanced superset of R). Many R users don't even know that they are
>>>> installing the magrittr package.
>>>>
>>>> On Sat, Oct 5, 2019 at 6:30 PM Iñaki Ucar <iucar using fedoraproject.org>
>>> wrote:
>>>>
>>>>> On Sat, 5 Oct 2019 at 17:15, Hugh Marera <hugh.marera using gmail.com> wrote:
>>>>>>
>>>>>> How is your argument different to, say,  "Should dplyr or data.table
>>> be
>>>>>> part of base R as they are the most popular data science packages and
>>>>> they
>>>>>> are used by a large number of users?"
>>>>>
>>>>> Two packages with many features, dozens of functions and under heavy
>>>>> development to fix bugs, add new features and improve performance, vs.
>>>>> a single operator with a limited and well-defined functionality, and a
>>>>> reference implementation that hasn't changed in years (but certainly
>>>>> hackish in a way that probably could only be improved from R itself).
>>>>>
>>>>> Can't you really spot the difference?
>>>>>
>>>>> Iñaki
>>>>>
>>>>
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu