[Rd] should base R have a piping operator ?

Duncan Murdoch murdoch.duncan at gmail.com
Mon Oct 7 13:47:41 CEST 2019


On 07/10/2019 4:22 a.m., Lionel Henry wrote:
> Hi Gabe,
> 
>> There is another way the pipe could go into base R that could not be
>> done in package space and has the potential to mitigate some pretty
>> serious downsides to the pipes relating to debugging
> 
> I assume you're thinking about the large stack trace of the magrittr
> pipe? You don't need a parser transformation to solve this problem
> though, the pipe could be implemented as a regular function with a
> very limited impact on the stack. And if implemented as a SPECIALSXP,
> it would be completely invisible. We've been planning to rewrite %>%
> to fix the performance and the stack print, it's just low priority.

I don't know what Gabe had in mind, but the downside to pipes that I see 
is that they are single statements.  I'd like the debugger to be able to 
single step through one stage at a time.  I'd like to be able to set a 
breakpoint on line 3 in

   a %>%
   b %>%
   c %>%
   d

and be able to examine the intermediate result of evaluating b before 
piping it into c.  (Or maybe that's off by one:  maybe I'd prefer to 
examine the inputs to d if I put a breakpoint there.  I'd have to try it 
to find out which feels more natural.)
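
As a rough sketch of what stage-by-stage evaluation could look like (this is not how magrittr works, and unroll_pipe() is a hypothetical helper, not an existing function), a piped call can be rewritten into one statement per stage, in the spirit of the local() form in Gabe's message quoted below. The sketch assumes every stage after the first is written as a call, and it always inserts `.` as the first argument:

```r
# Hypothetical helper: turn a quoted `a %>% f() %>% g()` expression into
# local({ . <- a; . <- f(.); g(.) }), one statement per stage.
unroll_pipe <- function(expr) {
  stages <- list()
  # The pipe is left-associative, so peel stages off from the outside in.
  while (is.call(expr) && identical(expr[[1]], as.name("%>%"))) {
    stages <- c(list(expr[[3]]), stages)
    expr <- expr[[2]]
  }
  dot <- as.name(".")
  # Insert `.` as the first argument of every stage (assumes each is a call).
  with_dot <- lapply(stages, function(s) {
    as.call(c(list(s[[1]], dot), as.list(s)[-1]))
  })
  n <- length(with_dot)
  # Every step except the last assigns its result back to `.`.
  assigns <- lapply(c(list(expr), with_dot[-n]), function(e) call("<-", dot, e))
  body <- as.call(c(list(as.name("{")), assigns, with_dot[n]))
  call("local", body)
}

# quote() keeps the pipeline unevaluated, so no pipe operator needs to exist.
e <- unroll_pipe(quote(c(1, 4, 9) %>% sqrt() %>% sum()))
print(e)
eval(e)
```

With each stage as a separate statement in the generated block, a statement-level debugger would have a natural place to stop between stages and look at `.`.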


> About the semantics of local evaluation that were proposed in this
> thread, I think that wouldn't be right. A native pipe should be
> consistent with other control flow constructs like `if` and `for` and
> evaluate in the current environment. In that case, the `.` binding, if
> any, would be restored to its original value in `on.exit()` (or through
> unwind-protection if implemented in C).

That makes sense.
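
A minimal sketch of that restore-on-exit behaviour, written in R rather than C, might look like the following. The operator name `%.>%` is made up for illustration, and the sketch assumes the right-hand side refers to the piped value explicitly as `.`:

```r
# Hypothetical operator: evaluate `rhs` in the caller's environment with `.`
# bound to `lhs`, then put any previous `.` binding back on exit.
`%.>%` <- function(lhs, rhs) {
  env <- parent.frame()
  rhs_expr <- substitute(rhs)
  # Remember whether `.` already exists in the calling environment.
  had_dot <- exists(".", envir = env, inherits = FALSE)
  if (had_dot) old_dot <- get(".", envir = env, inherits = FALSE)
  on.exit({
    if (had_dot) {
      assign(".", old_dot, envir = env)  # restore the original value
    } else {
      rm(list = ".", envir = env)        # or remove the temporary binding
    }
  })
  assign(".", lhs, envir = env)  # forces `lhs`, including any nested pipe
  eval(rhs_expr, env)
}

. <- "previous value"
c(1, 4, 9) %.>% sqrt(.) %.>% sum(.)
.   # still "previous value"
```

Because on.exit() runs even when a stage throws an error, the caller's `.` is also restored after a failed pipeline.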

Duncan Murdoch

> 
> Best,
> Lionel
> 
> 
>> On 6 Oct 2019, at 01:50, Gabriel Becker <gabembecker using gmail.com> wrote:
>>
>> Hi all,
>>
>> I think there's some nuance here that makes me agree partially with
>> each "side".
>>
>> The pipe is inarguably extremely popular. Many probably think of it as a
>> core feature of R, along with the tidyverse that (as was pointed out)
>> largely surrounds it and drives its popularity. Whether it's a good or bad
>> thing that they think that doesn't change the fact that, by my estimation,
>> Ant is correct that they do. BUT, I don't agree with him that that, by
>> itself, is a reason to put it in base R in the form that it exists now. For
>> the current form, there aren't really any major downsides that I see to
>> having people just use the package version.
>>
>> Sure it may be a little weird, but it doesn't ever really stop
>> people from using it or present a significant barrier. Another major point
>> is that many (most?) base R functions are not necessarily tooled to be
>> endomorphic, which in my personal opinion is *largely* the only place that
>> the pipes are really compelling.
>>
>> That was for pipes as they exist in package space, though. There is another
>> way the pipe could go into base R that could not be done in package space
>> and has the potential to mitigate some pretty serious downsides to the
>> pipes relating to debugging, which would be to implement them in the parser.
>>
>> If
>>
>> iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
>> filter(mean_sl > 5)
>>
>>
>> were *parsed* as, for example,
>>
>> local({
>>             . = group_by(iris, Species)
>>
>>             . = summarize(., mean_sl = mean(Sepal.Length))
>>
>>             filter(., mean_sl > 5)
>>        })
>>
>>
>>
>>
>> Then debugging (once you knew that) would be much easier but behavior
>> would be the same as it is now. There could even be some sort of
>> step-through-pipe debugger at that point added as well for additional
>> convenience.
>>
>> There is some minor precedent for that type of transformative parsing:
>>
>> > expr = parse(text = "5 -> x")
>> > expr
>> expression(5 -> x)
>> > expr[[1]]
>> x <- 5
>>
>>
>> Though that's a much more minor transformation.
>>
>> All of that said, I believe Jim Hester (cc'ed) suggested something along
>> these lines at the RSummit a couple of years ago, and thus far R-core has
>> not shown much appetite for changing things in the parser.
>>
>> Without that changing, I'd have to say that my vote, for whatever it's
>> worth, comes down on the side of pipes being fine in packages. A summary of
>> my reasoning being that it only makes sense for them to go into R itself if
>> doing so fixes an issue that can't be fixed with them in package space.
>>
>> Best,
>> ~G
>>
>>
>>
>> On Sun, Oct 6, 2019 at 5:26 AM Ant F <antoine.fabri using gmail.com> wrote:
>>
>>> Yes but this exaggeration precisely misses the point.
>>>
>>> Concerning your examples:
>>>
>>> * I love fread but I think it makes a lot of subjective choices that are
>>> best associated with a package. I think it
>>> changed a lot with time and can still change, and we have great developers
>>> willing to maintain it and be reactive
>>> regarding feature requests or bug reports
>>>
>>> * group_by() adds a class that works only (or mostly) with tidyverse verbs,
>>> so it's very easy to dismiss as a candidate for inclusion in base R.
>>>
>>> * summarize is an alternative to aggregate, that would be very confusing to
>>> have both
>>>
>>> Now to be fair to your argument we could think of other functions such as
>>> data.table::rleid() which I believe base R misses deeply,
>>> and there is nothing wrong with packaged functions making their way to base
>>> R.
>>>
>>> Maybe there's an existing list of criteria for inclusion in base R, but if
>>> not I can make one up for the sake of this discussion :) :
>>> * 1) the functionality should not already exist
>>> * 2) the function should be general enough
>>> * 3) the function should have a large number of potential users
>>> * 4) the function should be robust, and not require extensive maintenance
>>> * 5) the function should be stable, we shouldn't expect new features every 2
>>> months
>>> * 6) the function should have an intuitive interface in the context of the
>>> rest of base R
>>>
>>> I guess 1 and 6 could be held against my proposal, because :
>>> (1) everything can be done without pipes
>>> (6) They are somewhat surprising (though with explicit dots not that much,
>>> and not more surprising than say `bquote()`)
>>>
>>> In my opinion the + offset the -.
>>>
>>> I wouldn't advise taking magrittr's pipe (provided the license allows it)
>>> for instance, because it makes a lot of design choices and has a complex
>>> behavior, what I propose is 2 lines of code very unlikely to evolve or
>>> require maintenance.
>>>
>>> Antoine
>>>
>>> PS: I just receive the digest once a day so if you don't "reply all" I can
>>> only react later.
>>>
>>> On Sat, 5 Oct 2019 at 19:54, Hugh Marera <hugh.marera using gmail.com> wrote:
>>>
>>>> I exaggerated the comparison for effect. However, it is not very difficult
>>>> to find functions in dplyr or data.table or indeed other packages that one
>>>> may wish to be in base R. Examples, for me, could include
>>>> data.table::fread, dplyr::group_by & dplyr::summari[sZ]e combo, etc. Also,
>>>> the "popularity" of magrittr::`%>%` is mostly attributable to the tidyverse
>>>> (an advanced superset of R). Many R users don't even know that they are
>>>> installing the magrittr package.
>>>>
>>>> On Sat, Oct 5, 2019 at 6:30 PM Iñaki Ucar <iucar using fedoraproject.org> wrote:
>>>>
>>>>> On Sat, 5 Oct 2019 at 17:15, Hugh Marera <hugh.marera using gmail.com> wrote:
>>>>>>
>>>>>> How is your argument different to, say,  "Should dplyr or data.table be
>>>>>> part of base R as they are the most popular data science packages and they
>>>>>> are used by a large number of users?"
>>>>>
>>>>> Two packages with many features, dozens of functions and under heavy
>>>>> development to fix bugs, add new features and improve performance, vs.
>>>>> a single operator with a limited and well-defined functionality, and a
>>>>> reference implementation that hasn't changed in years (but certainly
>>>>> hackish in a way that probably could only be improved from R itself).
>>>>>
>>>>> Can't you really spot the difference?
>>>>>
>>>>> Iñaki
>>>>>
>>>>
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



More information about the R-devel mailing list