[Rd] should base R have a piping operator ?

Joris Meys jor|@mey@ @end|ng |rom gm@||@com
Sun Oct 6 10:13:53 CEST 2019


I'm largely with Gabriel Becker on this one: if pipes enter base R, they
should be a well thought out and integrated part of the language.

I do see merit, though, in providing a pipe in base R. The main reason is
that right now there's no single pipe. A pipe function exists in different
packages, and it's not impossible that at some point piping operators might
behave slightly differently depending on the package you load. So I hope
someone from RStudio is reading this thread and decides to do the heavy
lifting for R core. After all, it really is mainly their packages that
would benefit from it. I can't think of a non-tidyverse package that's
easier to use with pipes than without.

Best
Joris

On Sun, Oct 6, 2019 at 1:50 AM Gabriel Becker <gabembecker using gmail.com> wrote:

> Hi all,
>
> I think there's some nuance here that makes me agree partially with
> each "side".
>
> The pipe is inarguably extremely popular. Many probably think of it as a
> core feature of R, along with the tidyverse that (as was pointed out)
> largely surrounds it and drives its popularity. Whether it's a good or bad
> thing that they think that doesn't change the fact that, by my estimation,
> Ant is correct that they do. BUT, I don't agree with him that that, by
> itself, is a reason to put it in base R in the form that it exists now. For
> the current form, there aren't really any major downsides that I see to
> having people just use the package version.
>
> Sure, it may be a little weird, but it never really stops
> people from using it or presents a significant barrier. Another major point
> is that many (most?) base R functions are not necessarily tooled to be
> endomorphic, which in my personal opinion is *largely* the only place that
> the pipes are really compelling.
>
> That was for pipes as they exist in package space, though. There is another
> way the pipe could go into base R that could not be done in package space
> and has the potential to mitigate some pretty serious downsides to the
> pipes relating to debugging, which would be to implement them in the
> parser.
>
> If
>
> iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
> filter(mean_sl > 5)
>
>
> were *parsed*, for example, into
>
> local({
>             . = group_by(iris, Species)
>
>             . = summarize(., mean_sl = mean(Sepal.Length))
>
>             filter(., mean_sl > 5)
>        })
>
>
>
>
> Then debugging (once you knew that) would be much easier but behavior
> would be the same as it is now. There could even be some sort of
> step-through-pipe debugger at that point added as well for additional
> convenience.
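
A rough sketch of the rewrite described above, done in package space with
base R only rather than in the parser. The helper names pipe_steps(),
insert_dot() and unpipe() are hypothetical, and the sketch makes a
simplifying assumption: every right-hand side receives `.` as its first
argument, whereas magrittr's actual rules are more involved (for instance
when a dot already appears among the arguments).

```r
# Hypothetical helper: flatten a left-associative chain of `%>%` calls
# into a list of unevaluated steps, leftmost first.
pipe_steps <- function(expr) {
  if (is.call(expr) && identical(expr[[1]], as.name("%>%"))) {
    c(pipe_steps(expr[[2]]), list(expr[[3]]))
  } else {
    list(expr)
  }
}

# Hypothetical helper: turn f(a, b) into f(., a, b).
# Simplifying assumption: the dot is always inserted as the first argument.
insert_dot <- function(step) {
  if (is.name(step)) step <- as.call(list(step))  # bare name: f becomes f()
  as.call(append(as.list(step), list(as.name(".")), after = 1))
}

# Rewrite an unevaluated pipe chain into local({ . <- ...; ...; last(.) }).
unpipe <- function(expr) {
  steps <- pipe_steps(expr)
  dot <- as.name(".")
  assign_step <- function(value) call("<-", dot, value)
  body <- c(
    list(assign_step(steps[[1]])),
    lapply(steps[-1], function(s) assign_step(insert_dot(s)))
  )
  # The last step is returned as a value instead of being assigned.
  n <- length(body)
  body[[n]] <- body[[n]][[3]]
  as.call(list(as.name("local"), as.call(c(list(as.name("{")), body))))
}

# quote() only needs the parser, so no pipe operator has to be defined.
rewritten <- unpipe(quote(c(1, 4, 9) %>% sqrt() %>% sum()))
print(rewritten)
eval(rewritten)
```

Since each step ends up as an ordinary assignment inside local(), a
traceback or browser() session would show plain function calls rather than
the internals of a pipe implementation, which is the debugging benefit
argued for above.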
>
> There is some minor precedent for that type of transformative parsing:
>
> > expr = parse(text = "5 -> x")
>
> > expr
>
> expression(5 -> x)
>
> > expr[[1]]
>
> x <- 5
>
>
> Though that's a much more minor transformation.
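
The precedent quoted above can be checked directly. The short sketch below
uses only base R and also shows a second rewrite of the same kind, `**`
being turned into `^` by the parser; the variable names are just for
illustration.

```r
# Right assignment is parsed into an ordinary left assignment.
arrow <- parse(text = "5 -> x")[[1]]
print(arrow)
identical(arrow, quote(x <- 5))

# The `**` power operator is parsed as `^`.
power <- parse(text = "2 ** 3")[[1]]
print(power)
identical(power, quote(2^3))
```

In both cases the original spelling is gone by the time the expression
exists as a language object, so code operating on the parsed call never
sees `->` or `**`.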
>
> All of that said, I believe Jim Hester (cc'ed) suggested something along
> these lines at the RSummit a couple of years ago, and thus far R-core has
> not shown much appetite for changing things in the parser.
>
> Without that changing, I'd have to say that my vote, for whatever it's
> worth, comes down on the side of pipes being fine in packages. A summary of
> my reasoning is that it only makes sense for them to go into R itself if
> doing so fixes an issue that can't be fixed with them in package space.
>
> Best,
> ~G
>
>
>
> On Sun, Oct 6, 2019 at 5:26 AM Ant F <antoine.fabri using gmail.com> wrote:
>
> > Yes, but this exaggeration precisely misses the point.
> >
> > Concerning your examples:
> >
> > * I love fread but I think it makes a lot of subjective choices that are
> > best associated with a package. I think it has changed a lot over time
> > and can still change, and we have great developers willing to maintain
> > it and be reactive regarding feature requests or bug reports.
> >
> > * group_by() adds a class that works only (or mostly) with tidyverse
> > verbs, so it's very easy to dismiss as a candidate for inclusion in
> > base R.
> >
> > * summarize is an alternative to aggregate; it would be very confusing
> > to have both.
> >
> > Now to be fair to your argument we could think of other functions such as
> > data.table::rleid() which I believe base R misses deeply,
> > and there is nothing wrong with packaged functions making their way to
> > base R.
> >
> > Maybe there's an existing list of criteria for inclusion in base R, but
> > if not I can make one up for the sake of this discussion :) :
> > * 1) the functionality should not already exist
> > * 2) the function should be general enough
> > * 3) the function should have a large number of potential users
> > * 4) the function should be robust, and not require extensive maintenance
> > * 5) the function should be stable, we shouldn't expect new features
> > every 2 months
> > * 6) the function should have an intuitive interface in the context of
> > the rest of base R
> >
> > I guess 1 and 6 could be held against my proposal, because:
> > (1) everything can be done without pipes
> > (6) They are somewhat surprising (though with explicit dots not that
> > much, and not more surprising than, say, `bquote()`)
> >
> > In my opinion the pluses offset the minuses.
> >
> > I wouldn't advise taking magrittr's pipe (provided the license allows
> > it), for instance, because it makes a lot of design choices and has
> > complex behavior; what I propose is 2 lines of code very unlikely to
> > evolve or require maintenance.
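
The two lines being proposed are not quoted in this message, so the
following is only a guess at what a minimal pipe with explicit dots could
look like, not the actual proposal. The operator name `%.>%` is
hypothetical and chosen so it does not clash with magrittr's `%>%`.

```r
# Hypothetical minimal pipe: evaluate the right-hand side with `.` bound
# to the left-hand side, looking up everything else in the caller's frame.
`%.>%` <- function(lhs, rhs) {
  eval(substitute(rhs), envir = list(. = lhs), enclos = parent.frame())
}

# The dot must be written explicitly at every step.
c(1, 4, 9) %.>% sqrt(.) %.>% sum(.)

# Variables from the calling function remain visible on the right-hand side.
scale_by <- function(x, k) x %.>% (. * k)
scale_by(10, 2)
```

With explicit dots there is no rule for where the left-hand side gets
inserted, which is most of what keeps a definition like this short.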
> >
> > Antoine
> >
> > PS: I only receive the digest once a day, so if you don't "reply all"
> > I can only react later.
> >
> > On Sat, Oct 5, 2019 at 19:54, Hugh Marera <hugh.marera using gmail.com> wrote:
> >
> > > I exaggerated the comparison for effect. However, it is not very
> > > difficult to find functions in dplyr or data.table or indeed other
> > > packages that one may wish to be in base R. Examples, for me, could
> > > include data.table::fread, dplyr::group_by & dplyr::summari[sZ]e
> > > combo, etc. Also, the "popularity" of magrittr::`%>%` is mostly
> > > attributable to the tidyverse (an advanced superset of R). Many R
> > > users don't even know that they are installing the magrittr package.
> > >
> > > On Sat, Oct 5, 2019 at 6:30 PM Iñaki Ucar <iucar using fedoraproject.org> wrote:
> > >
> > >> On Sat, 5 Oct 2019 at 17:15, Hugh Marera <hugh.marera using gmail.com> wrote:
> > >> >
> > >> > How is your argument different to, say, "Should dplyr or data.table
> > >> > be part of base R as they are the most popular data science packages
> > >> > and they are used by a large number of users?"
> > >>
> > >> Two packages with many features, dozens of functions and under heavy
> > >> development to fix bugs, add new features and improve performance, vs.
> > >> a single operator with a limited and well-defined functionality, and a
> > >> reference implementation that hasn't changed in years (but certainly
> > >> hackish in a way that probably could only be improved from R itself).
> > >>
> > >> Can't you really spot the difference?
> > >>
> > >> Iñaki
> > >>
> > >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>


-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)

-----------
Biowiskundedagen 2018-2019
http://www.biowiskundedagen.ugent.be/

-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
