[R] Pipe operator

@vi@e@gross m@iii@g oii gm@ii@com
Tue Jan 3 18:40:44 CET 2023


John,

The topic has indeed been discussed here endlessly but new people still
stumble upon it.

Until recently (before R 4.1.0), the R language itself had no built-in pipe.
Piping was nonetheless widely used through an assortment of packages, and
there are quite a few variations on the theme, including different
implementations.

Most existing code uses the %>% operator from those packages, but there is
now a built-in |> operator that is generally faster, though not as easy to
use in a few cases.

Please forget the use of the word FILE here. Pipes are a form of syntactic
sugar that generally is about the FIRST argument to a function. They are NOT
meant to be used just for the trivial case you mention where indeed there is
an easy way to do things. Yes, they work in such situations. But consider a
deeply nested expression like this:

Result <- round(max(cos(x), 3.14159/4), 3)

Expressions nested MUCH deeper than this are in common use. The above can be
written linearly as:

Temp1 <- cos(x)
Temp2 <- max(Temp1, 3.14159/4)
Result <- round(Temp2, 3)

Translation: for some variable x, calculate the cosine, take the maximum of
that and pi/4, and round the result to three decimal places. This is not an
uncommon kind of thing to do. Sometimes such calls are nested many layers
deep, and you can get hopelessly confused unless you write them somewhat
linearly.

What pipes allow is to write this closer to the second way while not seeing
or keeping any temporary variables around. The goal is to replace the FIRST
argument to a function with whatever resulted as the value of the previous
expression. That is often a vector or data.frame or list or any kind of
object but can also be fairly complex as in a list of lists of matrices.

So you can still start with cos(x), OR you can move the x out in front of the
pipe, which leaves cos() empty:

x %>% cos
or
x |> cos()

With the older %>% pipe, the parentheses after cos are optional if there are
no additional arguments, but the new |> pipe requires them.
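
To make the parentheses rule concrete, here is a minimal sketch using only
the built-in pipe, so it assumes R 4.1.0 or later and no extra packages. The
variable names are just for illustration. Because a bare function name after
|> is rejected by the parser, the sketch feeds that case to parse() as text
rather than writing it directly.

```r
x <- c(0, pi / 3, pi)

# A call on the right-hand side works: x is supplied as the first argument.
a <- x |> cos()               # same as cos(x)

# Additional arguments follow the piped-in first argument.
b <- x |> cos() |> round(2)   # same as round(cos(x), 2)

# A bare function name (no parentheses) is a syntax error with |>,
# so it is checked here by parsing the code as text.
bare <- tryCatch(parse(text = "x |> cos"), error = function(e) e)
rejected <- inherits(bare, "error")
```

After running this, rejected should be TRUE, while a and b hold the same
values as the ordinary nested calls.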

So continuing the above, using multiple lines, the pipe looks like:

Result <-
  x %>%
  cos() %>%
  max(3.14159/4) %>%
  round(3)

This gives the same result but is arguably easier for some to read and
follow. Nobody forces you to use it and for simple cases, most people don't.
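
For comparison, the same chain can be sketched with the built-in |> operator
(again assuming R 4.1.0 or later). The sample values in x are made up for
illustration, and the last line simply checks that the piped form agrees with
the nested one.

```r
x <- c(0.5, 1, 2)   # made-up sample input

# Nested form, read from the inside out.
nested <- round(max(cos(x), 3.14159/4), 3)

# Piped form, read from top to bottom.
piped <-
  x |>
  cos() |>
  max(3.14159/4) |>
  round(3)

same <- identical(nested, piped)
```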

There is a grouping of packages called the tidyverse that makes heavy,
routine use of pipes. Most or all of its functions were designed so that the
first argument is the one normally piped to. That makes it very handy to
write code that says: read your data into a variable (often a data.frame or
tibble) and PIPE IT to a function that renames some columns, PIPE the
resulting modified object to a function that retains only selected rows, pipe
that to a function that drops some of the columns, pipe that to a function
that groups the items or sorts them, and pipe that to a function that does a
join with another object or generates a report or so many other things.
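
Here is a rough sketch of that shape. To keep it runnable without installing
anything, it uses only base R functions, the built-in |> pipe (R 4.1.0 or
later) and the mtcars data set that ships with R; tidyverse code would
typically use dplyr verbs such as filter(), select(), mutate() and arrange()
in the same positions. The column name hp_per_wt is made up for illustration.

```r
result <-
  mtcars |>
  subset(cyl == 4) |>                   # retain only selected rows
  subset(select = c(mpg, wt, hp)) |>    # keep only some of the columns
  transform(hp_per_wt = hp / wt) |>     # add a derived column
  (\(d) d[order(d$mpg), ])()            # sort rows by mpg, ascending
```

The last step uses an anonymous function because base R indexing does not
take the data frame as a first argument in the way the earlier steps do. This
is one of the cases where the built-in pipe is a little less convenient.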

So the real answer is that piping is another WAY of doing things from a
programmer's perspective. Underneath it all, it is mostly syntactic sugar:
the interpreter rearranges your code and performs the steps in what seems
like a different order at times. Generally, you do not need to care.
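
For the built-in |> operator you can see the rearrangement directly, because
it happens when the code is parsed. The sketch below assumes R 4.1.0 or
later. It does not apply to %>%, which is an ordinary function supplied by a
package and does its work at run time.

```r
# Capture the piped expression without evaluating it.
expr <- quote(x |> cos() |> max(3.14159/4) |> round(3))

# The parser has already rewritten it as nested function calls.
print(expr)

# It is the same language object as the hand-written nested version.
same <- identical(expr, quote(round(max(cos(x), 3.14159/4), 3)))
```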



-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Sorkin, John
Sent: Tuesday, January 3, 2023 11:49 AM
To: 'R-help Mailing List' <r-help using r-project.org>
Subject: [R] Pipe operator

I am trying to understand the reason for existence of the pipe operator,
%>%, and when one should use it. It is my understanding that the operator
sends the file to the left of the operator to the function immediately to
the right of the operator:

c(1:10) %>% mean results in a value of 5.5 which is exactly the same as the
result one obtains using the mean function directly, viz. mean(c(1:10)).
What is the reason for having two syntactically different but semantically
identical ways to call a function? Is one more efficient than the other?
Does one use less memory than the other? 

P.S. Please forgive what might seem to be a question with an obvious answer.
I am a programmer dinosaur. I have been programming for more than 50 years.
When I started programming in the 1960s the only pipe one spoke about was a
bong.  

John
