[R] Pipe operator
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Tue Jan 3 20:22:36 CET 2023
At 19:14 on 03/01/2023, Rui Barradas wrote:
> At 17:35 on 03/01/2023, Greg Snow wrote:
>> To expand a little on Christopher's answer.
>>
>> The short answer is that having the different syntaxes can lead to
>> more readable code (when used properly).
>>
>> Note that there are now 2 different (but somewhat similar) pipes
>> available in R (there could be more in some package(s) that I don't
>> know about, but will just talk about the main 2).
>>
>> The %>% pipe comes from the magrittr package, but many other packages
>> now import that package. But you need to load the magrittr package,
>> either directly or indirectly, before you can use that pipe. The
>> magrittr pipe is a function call, so there is a small increase in time
>> and memory for using it, but it is a small fraction of a second and a
>> few bytes of memory, so you probably will not notice the increased
>> usage.
>>
>> The core R language now has a built-in pipe |> which is handled by the
>> parser, so there are no extra function calls and you do not need to load
>> any extra packages (though you need R 4.1.0 or later, released in May
>> 2021).
>>
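One way to see that |> is handled by the parser rather than called as a function is to quote a piped expression and compare it with the nested call. This is only a small sketch in base R (4.1.0 or later); the names `piped` and `nested` are mine, not from the thread.

```r
# The parser rewrites the pipe before anything is evaluated,
# so the quoted expression is already an ordinary nested call.
piped  <- quote(1:10 |> mean())
nested <- quote(mean(1:10))

print(piped)              # shows the call the parser produced
identical(piped, nested)  # compares the two parsed calls
```

Quoting the same expression written with %>% would instead keep a call to the `%>%` function, which is where the small run-time overhead mentioned above comes from.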
>> The built-in |> pipe is a little pickier: you need to include the
>> parentheses in a function call, e.g. 1:10 |> mean(), whereas the magrittr
>> pipe can work with that call or with the function without parentheses,
>> e.g. 1:10 %>% mean or 1:10 %>% mean(). This makes %>% a little easier to
>> use with anonymous functions. If the previous result needs to be
>> passed to an argument other than the first, then %>% uses "." and |>
>> uses "_" (the latter needs R 4.2.0 or later and must be given to a named
>> argument).
>>
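To make the placeholder and anonymous-function differences concrete, here is a hedged sketch. It assumes R 4.2.0 or later for "_" and uses the built-in `mtcars` data purely as an illustration; the magrittr part runs only if that package happens to be installed.

```r
# Native pipe: an anonymous function must be wrapped in parentheses and called.
squares <- 1:5 |> (\(v) v^2)()

# Native placeholder: "_" must be passed to a named argument (R >= 4.2.0).
fit_native <- mtcars |> lm(mpg ~ wt, data = _)

# magrittr equivalent with ".", run only if the package is available.
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)
}
```

Both fits are the same model as `lm(mpg ~ wt, data = mtcars)`; only the way the data frame reaches the `data` argument differs.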
>> The magrittr package has additional versions of the pipe and some
>> functions that wrap around common operators to make it easier to use
>> them with pipes, so there are still advantages to loading that package
>> if any of those are helpful.
>>
>> For a simple case like your example, the pipe probably does not help
>> with readability much, but it helps more as we string more function calls
>> together.
>> For example, here are 3 ways to compute the geometric mean of the data
>> in a vector "x":
>>
>> exp(mean(log(x)))
>>
>> logx <- log(x)
>> mlx <- mean(logx)
>> exp(mlx)
>>
>> x |>
>> log() |>
>> mean() |>
>> exp()
>>
>> These all do the same thing, but the first option is read from the
>> middle outward (which can be tricky) and is even more complicated if
>> you use additional arguments to any of the functions.
>> The second option reads top down, but requires creating intermediate
>> variables. The last reads similarly to the second, but without the
>> extra variables. Spreading the series of function calls across
>> multiple rows makes it easier to read and easily lets you insert a
>> line like `print() |>` for debugging or checking intermediate results,
>> and single lines can easily be commented out to skip that step.
>>
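The debugging idea above can be tried directly. This is a minimal sketch with made-up data in `x`; it relies on print() returning its argument invisibly, so the pipeline carries on after the inserted line.

```r
x <- c(1, 2, 4, 8)   # made-up data for illustration

# Nested form, read from the middle outward.
gm_nested <- exp(mean(log(x)))

# Piped form with a temporary print() step inserted.
gm_piped <- x |>
  log() |>
  print() |>   # temporary: show the logged values, then pass them on
  mean() |>
  exp()

all.equal(gm_nested, gm_piped)
```

Commenting out or deleting the `print() |>` line leaves the result unchanged.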
>> I have found myself using code like the following to compute a table,
>> print it, and compute the proportions all in one step:
>>
>> table(f, g) |>
>> print() |>
>> prop.table()
>>
>> The pipes also work very well with the tidyverse, or even the tidy
>> data ideas without those packages where we use a single function for
>> each change, e.g. start with a data frame, select a subset of the
>> columns, filter to a subset of the rows, mutate a column, join to
>> another data frame, then pass the final result to a modeling function
>> like `lm` (and then pass that result to a summary function). This is
>> nicely readable when each step is its own line.
>>
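As a rough illustration of the "tidy data ideas without those packages" point, the following sketch uses only base R and the built-in `mtcars` data. The column choices and the derived `wt_kg` column are my own assumptions, and "_" needs R 4.2.0 or later.

```r
# Base-R sketch of a filter / select / mutate / model / summarise pipeline.
model_summary <- mtcars |>
  subset(cyl == 4, select = c(mpg, wt)) |>   # filter rows, select columns
  transform(wt_kg = wt * 453.592) |>         # mutate: wt is in units of 1000 lb
  lm(mpg ~ wt_kg, data = _) |>               # "_" needs R >= 4.2.0
  summary()

model_summary$r.squared
```

A join step could be added in the same style with merge(), and the dplyr verbs (select, filter, mutate, left_join) slot into the same shape if the tidyverse is loaded.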
>> On Tue, Jan 3, 2023 at 9:49 AM Sorkin, John
>> <jsorkin using som.umaryland.edu> wrote:
>>>
>>> I am trying to understand the reason for existence of the pipe
>>> operator, %>%, and when one should use it. It is my understanding
>>> that the operator sends the file to the left of the operator to the
>>> function immediately to the right of the operator:
>>>
>>> c(1:10) %>% mean results in a value of 5.5 which is exactly the same
>>> as the result one obtains using the mean function directly, viz.
>>> mean(c(1:10)). What is the reason for having two syntactically
>>> different but semantically identical ways to call a function? Is one
>>> more efficient than the other? Does one use less memory than the other?
>>>
>>> P.S. Please forgive what might seem to be a question with an obvious
>>> answer. I am a programmer dinosaur. I have been programming for more
>>> than 50 years. When I started programming in the 1960s the only pipe
>>> one spoke about was a bong.
>>>
>>> John
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>
> Hello,
>
> Not long ago, there was a (very) relevant post to r-devel [1] by
> Paul Murrell linking to a YouTube video [2].
>
> [1] https://stat.ethz.ch/pipermail/r-devel/2022-September/081959.html
> [2] https://youtu.be/IMpXB30MP48
>
> Hope this helps,
>
> Rui Barradas
>
Hello,

Sorry, I forgot the link to the beginning of that r-devel thread.

https://stat.ethz.ch/pipermail/r-devel/2022-April/081636.html

Rui Barradas