[Rd] the pipe |> and line breaks in pipelines
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Wed Dec 9 21:51:01 CET 2020
I definitely support the idea that if this kind of trickery is going to
happen, it be confined to some particular IDE/environment or some
particular submission protocol. I don't want it to happen in my ESS
session, please ... I'd rather deal with the parentheses.
On 12/9/20 3:45 PM, Timothy Goodman wrote:
> Regarding special treatment for |>, isn't it getting special treatment
> anyway, because it's implemented as a syntax transformation from x |> f(y)
> to f(x, y), rather than as an operator?
>
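The transformation described above can be checked from the parser's output. A minimal sketch, assuming R 4.1.0 or later (where |> is available); x, y and f are placeholder names and are never evaluated:

```r
# quote() returns the parsed call without evaluating it, so the
# placeholder names x, y and f do not need to exist.
piped  <- quote(x |> f(y))
direct <- quote(f(x, y))

# By the time the code is parsed, the pipe has become an ordinary call.
same_call <- identical(piped, direct)
print(piped)
```

No pipe function is ever called here, which is the sense in which |> is not an operator in the way %>% is.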
> That said, the point about wanting a block of code submitted line-by-line
> to work the same as a block of code submitted all at once is a fair one.
> Maybe the better solution would be if there were a way to say "Submit the
> selected code as a single expression, ignoring line-breaks". Then I could
> run any number of lines with pipes at the start and no special character at
> the end, and have it treated as a single pipeline. I suppose that'd need
> to be a feature offered by the environment (RStudio's RNotebooks in my
> case). I could wrap my pipelines in parentheses (to make the "pipes at
> start of line" syntax valid R code), and then could use the hypothetical
> "submit selected code ignoring line-breaks" feature when running just the
> first part of the pipeline -- i.e., selecting full lines, but starting
> after the opening paren so as not to need to insert a closing paren.
>
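The "submit ignoring line-breaks" idea can be approximated in user code by wrapping the selected text in parentheses before parsing it. This is only a sketch: run_as_single_expression is a hypothetical helper, not something R or RStudio provides, and the example assumes R 4.1.0 or later for |>.

```r
# Hypothetical helper: evaluate a selection as ONE expression by wrapping
# it in parentheses, inside which a newline does not end the expression.
run_as_single_expression <- function(code, envir = parent.frame()) {
  # The newline before ")" keeps a trailing comment from hiding the paren.
  wrapped <- paste0("(\n", code, "\n)")
  exprs <- parse(text = wrapped)
  stopifnot(length(exprs) == 1L)
  eval(exprs[[1L]], envir = envir)
}

# Pipes at the start of each line, no closing paren in the selection.
full_selection    <- "c(3, 1, 2)\n  |> sort()\n  |> rev()"
partial_selection <- "c(3, 1, 2)\n  |> sort()"

full_result    <- run_as_single_expression(full_selection)
partial_result <- run_as_single_expression(partial_selection)
```

Because the helper adds its own parentheses, the result is always visible, which is the side effect of parentheses noted elsewhere in this thread.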
> - Tim
>
> On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch <murdoch.duncan using gmail.com>
> wrote:
>
>> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
>>> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
>>> command in the Notebook environment I'm using) I certainly *would*
>>> expect R to treat it as a complete statement.
>>>
>>> But what I'm talking about is a different case, where I highlight a
>>> multi-line statement in my notebook:
>>>
>>> my_data_frame_1
>>> |> filter(some_conditions_1)
>>>
>>> and then press Ctrl+Enter.
>>
>> I don't think I'd like it if parsing changed between passing one line at
>> a time and passing a block of lines. I'd like to be able to highlight a
>> few lines and pass those, then type one, then highlight some more and
>> pass those: and have it act as though I just passed the whole combined
>> block, or typed everything one line at a time.
>>
>>
>>> Or, I suppose the equivalent would be to run
>>> an R script containing those two lines of code, or to run a multi-line
>>> statement like that from the console (which in RStudio I can do by
>>> pressing Shift+Enter between the lines.)
>>>
>>> In those cases, R could either (1) Give an error message [the current
>>> behavior], or (2) understand that the first line is meant to be piped to
>>> the second. The second option would be significantly more useful, and
>>> is almost certainly what the user intended.
>>>
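The current behavior, option (1), can be reproduced without a notebook by handing the two lines to parse(). A small sketch, assuming R 4.1.0 or later; nothing is evaluated, so the data frame and condition names are only placeholders, and parses() is a helper written just for this illustration:

```r
# Illustration-only helper: report whether a block of text is valid R.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

leading_pipe  <- "my_data_frame_1\n|> filter(some_conditions_1)"
trailing_pipe <- "my_data_frame_1 |>\n  filter(some_conditions_1)"

leading_ok  <- parses(leading_pipe)   # line 1 is already a complete expression
trailing_ok <- parses(trailing_pipe)  # the trailing pipe keeps the expression open
```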
>>> (For what it's worth, there are some languages, such as JavaScript, that
>>> consider the first token of the next line when determining if the
>>> previous line was complete. JavaScript's rules around this are overly
>>> complicated, but a rule like "a pipe following a line break is treated
>>> as continuing the previous line" would be much simpler. And while it
>>> might be objectionable to treat the operator %>% differently from other
>>> operators, the addition of |>, which isn't truly an operator at all,
>>> seems like the right time to consider it.)
>>
>> I think this would be hard to implement with R's current parser, but
>> possible. I think it could be done by distinguishing between EOL
>> markers within a block of text and "end of block" marks. If it applied
>> only to the |> operator it would be *really* ugly.
>>
>> My strongest objection to it is the one at the top, though. If I have a
>> block of lines sitting in my editor that I just finished executing, with
>> the cursor pointing at the next line, I'd like to know that it didn't
>> matter whether the lines were passed one at a time, as a block, or some
>> combination of those.
>>
>> Duncan Murdoch
>>
>>>
>>> -Tim
>>>
>>> On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch
>>> <murdoch.duncan using gmail.com> wrote:
>>>
>>> The requirement for operators at the end of the line comes from the
>>> interactive nature of R. If you type
>>>
>>> my_data_frame_1
>>>
>>> how could R know that you are not done, and are planning to type the
>>> rest of the expression
>>>
>>> %>% filter(some_conditions_1)
>>> ...
>>>
>>> before it should consider the expression complete? The way languages
>>> like C do this is by requiring a statement terminator at the end. You
>>> can also do it by wrapping the entire thing in parentheses ().
>>>
>>> However, be careful: Don't use braces: they don't work. And parens
>>> have the side effect of removing invisibility from the result (which is
>>> a design flaw or bonus, depending on your point of view). So I actually
>>> wouldn't advise this workaround.
>>>
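Both caveats can be seen with base R alone. A short sketch; the pipe case assumes nothing about x or f because the text is only parsed, never evaluated:

```r
# Inside parentheses, a newline does not end the expression.
in_parens <- eval(parse(text = "(1\n+ 2)"))   # parsed as (1 + 2)

# Inside braces, each line is its own statement: `1`, then unary `+ 2`.
in_braces <- eval(parse(text = "{1\n+ 2}"))

# With a pipe instead of `+`, the braces version does not even parse.
braces_with_pipe <- tryCatch({
  parse(text = "{x\n|> f()}")
  "parsed"
}, error = function(e) "syntax error")

# Parentheses also make an invisible result visible again.
plain_visible  <- withVisible(invisible(42))$visible
parens_visible <- withVisible((invisible(42)))$visible
```

Here in_parens is 3 and in_braces is 2, so with an operator like + the braces form silently computes something else rather than failing.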
>>> Duncan Murdoch
>>>
>>>
>>> On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
>>> > Hi,
>>> >
>>> > I'm a data scientist who routinely uses R in my day-to-day work, for
>>> > tasks such as cleaning and transforming data, exploratory data
>>> > analysis, etc. This includes frequent use of the pipe operator from
>>> > the magrittr and dplyr libraries, %>%. So, I was pleased to hear
>>> > about the recent work on a native pipe operator, |>.
>>> >
>>> > This seems like a good time to bring up the main pain point I
>>> > encounter when using pipes in R, and some suggestions on what could
>>> > be done about it. The issue is that the pipe operator can't be placed
>>> > at the start of a line of code (except in parentheses). That's no
>>> > different than any binary operator in R, but I find it's a source of
>>> > difficulty for the pipe because of how pipes are often used.
>>> >
>>> > [I'm assuming here that my usage is fairly typical of a lot of users;
>>> > at any rate, I don't think I'm *too* unusual.]
>>> >
>>> > === Why this is a problem ===
>>> >
>>> > It's very common (for me, and I suspect for many users of dplyr) to
>>> > write multi-step pipelines and put each step on its own line for
>>> > readability. Something like this:
>>> >
>>> > ### Example 1 ###
>>> > my_data_frame_1 %>%
>>> >     filter(some_conditions_1) %>%
>>> >     inner_join(my_data_frame_2, by = some_columns_1) %>%
>>> >     group_by(some_columns_2) %>%
>>> >     summarize(some_aggregate_functions_1) %>%
>>> >     filter(some_conditions_2) %>%
>>> >     left_join(my_data_frame_3, by = some_columns_3) %>%
>>> >     group_by(some_columns_4) %>%
>>> >     summarize(some_aggregate_functions_2) %>%
>>> >     arrange(some_columns_5)
>>> >
>>> > [I guess some might consider this an overly long pipeline; for me
>>> > it's pretty typical. I *could* split it up by assigning intermediate
>>> > results to variables, but much of the value I get from the pipe is
>>> > that it lets my code communicate which results are temporary, and
>>> > which will be used again later. Assigning variables for single-use
>>> > results would remove that expressiveness.]
>>> >
>>> > I would prefer (for reasons I'll explain) to be able to write the
>>> > above example like this, which isn't valid R:
>>> >
>>> > ### Example 2 (not valid R) ###
>>> > my_data_frame_1
>>> >     %>% filter(some_conditions_1)
>>> >     %>% inner_join(my_data_frame_2, by = some_columns_1)
>>> >     %>% group_by(some_columns_2)
>>> >     %>% summarize(some_aggregate_functions_1)
>>> >     %>% filter(some_conditions_2)
>>> >     %>% left_join(my_data_frame_3, by = some_columns_3)
>>> >     %>% group_by(some_columns_4)
>>> >     %>% summarize(some_aggregate_functions_2)
>>> >     %>% arrange(some_columns_5)
>>> >
>>> > One (minor) advantage is obvious: It lets you easily line up the
>>> > pipes, which means that you can see at a glance that the whole block
>>> > is a single pipeline, and you'd immediately notice if you
>>> > inadvertently omitted a pipe, which otherwise can lead to confusing
>>> > output. [It's also aesthetically pleasing, especially when %>% is
>>> > replaced with |>, but that's subjective.]
>>> >
>>> > But the bigger issue happens when I want to re-run just *part* of the
>>> > pipeline. I do this often when debugging: if the output of the
>>> > pipeline seems wrong, I re-run the first few steps and check the
>>> > output, then include a little more and re-run again, etc., until I
>>> > locate my mistake. Working in an interactive notebook environment,
>>> > this involves using the cursor to select just the part of the code I
>>> > want to re-run.
>>> >
>>> > It's fast and easy to select *entire* lines of code, but
>>> > unfortunately with the pipes placed at the end of the line I must
>>> > instead select everything *except* the last three characters of the
>>> > line (the last two characters for the new pipe). Then when I want to
>>> > re-run the same partial pipeline with the next line of code included,
>>> > I can't just press SHIFT+Down to select it as I otherwise would, but
>>> > instead must move the cursor horizontally to a position three
>>> > characters before the end of *that* line (which is generally
>>> > different due to varying line lengths). And so forth each time I want
>>> > to include an additional line.
>>> >
>>> > Moreover, with the staggered positions of the pipes at the end of
>>> > each line, it's very easy to accidentally select the final pipe on a
>>> > line, and then sit there for a moment wondering if the environment
>>> > has stopped responding before realizing it's just waiting for further
>>> > input (i.e., for the right-hand side). These small delays and
>>> > disruptions add up over the course of a day.
>>> >
>>> > This desire to select and re-run the first part of a pipeline is also
>>> > the reason why it doesn't suffice to achieve syntax like my "Example
>>> > 2" by wrapping the entire pipeline in parentheses. That's of no use
>>> > if I want to re-run a selection that doesn't include the final
>>> > close-paren.
>>> >
>>> > === Possible Solutions ===
>>> >
>>> > I can think of two, but maybe there are others. The first would make
>>> > "Example 2" into valid code, and the second would allow you to run a
>>> > selection that included a trailing pipe.
>>> >
>>> > Solution 1: Add a special case to how R is parsed, so if the first
>>> > (non-whitespace) token after an end-line is a pipe, that pipe gets
>>> > moved to before the end-line.
>>> >   - Argument for: This lets you write code like example 2, which
>>> >     addresses the pain point around re-running part of a pipeline,
>>> >     and has advantages for readability. Also, since starting a line
>>> >     with a pipe operator is currently invalid, the change wouldn't
>>> >     break any working code.
>>> >   - Argument against: It would make the behavior of %>% inconsistent
>>> >     with that of other binary operators in R. (However, this
>>> >     objection might not apply to the new pipe, |>, which I understand
>>> >     is being implemented as a syntax transformation rather than a
>>> >     binary operator.)
>>> >
>>> > Solution 2: Ignore the pipe operator if it occurs as the final token
>>> > of the code being executed.
>>> >   - Argument for: This would mean the user could select and re-run
>>> >     the first few lines of a longer pipeline (selecting *entire*
>>> >     lines), avoiding the difficulties described above.
>>> >   - Argument against: This means that %>% would be valid even if it
>>> >     occurred without a right-hand side, which is inconsistent with
>>> >     other operators in R. (But, as above, this objection might not
>>> >     apply to |>.) Also, this solution still doesn't enable the syntax
>>> >     of "Example 2", with its readability benefit.
>>> >
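The effect of the two proposed solutions can be imitated on the text of a selection before it is parsed. This is a rough sketch only, not a parser change: move_leading_pipes and drop_trailing_pipe are hypothetical helpers, and the evaluation step assumes R 4.1.0 or later for |>.

```r
# Hypothetical text-level helpers. They work on raw text, so a pipe inside
# a string literal or comment would be rewritten too, and a comment at the
# end of the previous line would swallow the moved pipe.
pipe_pattern <- "(%>%|\\|>)"

# Solution 1: move a pipe that starts a line to the end of the previous line.
move_leading_pipes <- function(code) {
  pattern <- paste0("\\n([ \\t]*)", pipe_pattern, "[ \\t]*")
  gsub(pattern, " \\2\n\\1", code, perl = TRUE)
}

# Solution 2: drop a pipe that is the final token of the selection.
drop_trailing_pipe <- function(code) {
  pattern <- paste0("\\s*", pipe_pattern, "\\s*$")
  sub(pattern, "", code, perl = TRUE)
}

# "Example 2" style input: pipes at the start of each line.
example_2_style <- "c(3, 1, 2)\n  |> sort()\n  |> rev()"
rewritten <- move_leading_pipes(example_2_style)
solution_1_result <- eval(parse(text = rewritten))

# Whole lines selected from an "Example 1" style pipeline, trailing pipe included.
partial_selection <- "c(3, 1, 2) |>\n  sort() |>\n"
solution_2_result <- eval(parse(text = drop_trailing_pipe(partial_selection)))
```

A real implementation would have to work on tokens rather than text to avoid the limitations listed in the first comment.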
>>> > Thanks for reading this and considering it.
>>> >
>>> > - Tim Goodman
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-devel using r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>> >
>>>
>>
>>
>