[Rd] the pipe |> and line breaks in pipelines

Wed Dec 9 22:35:54 CET 2020

Many languages allow a final backslash (“\”) character to let an
expression span multiple lines, and I’ve often wished for this in R,
particularly so that I could put `else` on a separate line at the
top level. It would also allow infix operators such as the new pipe
operator `|>` to be aligned at the start of a line, which I would
heartily endorse.
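
To make the `else` case concrete, here is a minimal sketch using only base R's parse(). The variable names and the condition are made up for illustration. At top level the parser considers the `if` statement finished at the newline, so the `else` is rejected; inside braces the same text is accepted.

```r
# Top level: `if (x > 0) 'pos'` is already a complete expression at the
# newline, so the `else` that starts the next line is a syntax error.
top_level <- "if (x > 0) 'pos'\nelse 'non-pos'"
res_top <- tryCatch(parse(text = top_level), error = function(e) e)
top_level_fails <- inherits(res_top, "error")

# Inside braces the parser is still waiting for `}`, so `else` may start
# a new line.
braced <- "{\n  if (x > 0) 'pos'\n  else 'non-pos'\n}"
res_braced <- parse(text = braced)

x <- 5
braced_value <- eval(res_braced[[1]])
```

A trailing continuation character would let the first form parse without the surrounding braces.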

On Wed, Dec 9, 2020 at 3:58 PM Ben Bolker <bbolker using gmail.com> wrote:

>    Definitely support the idea that if this kind of trickery is going to
> happen that it be confined to some particular IDE/environment or some
> particular submission protocol. I don't want it to happen in my ESS
> session please ... I'd rather deal with the parentheses.
>
> On 12/9/20 3:45 PM, Timothy Goodman wrote:
> > Regarding special treatment for |>, isn't it getting special treatment
> > anyway, because it's implemented as a syntax transformation from
> > x |> f(y) to f(x, y), rather than as an operator?
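
The syntax-transformation point can be checked from the parsed code. This is a small sketch, assuming R 4.1.0 or later for the native pipe (the version guard is there so the script still runs on older versions); `f`, `x` and `y` are placeholder names and are never evaluated.

```r
# The native pipe is rewritten by the parser, so the parsed call is f(x, y).
piped_text <- NA_character_
if (getRversion() >= "4.1.0") {
  piped <- parse(text = "x |> f(y)", keep.source = FALSE)[[1]]
  piped_text <- paste(deparse(piped), collapse = "")
}

# %>% is an ordinary infix operator: the parsed call still has `%>%` as the
# function being called. Parsing does not require magrittr to be loaded.
magrittr_call <- parse(text = "x %>% f(y)", keep.source = FALSE)[[1]]
op_name <- as.character(magrittr_call[[1]])
```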
> >
> > That said, the point about wanting a block of code submitted line-by-line
> > to work the same as a block of code submitted all at once is a fair one.
> > Maybe the better solution would be if there were a way to say "Submit the
> > selected code as a single expression, ignoring line-breaks".  Then I could
> > run any number of lines with pipes at the start and no special character
> > at the end, and have it treated as a single pipeline.  I suppose that'd
> > need to be a feature offered by the environment (RStudio's RNotebooks in
> > my case).  I could wrap my pipelines in parentheses (to make the "pipes at
> > start of line" syntax valid R code), and then could use the hypothetical
> > "submit selected code ignoring line-breaks" feature when running just the
> > first part of the pipeline -- i.e., selecting full lines, but starting
> > after the opening paren so as not to need to insert a closing paren.
> >
> > - Tim
> >
> > On Wed, Dec 9, 2020 at 12:12 PM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
> >
> >> On 09/12/2020 2:33 p.m., Timothy Goodman wrote:
> >>> If I type my_data_frame_1 and press Enter (or Ctrl+Enter to execute the
> >>> command in the Notebook environment I'm using) I certainly *would*
> >>> expect R to treat it as a complete statement.
> >>>
> >>> But what I'm talking about is a different case, where I highlight a
> >>> multi-line statement in my notebook:
> >>>
> >>>       my_data_frame_1
> >>>           |> filter(some_conditions_1)
> >>>
> >>> and then press Ctrl+Enter.
> >>
> >> I don't think I'd like it if parsing changed between passing one line at
> >> a time and passing a block of lines.  I'd like to be able to highlight a
> >> few lines and pass those, then type one, then highlight some more and
> >> pass those:  and have it act as though I just passed the whole combined
> >> block, or typed everything one line at a time.
> >>
> >>
> >>> Or, I suppose the equivalent would be to run an R script containing
> >>> those two lines of code, or to run a multi-line statement like that
> >>> from the console (which in RStudio I can do by pressing Shift+Enter
> >>> between the lines).
> >>>
> >>> In those cases, R could either (1) Give an error message [the current
> >>> behavior], or (2) understand that the first line is meant to be piped
> >>> to the second.  The second option would be significantly more useful,
> >>> and is almost certainly what the user intended.
> >>>
> >>> (For what it's worth, there are some languages, such as JavaScript,
> >>> that consider the first token of the next line when determining if the
> >>> previous line was complete.  JavaScript's rules around this are overly
> >>> complicated, but a rule like "a pipe following a line break is treated
> >>> as continuing the previous line" would be much simpler.  And while it
> >>> might be objectionable to treat the operator %>% differently from other
> >>> operators, the addition of |>, which isn't truly an operator at all,
> >>> seems like the right time to consider it.)
> >>
> >> I think this would be hard to implement with R's current parser, but
> >> possible.  I think it could be done by distinguishing between EOL
> >> markers within a block of text and "end of block" marks.  If it applied
> >> only to the |> operator it would be *really* ugly.
> >>
> >> My strongest objection to it is the one at the top, though.  If I have a
> >> block of lines sitting in my editor that I just finished executing, with
> >> the cursor pointing at the next line, I'd like to know that it didn't
> >> matter whether the lines were passed one at a time, as a block, or some
> >> combination of those.
> >>
> >> Duncan Murdoch
> >>
> >>>
> >>> -Tim
> >>>
> >>> On Wed, Dec 9, 2020 at 3:12 AM Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
> >>>
> >>>      The requirement for operators at the end of the line comes from
> >>>      the interactive nature of R.  If you type
> >>>
> >>>            my_data_frame_1
> >>>
> >>>      how could R know that you are not done, and are planning to type
> >>>      the rest of the expression
> >>>
> >>>              %>% filter(some_conditions_1)
> >>>              ...
> >>>
> >>>      before it should consider the expression complete?  The way
> >>>      languages like C do this is by requiring a statement terminator at
> >>>      the end.  You can also do it by wrapping the entire thing in
> >>>      parentheses ().
> >>>
> >>>      However, be careful: Don't use braces:  they don't work.  And
> >>>      parens have the side effect of removing invisibility from the
> >>>      result (which is a design flaw or bonus, depending on your point of
> >>>      view).  So I actually wouldn't advise this workaround.
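
The three behaviours described above (parentheses work, braces do not, parentheses make the result visible) can be checked with a short sketch. It only parses the pipeline text, so the placeholder names from the thread never need to exist; withVisible() is the base R function that reports visibility.

```r
pipeline <- "my_data_frame_1\n  %>% filter(some_conditions_1)"

# Inside parentheses the parser ignores the line break, so this parses
# as a single expression.
in_parens <- parse(text = paste0("(", pipeline, ")"))

# Inside braces a newline still ends an expression, so the leading %>%
# is a syntax error.
in_braces <- tryCatch(parse(text = paste0("{", pipeline, "}")),
                      error = function(e) e)
braces_fail <- inherits(in_braces, "error")

# Side effect of the parentheses: an invisible result becomes visible.
plain_visible   <- withVisible(invisible(1))$visible
wrapped_visible <- withVisible((invisible(1)))$visible
```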
> >>>
> >>>      Duncan Murdoch
> >>>
> >>>
> >>>      On 09/12/2020 12:45 a.m., Timothy Goodman wrote:
> >>>       > Hi,
> >>>       >
> >>>       > I'm a data scientist who routinely uses R in my day-to-day work,
> >>>       > for tasks such as cleaning and transforming data, exploratory data
> >>>       > analysis, etc.  This includes frequent use of the pipe operator
> >>>       > from the magrittr and dplyr libraries, %>%.  So, I was pleased to
> >>>       > hear about the recent work on a native pipe operator, |>.
> >>>       >
> >>>       > This seems like a good time to bring up the main pain point I
> >>>       > encounter when using pipes in R, and some suggestions on what
> >>>       > could be done about it.  The issue is that the pipe operator
> >>>       > can't be placed at the start of a line of code (except in
> >>>       > parentheses).  That's no different than any binary operator in R,
> >>>       > but I find it's a source of difficulty for the pipe because of
> >>>       > how pipes are often used.
> >>>       >
> >>>       > [I'm assuming here that my usage is fairly typical of a lot of
> >>>       > users; at any rate, I don't think I'm *too* unusual.]
> >>>       >
> >>>       > === Why this is a problem ===
> >>>       >
> >>>       > It's very common (for me, and I suspect for many users of dplyr)
> >>>       > to write multi-step pipelines and put each step on its own line
> >>>       > for readability.  Something like this:
> >>>       >
> >>>       >    ### Example 1 ###
> >>>       >    my_data_frame_1 %>%
> >>>       >      filter(some_conditions_1) %>%
> >>>       >      inner_join(my_data_frame_2, by = some_columns_1) %>%
> >>>       >      group_by(some_columns_2) %>%
> >>>       >      summarize(some_aggregate_functions_1) %>%
> >>>       >      filter(some_conditions_2) %>%
> >>>       >      left_join(my_data_frame_3, by = some_columns_3) %>%
> >>>       >      group_by(some_columns_4) %>%
> >>>       >      summarize(some_aggregate_functions_2) %>%
> >>>       >      arrange(some_columns_5)
> >>>       >
> >>>       > [I guess some might consider this an overly long pipeline; for me
> >>>       > it's pretty typical.  I *could* split it up by assigning
> >>>       > intermediate results to variables, but much of the value I get
> >>>       > from the pipe is that it lets my code communicate which results
> >>>       > are temporary, and which will be used again later.  Assigning
> >>>       > variables for single-use results would remove that
> >>>       > expressiveness.]
> >>>       >
> >>>       > I would prefer (for reasons I'll explain) to be able to write the
> >>>       > above example like this, which isn't valid R:
> >>>       >
> >>>       >    ### Example 2 (not valid R) ###
> >>>       >    my_data_frame_1
> >>>       >      %>% filter(some_conditions_1)
> >>>       >      %>% inner_join(my_data_frame_2, by = some_columns_1)
> >>>       >      %>% group_by(some_columns_2)
> >>>       >      %>% summarize(some_aggregate_functions_1)
> >>>       >      %>% filter(some_conditions_2)
> >>>       >      %>% left_join(my_data_frame_3, by = some_columns_3)
> >>>       >      %>% group_by(some_columns_4)
> >>>       >      %>% summarize(some_aggregate_functions_2)
> >>>       >      %>% arrange(some_columns_5)
> >>>       >
> >>>       > One (minor) advantage is obvious: It lets you easily line up the
> >>>       > pipes, which means that you can see at a glance that the whole
> >>>       > block is a single pipeline, and you'd immediately notice if you
> >>>       > inadvertently omitted a pipe, which otherwise can lead to
> >>>       > confusing output.  [It's also aesthetically pleasing, especially
> >>>       > when %>% is replaced with |>, but that's subjective.]
> >>>       >
> >>>       > But the bigger issue happens when I want to re-run just *part* of
> >>>       > the pipeline.  I do this often when debugging: if the output of
> >>>       > the pipeline seems wrong, I re-run the first few steps and check
> >>>       > the output, then include a little more and re-run again, etc.,
> >>>       > until I locate my mistake.  Working in an interactive notebook
> >>>       > environment, this involves using the cursor to select just the
> >>>       > part of the code I want to re-run.
> >>>       >
> >>>       > It's fast and easy to select *entire* lines of code, but
> >>>       > unfortunately with the pipes placed at the end of the line I must
> >>>       > instead select everything *except* the last three characters of
> >>>       > the line (the last two characters for the new pipe).  Then when I
> >>>       > want to re-run the same partial pipeline with the next line of
> >>>       > code included, I can't just press SHIFT+Down to select it as I
> >>>       > otherwise would, but instead must move the cursor horizontally to
> >>>       > a position three characters before the end of *that* line (which
> >>>       > is generally different due to varying line lengths).  And so
> >>>       > forth each time I want to include an additional line.
> >>>       >
> >>>       > Moreover, with the staggered positions of the pipes at the end of
> >>>       > each line, it's very easy to accidentally select the final pipe
> >>>       > on a line, and then sit there for a moment wondering if the
> >>>       > environment has stopped responding before realizing it's just
> >>>       > waiting for further input (i.e., for the right-hand side).  These
> >>>       > small delays and disruptions add up over the course of a day.
> >>>       >
> >>>       > This desire to select and re-run the first part of a pipeline is
> >>>       > also the reason why it doesn't suffice to achieve syntax like my
> >>>       > "Example 2" by wrapping the entire pipeline in parentheses.
> >>>       > That's of no use if I want to re-run a selection that doesn't
> >>>       > include the final close-paren.
> >>>       >
> >>>       > === Possible Solutions ===
> >>>       >
> >>>       > I can think of two, but maybe there are others.  The first would
> >>>       > make "Example 2" into valid code, and the second would allow you
> >>>       > to run a selection that included a trailing pipe.
> >>>       >
> >>>       >    Solution 1: Add a special case to how R is parsed, so if the
> >>>       > first (non-whitespace) token after an end-line is a pipe, that
> >>>       > pipe gets moved to before the end-line.
> >>>       >      - Argument for: This lets you write code like example 2,
> >>>       > which addresses the pain point around re-running part of a
> >>>       > pipeline, and has advantages for readability.  Also, since
> >>>       > starting a line with a pipe operator is currently invalid, the
> >>>       > change wouldn't break any working code.
> >>>       >      - Argument against: It would make the behavior of %>%
> >>>       > inconsistent with that of other binary operators in R.  (However,
> >>>       > this objection might not apply to the new pipe, |>, which I
> >>>       > understand is being implemented as a syntax transformation rather
> >>>       > than a binary operator.)
> >>>       >
> >>>       >    Solution 2: Ignore the pipe operator if it occurs as the final
> >>>       > token of the code being executed.
> >>>       >      - Argument for: This would mean the user could select and
> >>>       > re-run the first few lines of a longer pipeline (selecting
> >>>       > *entire* lines), avoiding the difficulties described above.
> >>>       >      - Argument against: This means that %>% would be valid even
> >>>       > if it occurred without a right-hand side, which is inconsistent
> >>>       > with other operators in R.  (But, as above, this objection might
> >>>       > not apply to |>.)  Also, this solution still doesn't enable the
> >>>       > syntax of "Example 2", with its readability benefit.
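
Both proposed solutions can be approximated today as text preprocessing in front of parse(), for example inside an editor's "run selection" command. The sketch below is not a parser change and not an existing feature of R or any IDE: the helper names `move_leading_pipes` and `drop_trailing_pipe` are hypothetical, and the regular expressions are deliberately naive (they would also rewrite pipes inside string literals or comments).

```r
# Solution 1 (sketch): a pipe that starts a line is pulled back onto the
# previous line, so "Example 2" style code becomes parseable.
move_leading_pipes <- function(code) {
  gsub("\\s*\\n\\s*(%>%|\\|>)", " \\1", code, perl = TRUE)
}

# Solution 2 (sketch): a pipe left dangling at the very end of the
# selection is dropped before the code is parsed.
drop_trailing_pipe <- function(code) {
  sub("\\s*(%>%|\\|>)\\s*$", "", code, perl = TRUE)
}

example_2 <- "df\n  %>% f(a)\n  %>% g(b)"
fixed_2   <- move_leading_pipes(example_2)

partial   <- "df %>%\n  f(a) %>%\n"
fixed_partial <- drop_trailing_pipe(partial)

# Both results now parse as a single expression; the raw Example 2 text
# does not.
raw_fails <- inherits(tryCatch(parse(text = example_2),
                               error = function(e) e), "error")
n_fixed_2       <- length(parse(text = fixed_2))
n_fixed_partial <- length(parse(text = fixed_partial))
```

Doing this in the front end rather than in R's grammar keeps line-by-line and whole-block submission to the R parser itself unchanged, which is the concern raised earlier in the thread.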
> >>>       >
> >>>       > Thanks for reading this and considering it.
> >>>       >
> >>>       > - Tim Goodman
> >>>       >
> >>>       >       [[alternative HTML version deleted]]
> >>>       >
> >>>       > ______________________________________________
> >>>       > R-devel using r-project.org mailing list
> >>>       > https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>       >
> >>>
> >>
> >>
> >
> >
>
>
-- 
"Whereas true religion and good morals are the only solid foundations of
public liberty and happiness . . . it is hereby earnestly recommended to
the several States to take the most effectual measures for the
encouragement thereof." Continental Congress, 1778