[Rd] sequential chained operator thoughts

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Mon Dec 7 20:13:29 CET 2020


It has been very enlightening watching the discussion not only about the
existing and proposed variations of a data "pipe" operator in R but also
cognates in many other languages.

So I am throwing out a QUESTION that just asks if the pipeline as done is
pretty much what could also be done without the need for an operator using a
sort of one-time brac]keted  construct where you call a function with a
sequence of operations you want performed and just have it handle the
in-between parts.

I mean something like:

return_val <- do_chain_sequence( { initial_data,
				function1(_VAL_);
				function2(_VAL_, more_args);
				function3(args, 2 * _VAL_, more_args);
				...
				function_n(_VAL_)
				})

The above is not meant to be taken literally. I don't care if the symbol is
_VAL_ or you use semi-colon characters between statements. There are many
possible variants such as each step being in its own curly braces. The idea
is to hand over one or more unevaluated blocks of code. There are such
functions in use in R already.

And yes, it can be written with explicit BEFORE/AFTER clauses to handle
things but those are implementation details and I want to focus on a
concept.

The point is you can potentially write a function that given such a series
of arguments, delays evaluation of them until each is needed or used. About
all it might need to do is set the value of something like _VAL_ from the
first argument if present and then take the text of each subsequent argument
and run it while saving the result back into _VAL_ and at the end, return
the last _VAL_. Along the way, of course, the temporary values stored each
time in _VAL_ would disappear.

Is something like this any improvement over this done by the user:

Initial <- whatever
Temp1 <- function1(initial)
Temp2 <- function2(Temp1, ...)
rm(Temp1)
...

Well, maybe not much. But it does hide some details and allows you to insert
or delete steps without worrying about pesky details like variable names
being in the right sequence or not over-riding other things in your
namespace. It makes your intent clear.

Now obviously being evaluated inside a function is not necessarily the same
as remaining in the original environment so having something like this as a
built-in running in place might be a better idea.

I admit the details of how to get one piece at a time as some unevaluated
form and recognize clearly what each piece is takes some careful thought. If
you want to automatically throw in a first argument of _VAL_ after the first
parenthesis found or inserted in new parens if just the name of a function
was presented,  or other such manipulations as already seem to happen with
the Magritrr pipe where a period is the placeholder, that can be delicate
work and also fail for some lines of code.  There may be many reasons
various versions of this proposal can fail for some cases. But functionally,
it would be a way to specify in a linear fashion that a sequence of steps is
to be connected with data being passed along as it changes.

I can also imagine how this kind of method might allow twists like asking
for _VAL_$second or other changes such as sorted(_VAL_) or minmax(_VAL_)
that would shrink the sequence.

This general idea looks like something that some programming language may
already do in some form and functionally and is a bit like the pipe idea,
albeit with different overhead.

And do note many languages already support this in subtle ways. R has a
variable called ".Last.value" that always holds the result of the last
statement evaluated. If the template above is used properly, that alone
might work, albeit be a bit wordy. But it may be more transient in some
cases such as a multi-part statement where it ends up being reset within the
statement.

I am NOT asking for a new feature in R, or any language. I am just asking if
the various pipeline ideas  used could be done in a general way like I
describe as a sequence where the statements are chained as described and
intermediate results are transient. But, yes, some implementations might
require some changes to the language to be implemented properly and it might
not satisfy people used to thinking a certain way.

I end by saying that R is a language that only returns one (sometimes
complex) return value. Other languages allow multiple return values and
pipelines there might be hard to implement or have weird features that allow
various of the returns to be captured or even a more general graph of
command sequences  rather than just a linear pipeline. My thoughts here are
for R alone. And I shudder at what happens if you allow exceptions and other
kinds of breaks/returns out of such a sequential grouping in mid-stride. I
view most such additions and changes as needing careful thought to make sure
they have the functionality most people want, are as consistent as possible
with the existing language, and do not have lots of potential for bugs or
bad programs or ones very hard to read and understand.




Scanned by McAfee and confirmed virus-free.	
Find out more here: https://bit.ly/2zCJMrO



More information about the R-devel mailing list