[Rd] Multiple Assignment built into the R Interpreter?

Tue Mar 14 23:57:38 CET 2023

On 13/03/2023 6:01 a.m., Duncan Murdoch wrote:
> Yes, this is really a problem with the checks, not with the language.
> 
> A simpler approach than your alternativeAssignment function would be
> simply to allow globalVariables() to be limited to a single function as
> the note in its help page says.

I just took a look, and this would be quite easy to do.  It would 
require changes to codetools and to utils, but probably just a few dozen 
lines.

Duncan Murdoch

> 
> This might be tedious to write by hand, but could be automated using
> methods like "dotify" in dotty.
> 
> Duncan Murdoch
> 
> 
> On 12/03/2023 10:36 p.m., Pavel Krivitsky wrote:
>> Dear All,
>>
>> As a maintainer of large, complex packages, I can think of many places
>> in which deconstructing assignment would simplify the code, as well as
>> facilitate readability by breaking up larger functions into helpers, so
>> I would be very glad to see this incorporated somehow.
>>
>> I think the crux of the matter is that while there is a number of ways
>> to implement deconstructing assignment within R, there is no mechanism
>> to tell R CMD check about it without also suppressing checks for every
>> other instance of that variable name. This is particularly problematic
>> because those variable names are likely to be used elsewhere in the
>> package.
>>
>> Workarounds that have been suggested all defeat the conciseness and
>> clarity of the deconstructing assignment and introduce potential for
>> subtle bugs.
>>
>> The check warnings are something that can only be addressed in
>> 'codetools', with a finer API than what utils::globalVariables()
>> provides. Perhaps this would have a lower hurdle than modifying R
>> language itself?
>>
>>   From skimming through the relevant 'codetools' code, one idea for such
>> an API would be a function, along the lines of
>>
>> utils::alternativeAssignment(op, assigned)
>>
>> that sets up a callback assigned = function(op, e) that given the
>> operator (as string) and the expression it's embedded in, returns a
>> list of three elements:
>>    * a character vector containing a list of variables assigned to that
>>      might not otherwise be detected
>>    * a character vector containing a list of variables referenced that
>>      might not otherwise be detected
>>    * expression e with potentially "offending" elements removed, which
>>      will then be processed by the rest of the checking code
>>
>> Then, say, 'zeallot' could implement zeallot::zeallot_assign_detect(),
>> and a package developer using it could put
>>
>> utils::alternativeAssignment("%<-%", zeallot::zeallot_assign_detect)
>>
>> in their .onLoad() function. Similarly, users of 'dotty' could set up
>> callbacks for all standard assignment operators to inform the code
>> about the nonstandard assignment.
>>
>> Best Regards,Pavel
>>
>> On Sun, 2023-03-12 at 14:05 +0200, Sebastian Martin Krantz wrote:
>>> Kevins package is very nice as a proof of concept, no doubt about
>>> that, but
>>> it is not at the level of performance or convenience that a native R
>>> implementation would offer. I would probably not use it to translate
>>> matlab
>>> routines into R packages placed on CRAN, because it’s an additional
>>> dependency, I have a performance burden in every iteration, and
>>> utils::globalVariables() is everything but elegant. From that
>>> perspective
>>> it would be more convenient for me right now to stick with
>>> collapse::%=%,
>>> which is already written in C, and also call
>>> utils::globalVariables().
>>>
>>> But again my hope in starting this was that R Core might see that the
>>> addition of multiple assignment would be a significant enhancement to
>>> the
>>> language, of the same order as the base pipe |> in my opinion.
>>>
>>> I think the discussion so far has at least brought forth a way to
>>> implement
>>> this in a way that does not violate fundamental principles of the
>>> language.
>>> Which could form a basis for thinking about an actual addition to the
>>> language.
>>>
>>> Best regards,
>>>
>>> Sebastian
>>>
>>>
>>> On Sun 12. Mar 2023 at 13:18, Duncan Murdoch
>>> <murdoch.duncan using gmail.com>
>>> wrote:
>>>
>>>> On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:
>>>>> Thinking more about this, and seeing Kevins examples at
>>>>> https://github.com/kevinushey/dotty
>>>>> <https://github.com/kevinushey/dotty>, I think this is the most
>>>>> R-like
>>>>> way of doing it,
>>>>> with an additional benefit as it would allow to introduce the
>>>>> useful
>>>>> data.table semantics DT[, .(a = b, c, d)] to more general R. So I
>>>>> would
>>>>> propose to
>>>>> introduce a new primitive function . <- function(...)
>>>>> .Primitive(".") in
>>>>> R with an assignment method and the following features:
>>>>
>>>> I think that proposal is very unlikely to be accepted.  If it was a
>>>> primitive function, it could only be maintained by R Core.  They
>>>> are
>>>> justifiably very reluctant to take on extra work for themselves.
>>>>
>>>> Kevin's package demonstrates that this can be done entirely in a
>>>> contributed package, which means there's no need for R Core to be
>>>> involved.  I don't know if he has plans to turn his prototype into
>>>> a
>>>> CRAN package.  If he doesn't, then it will be up to some other
>>>> interested maintainer to step up and take on the task, or it will
>>>> just
>>>> fade away.
>>>>
>>>> I haven't checked whether your proposals below represent changes
>>>> from
>>>> the current version of dotty, but if they do, the way to proceed is
>>>> to
>>>> fork that project, implement your changes, and offer to contribute
>>>> them
>>>> back to the main branch.
>>>>
>>>> Duncan Murdoch
>>>>
>>>>
>>>>
>>>>>
>>>>>     * Positional assignment e.g. .[nr, nc] <- dim(x), and named
>>>>> assignment
>>>>>       e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars.
>>>>> All
>>>>>       the functionality proposed by Kevin at
>>>>>       https://github.com/kevinushey/dotty
>>>>>       <https://github.com/kevinushey/dotty> is useful, unambiguous
>>>>> and
>>>>>       feasible.
>>>>>     * Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <-
>>>>> mtcars.
>>>>>     * Mixing of positional and named assignment e.g .[mpg_new,
>>>>> carb_new =
>>>>>       carb, cyl_new] <- mtcars. The inputs not assigned by name are
>>>>> simply
>>>>>       the elements of RHS in the order they occur, regardless of
>>>>> whether
>>>>>       they have been used previously e.g. .[mpg_new, cyl_new = cyl,
>>>>>       log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here
>>>>> could
>>>>>       be any named vector type.
>>>>>     * Conventional use of the function as lazy version of of
>>>>> list(), as in
>>>>>       data.table: .(A = B, C, D) is the same as list(A = B, C = C,
>>>>> D = D).
>>>>>       This would also be useful, allowing more parsimonious code,
>>>>> and
>>>>>       avoid the need to assign names to all return values in a
>>>>> function
>>>>>       return, e.g. if I already have matrices A, C, Q and R as
>>>>> internal
>>>>>       objects in my function, I can simply end by return(.(A, C, Q,
>>>>> R))
>>>>>       instead of return(list(A = A, C = C, Q = Q, R = R)) if I
>>>>> wanted the
>>>>>       list to be named with the object names.
>>>>>
>>>>> The implementation of this in R and C should be pretty
>>>>> straightforward.
>>>>> It would just require a modification to R CMD Check to recognize
>>>>> .[<- as
>>>>> assignment.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Sebastian
>>>>> -
>>>>> 2.)
>>>>>
>>>>> On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz
>>>>> <sebastian.krantz using graduateinstitute.ch
>>>>> <mailto:sebastian.krantz using graduateinstitute.ch>> wrote:
>>>>>
>>>>>       Thanks Gabriel and Kevin for your inputs,
>>>>>
>>>>>       regarding your points Gabriel, I think Python and Julia do
>>>>> allow
>>>>>       multiple sub-assignment, but in-line with my earlier
>>>>> suggestion in
>>>>>       response to Duncan to make multiple assignment an
>>>>> environment-level
>>>>>       operation (like collapse::%=% currently works),  this would
>>>>> not be
>>>>>       possible in R.
>>>>>
>>>>>       Regarding the [a] <- coolest_function() syntax, yeah it would
>>>>> mean
>>>>>       do multiple assignment and set a equal to the first element
>>>>> dropping
>>>>>       all other elements. Multiple assignment should be positional
>>>>> loke in
>>>>>       other languages, enabling flexible renaming of objects on the
>>>>> fly.
>>>>>       So it should be irrelevant whether the function returns a
>>>>> named or
>>>>>       unnamed list or vector.
>>>>>
>>>>>       Thanks also Kevin for this contribution. I think it’s a
>>>>> remarkable
>>>>>       effort, and I wouldn’t mind such semantics e.g. making it a
>>>>> function
>>>>>       call to ‘.[‘ or any other one-letter function, as long as
>>>>> it’s coded
>>>>>       in C and recognized by the interpreter as an assignment
>>>>> operation.
>>>>>
>>>>>       Best regards,
>>>>>
>>>>>       Sebastian
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>       On Sun 12. Mar 2023 at 01:00, Kevin Ushey
>>>>> <kevinushey using gmail.com
>>>>>       <mailto:kevinushey using gmail.com>> wrote:
>>>>>
>>>>>           FWIW, it's possible to get fairly close to your proposed
>>>> semantics
>>>>>           using the existing metaprogramming facilities in R. I put
>>>> together a
>>>>>           prototype package here to demonstrate:
>>>>>
>>>>>           https://github.com/kevinushey/dotty
>>>>>           <https://github.com/kevinushey/dotty>
>>>>>
>>>>>           The package exports an object called `.`, with a special
>>>>>           `[<-.dot` S3
>>>>>           method which enables destructuring assignments. This
>>>>> means you
>>>> can
>>>>>           write code like:
>>>>>
>>>>>                .[nr, nc] <- dim(mtcars)
>>>>>
>>>>>           and that will define 'nr' and 'nc' as you expect.
>>>>>
>>>>>           As for R CMD check warnings, you can suppress those
>>>>> through the
>>>>>           use of
>>>>>           globalVariables(), and that can also be automated within
>>>>> the
>>>>>           package.
>>>>>           The 'dotty' package includes a function 'dotify()' which
>>>> automates
>>>>>           looking for such usages in your package, and calling
>>>>>           globalVariables()
>>>>>           so that R CMD check doesn't warn. In theory, a similar
>>>>> technique
>>>>>           would
>>>>>           be applicable to other packages defining similar
>>>>> operators
>>>> (zeallot,
>>>>>           collapse).
>>>>>
>>>>>           Obviously, globalVariables() is a very heavy hammer to
>>>>> swing for
>>>>>           this
>>>>>           issue, but you might consider the benefits worth the
>>>>> tradeoffs.
>>>>>
>>>>>           Best,
>>>>>           Kevin
>>>>>
>>>>>           On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch
>>>>>          
>>>>> <murdoch.duncan using gmail.com <mailto:murdoch.duncan using gmail.com>>
>>>> wrote:
>>>>>            >
>>>>>            > On 11/03/2023 4:42 p.m., Sebastian Martin Krantz
>>>>> wrote:
>>>>>            > > Thanks Duncan and Ivan for the careful thoughts. I'm
>>>>> not
>>>>>           sure I can
>>>>>            > > follow all aspects you raised, but to give my
>>>>> limited take
>>>>>           on a few:
>>>>>            > >
>>>>>            > >> your proposal violates a very basic property of the
>>>>>           language, i.e. that all statements are expressions and
>>>>> have a
>>>>>           value.  > What's the value of 1 + (A, C =
>>>>> init_matrices()).
>>>>>            > >
>>>>>            > > I'm not sure I see the point here. I evaluated 1 +
>>>>> (d =
>>>>>           dim(mtcars); nr
>>>>>            > > = d[1]; nc = d[2]; rm(d)), which simply gives a
>>>>> syntax
>>>> error,
>>>>>            >
>>>>>            >
>>>>>            >    d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)
>>>>>            >
>>>>>            > is not a statement, it is a sequence of 4 statements.
>>>>>            >
>>>>>            > Duncan Murdoch
>>>>>            >
>>>>>            >   as the
>>>>>            > > above expression should. `%=%` assigns to
>>>>>            > > environments, so 1 + (c("A", "C") %=%
>>>>> init_matrices())
>>>> returns
>>>>>            > > numeric(0), with A and C having their values
>>>>> assigned.
>>>>>            > >
>>>>>            > >> suppose f() returns list(A = 1, B = 2) and I do  >
>>>>> B, A <-
>>>>>           f() > Should assignment be by position or by name?
>>>>>            > >
>>>>>            > > In other languages this is by position. The feature
>>>>> is not
>>>>>           meant to
>>>>>            > > replace list2env(), and being able to rename objects
>>>>> in the
>>>>>           assignment
>>>>>            > > is a vital feature of codes
>>>>>            > > using multi input and output functions e.g. in
>>>>> Matlab or
>>>> Julia.
>>>>>            > >
>>>>>            > >> Honestly, given that this is simply syntactic
>>>>> sugar, I
>>>>>           don't think I would support it.
>>>>>            > >
>>>>>            > > You can call it that, but it would be used by almost
>>>>> every
>>>>>           R user almost
>>>>>            > > every day. Simple things like nr, nc = dim(x);
>>>>> values,
>>>>>           vectors =
>>>>>            > > eigen(x) etc. where the creation of intermediate
>>>>> objects
>>>>>            > > is cumbersome and redundant.
>>>>>            > >
>>>>>            > >> I see you've already mentioned it ("JavaScript-
>>>>> like"). I
>>>>>           think it would  fulfil Sebastian's requirements too, as
>>>>> long as
>>>>>           it is considered "true assignment" by the rest of the
>>>>> language.
>>>>>            > >
>>>>>            > > I don't have strong opinions about how the issue is
>>>>> phrased
>>>> or
>>>>>            > > implemented. Something like [t, n] = dim(x) might
>>>>> even be
>>>>>           more clear.
>>>>>            > > It's important though that assignment remains by
>>>>> position,
>>>>>            > > so even if some output gets thrown away that should
>>>>> also be
>>>>>           positional.
>>>>>            > >
>>>>>            > >>  A <- 0  > [A, B = A + 10] <- list(1, A = 2)
>>>>>            > >
>>>>>            > > I also fail to see the use of allowing this.
>>>>> something like
>>>>>           this is an
>>>>>            > > error.
>>>>>            > >
>>>>>            > >> A = 2
>>>>>            > >> (B = A + 1) <- 1
>>>>>            > > Error in (B = A + 1) <- 1 : could not find function
>>>>> "(<-"
>>>>>            > >
>>>>>            > > Regarding the practical implementation, I think
>>>>>           `collapse::%=%` is a
>>>>>            > > good starting point. It could be introduced in R as
>>>>> a
>>>>>           separate function,
>>>>>            > > or `=` could be modified to accommodate its
>>>>> capability. It
>>>>>           should be
>>>>>            > > clear that
>>>>>            > > with more than one LHS variables the assignment is
>>>>> an
>>>>>           environment level
>>>>>            > > operation and the results can only be used in
>>>>> computations
>>>>>           once assigned
>>>>>            > > to the environment, e.g. as in 1 + (c("A", "C") %=%
>>>>>           init_matrices()),
>>>>>            > > A and C are not available for the addition in this
>>>>>           statement. The
>>>>>            > > interpretor then needs to be modified to read
>>>>> something
>>>>>           like nr, nc =
>>>>>            > > dim(x) or [nr, nc] = dim(x). as an environment-level
>>>>>           multiple assignment
>>>>>            > > operation with no
>>>>>            > > immediate value. Appears very feasible to my limited
>>>>>           understanding, but
>>>>>            > > I guess there are other things to consider still.
>>>>>           Definitely appreciate
>>>>>            > > the responses so far though.
>>>>>            > >
>>>>>            > > Best regards,
>>>>>            > >
>>>>>            > > Sebastian
>>>>>            > >
>>>>>            > >
>>>>>            > >
>>>>>            > >
>>>>>            > >
>>>>>            > > On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch
>>>>>          
>>>>> <murdoch.duncan using gmail.com <mailto:murdoch.duncan using gmail.com>
>>>>>            > > <mailto:murdoch.duncan using gmail.com
>>>>>           <mailto:murdoch.duncan using gmail.com>>> wrote:
>>>>>            > >
>>>>>            > >     On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
>>>>>            > >      > On Sat, 11 Mar 2023 11:11:06 -0500
>>>>>            > >      > Duncan Murdoch <murdoch.duncan using gmail.com
>>>>>           <mailto:murdoch.duncan using gmail.com>
>>>>>            > >     <mailto:murdoch.duncan using gmail.com
>>>>>           <mailto:murdoch.duncan using gmail.com>>> wrote:
>>>>>            > >      >
>>>>>            > >      >> That's clear, but your proposal violates a
>>>>> very
>>>>>           basic property
>>>>>            > >     of the
>>>>>            > >      >> language, i.e. that all statements are
>>>>> expressions
>>>>>           and have a value.
>>>>>            > >      >
>>>>>            > >      > How about reframing this feature request from
>>>>>           multiple assignment
>>>>>            > >      > (which does go contrary to "everything has
>>>>> only one
>>>>>           value, even
>>>>>            > >     if it's
>>>>>            > >      > sometimes invisible(NULL)") to "structured
>>>>> binding"
>>>>>           / "destructuring
>>>>>            > >      > assignment" [*], which takes this single
>>>>> single
>>>>>           value returned by the
>>>>>            > >      > expression and subsets it subject to certain
>>>>> rules?
>>>>>           It may be
>>>>>            > >     easier to
>>>>>            > >      > make a decision on the semantics for
>>>>> destructuring
>>>>>           assignment (e.g.
>>>>>            > >      > languages which have this feature typically
>>>>> allow
>>>>>           throwing unneeded
>>>>>            > >      > parts of the return value away), and it
>>>>> doesn't seem
>>>>>           to break as much
>>>>>            > >      > of the rest of the language if implemented.
>>>>>            > >      >
>>>>>            > >      > I see you've already mentioned it
>>>>>           ("JavaScript-like"). I think it
>>>>>            > >     would
>>>>>            > >      > fulfil Sebastian's requirements too, as long
>>>>> as it
>>>>>           is considered
>>>>>            > >     "true
>>>>>            > >      > assignment" by the rest of the language.
>>>>>            > >      >
>>>>>            > >      > The hard part is to propose the actual
>>>>> grammar of
>>>>>           the new feature (in
>>>>>            > >      > terms of src/main/gram.y, preferably without
>>>> introducing
>>>>>            > >     conflicts) and
>>>>>            > >      > its semantics (including the corner cases,
>>>>> some of
>>>>>           which you have
>>>>>            > >      > already mentioned). I'm not sure I'm up to
>>>>> the task.
>>>>>            > >      >
>>>>>            > >
>>>>>            > >     If I were doing it, here's what I'd propose:
>>>>>            > >
>>>>>            > >         '[' formlist ']' LEFT_ASSIGN expr
>>>>>            > >         '[' formlist ']' EQ_ASSIGN expr
>>>>>            > >         expr RIGHT_ASSIGN  '[' formlist ']'
>>>>>            > >
>>>>>            > >     where `formlist` has the syntax of the formals
>>>>> list for
>>>>>           a function
>>>>>            > >     definition.  This would have the following
>>>>> semantics:
>>>>>            > >
>>>>>            > >          {
>>>>>            > >            *tmp* <- expr
>>>>>            > >
>>>>>            > >            # For arguments with no "default"
>>>>> expression,
>>>>>            > >
>>>>>            > >            argname1 <- *tmp*[[1]]
>>>>>            > >            argname2 <- *tmp*[[2]]
>>>>>            > >            ...
>>>>>            > >
>>>>>            > >            # For arguments with a default listed
>>>>>            > >
>>>>>            > >            argname3 <- with(*tmp*, default3)
>>>>>            > >          }
>>>>>            > >
>>>>>            > >
>>>>>            > >     The value of the whole thing would therefore be
>>>>>           (invisibly) the
>>>>>            > >     value of
>>>>>            > >     the last item in the assignment.
>>>>>            > >
>>>>>            > >     Two examples:
>>>>>            > >
>>>>>            > >         [A, B, C] <- expr   # assign the first three
>>>>>           elements of expr to A,
>>>>>            > >     B, and C
>>>>>            > >
>>>>>            > >         [A, B, C = a + b] <- expr  # assign the
>>>>> first two
>>>>>           elements of expr
>>>>>            > >                                    # to A and B,
>>>>>            > >                                    # assign
>>>>> with(expr, a +
>>>>>           b) to C.
>>>>>            > >
>>>>>            > >     Unfortunately, I don't think this could be done
>>>> entirely by
>>>>>            > >     transforming
>>>>>            > >     the expression (which is the way |> was done),
>>>>> and that
>>>>>           makes it a lot
>>>>>            > >     harder to write and to reason about.  E.g. what
>>>>> does
>>>>>           this do?
>>>>>            > >
>>>>>            > >         A <- 0
>>>>>            > >         [A, B = A + 10] <- list(1, A = 2)
>>>>>            > >
>>>>>            > >     According to the recipe above, I think it sets A
>>>>> to 1
>>>>>           and B to 12, but
>>>>>            > >     maybe a user would expect B to be 10 or 11.  And
>>>>>           according to that
>>>>>            > >     recipe this is an error:
>>>>>            > >
>>>>>            > >         [A, B = A + 10] <- c(1, A = 2)
>>>>>            > >
>>>>>            > >     which probably isn't what a user would expect,
>>>>> given
>>>>>           that this is fine:
>>>>>            > >
>>>>>            > >         [A, B] <- c(1, 2)
>>>>>            > >
>>>>>            > >     Duncan Murdoch
>>>>>            > >
>>>>>            >
>>>>>            > ______________________________________________
>>>>>            > R-devel using r-project.org <mailto:R-devel using r-project.org>
>>>>> mailing
>>>> list
>>>>>            > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>           <https://stat.ethz.ch/mailman/listinfo/r-devel>
>>>>>
>>>>
>>>>
>>>
>>>           [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>