[Rd] Multiple Assignment built into the R Interpreter?

Sebastian Martin Krantz @eb@@t|@n@kr@ntz @end|ng |rom gr@du@te|n@t|tute@ch
Sun Mar 12 13:05:53 CET 2023


Kevins package is very nice as a proof of concept, no doubt about that, but
it is not at the level of performance or convenience that a native R
implementation would offer. I would probably not use it to translate matlab
routines into R packages placed on CRAN, because it’s an additional
dependency, I have a performance burden in every iteration, and
utils::globalVariables() is everything but elegant. From that perspective
it would be more convenient for me right now to stick with collapse::%=%,
which is already written in C, and also call
utils::globalVariables().

But again my hope in starting this was that R Core might see that the
addition of multiple assignment would be a significant enhancement to the
language, of the same order as the base pipe |> in my opinion.

I think the discussion so far has at least brought forth a way to implement
this in a way that does not violate fundamental principles of the language.
Which could form a basis for thinking about an actual addition to the
language.

Best regards,

Sebastian


On Sun 12. Mar 2023 at 13:18, Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:
> > Thinking more about this, and seeing Kevins examples at
> > https://github.com/kevinushey/dotty
> > <https://github.com/kevinushey/dotty>, I think this is the most R-like
> > way of doing it,
> > with an additional benefit as it would allow to introduce the useful
> > data.table semantics DT[, .(a = b, c, d)] to more general R. So I would
> > propose to
> > introduce a new primitive function . <- function(...) .Primitive(".") in
> > R with an assignment method and the following features:
>
> I think that proposal is very unlikely to be accepted.  If it was a
> primitive function, it could only be maintained by R Core.  They are
> justifiably very reluctant to take on extra work for themselves.
>
> Kevin's package demonstrates that this can be done entirely in a
> contributed package, which means there's no need for R Core to be
> involved.  I don't know if he has plans to turn his prototype into a
> CRAN package.  If he doesn't, then it will be up to some other
> interested maintainer to step up and take on the task, or it will just
> fade away.
>
> I haven't checked whether your proposals below represent changes from
> the current version of dotty, but if they do, the way to proceed is to
> fork that project, implement your changes, and offer to contribute them
> back to the main branch.
>
> Duncan Murdoch
>
>
>
> >
> >   * Positional assignment e.g. .[nr, nc] <- dim(x), and named assignment
> >     e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All
> >     the functionality proposed by Kevin at
> >     https://github.com/kevinushey/dotty
> >     <https://github.com/kevinushey/dotty> is useful, unambiguous and
> >     feasible.
> >   * Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <- mtcars.
> >   * Mixing of positional and named assignment e.g .[mpg_new, carb_new =
> >     carb, cyl_new] <- mtcars. The inputs not assigned by name are simply
> >     the elements of RHS in the order they occur, regardless of whether
> >     they have been used previously e.g. .[mpg_new, cyl_new = cyl,
> >     log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here could
> >     be any named vector type.
> >   * Conventional use of the function as lazy version of of list(), as in
> >     data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D).
> >     This would also be useful, allowing more parsimonious code, and
> >     avoid the need to assign names to all return values in a function
> >     return, e.g. if I already have matrices A, C, Q and R as internal
> >     objects in my function, I can simply end by return(.(A, C, Q, R))
> >     instead of return(list(A = A, C = C, Q = Q, R = R)) if I wanted the
> >     list to be named with the object names.
> >
> > The implementation of this in R and C should be pretty straightforward.
> > It would just require a modification to R CMD Check to recognize .[<- as
> > assignment.
> >
> > Best regards,
> >
> > Sebastian
> > -
> > 2.)
> >
> > On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz
> > <sebastian.krantz using graduateinstitute.ch
> > <mailto:sebastian.krantz using graduateinstitute.ch>> wrote:
> >
> >     Thanks Gabriel and Kevin for your inputs,
> >
> >     regarding your points Gabriel, I think Python and Julia do allow
> >     multiple sub-assignment, but in-line with my earlier suggestion in
> >     response to Duncan to make multiple assignment an environment-level
> >     operation (like collapse::%=% currently works),  this would not be
> >     possible in R.
> >
> >     Regarding the [a] <- coolest_function() syntax, yeah it would mean
> >     do multiple assignment and set a equal to the first element dropping
> >     all other elements. Multiple assignment should be positional loke in
> >     other languages, enabling flexible renaming of objects on the fly.
> >     So it should be irrelevant whether the function returns a named or
> >     unnamed list or vector.
> >
> >     Thanks also Kevin for this contribution. I think it’s a remarkable
> >     effort, and I wouldn’t mind such semantics e.g. making it a function
> >     call to ‘.[‘ or any other one-letter function, as long as it’s coded
> >     in C and recognized by the interpreter as an assignment operation.
> >
> >     Best regards,
> >
> >     Sebastian
> >
> >
> >
> >
> >
> >     On Sun 12. Mar 2023 at 01:00, Kevin Ushey <kevinushey using gmail.com
> >     <mailto:kevinushey using gmail.com>> wrote:
> >
> >         FWIW, it's possible to get fairly close to your proposed
> semantics
> >         using the existing metaprogramming facilities in R. I put
> together a
> >         prototype package here to demonstrate:
> >
> >         https://github.com/kevinushey/dotty
> >         <https://github.com/kevinushey/dotty>
> >
> >         The package exports an object called `.`, with a special
> >         `[<-.dot` S3
> >         method which enables destructuring assignments. This means you
> can
> >         write code like:
> >
> >              .[nr, nc] <- dim(mtcars)
> >
> >         and that will define 'nr' and 'nc' as you expect.
> >
> >         As for R CMD check warnings, you can suppress those through the
> >         use of
> >         globalVariables(), and that can also be automated within the
> >         package.
> >         The 'dotty' package includes a function 'dotify()' which
> automates
> >         looking for such usages in your package, and calling
> >         globalVariables()
> >         so that R CMD check doesn't warn. In theory, a similar technique
> >         would
> >         be applicable to other packages defining similar operators
> (zeallot,
> >         collapse).
> >
> >         Obviously, globalVariables() is a very heavy hammer to swing for
> >         this
> >         issue, but you might consider the benefits worth the tradeoffs.
> >
> >         Best,
> >         Kevin
> >
> >         On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch
> >         <murdoch.duncan using gmail.com <mailto:murdoch.duncan using gmail.com>>
> wrote:
> >          >
> >          > On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
> >          > > Thanks Duncan and Ivan for the careful thoughts. I'm not
> >         sure I can
> >          > > follow all aspects you raised, but to give my limited take
> >         on a few:
> >          > >
> >          > >> your proposal violates a very basic property of the
> >         language, i.e. that all statements are expressions and have a
> >         value.  > What's the value of 1 + (A, C = init_matrices()).
> >          > >
> >          > > I'm not sure I see the point here. I evaluated 1 + (d =
> >         dim(mtcars); nr
> >          > > = d[1]; nc = d[2]; rm(d)), which simply gives a syntax
> error,
> >          >
> >          >
> >          >    d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)
> >          >
> >          > is not a statement, it is a sequence of 4 statements.
> >          >
> >          > Duncan Murdoch
> >          >
> >          >   as the
> >          > > above expression should. `%=%` assigns to
> >          > > environments, so 1 + (c("A", "C") %=% init_matrices())
> returns
> >          > > numeric(0), with A and C having their values assigned.
> >          > >
> >          > >> suppose f() returns list(A = 1, B = 2) and I do  > B, A <-
> >         f() > Should assignment be by position or by name?
> >          > >
> >          > > In other languages this is by position. The feature is not
> >         meant to
> >          > > replace list2env(), and being able to rename objects in the
> >         assignment
> >          > > is a vital feature of codes
> >          > > using multi input and output functions e.g. in Matlab or
> Julia.
> >          > >
> >          > >> Honestly, given that this is simply syntactic sugar, I
> >         don't think I would support it.
> >          > >
> >          > > You can call it that, but it would be used by almost every
> >         R user almost
> >          > > every day. Simple things like nr, nc = dim(x); values,
> >         vectors =
> >          > > eigen(x) etc. where the creation of intermediate objects
> >          > > is cumbersome and redundant.
> >          > >
> >          > >> I see you've already mentioned it ("JavaScript-like"). I
> >         think it would  fulfil Sebastian's requirements too, as long as
> >         it is considered "true assignment" by the rest of the language.
> >          > >
> >          > > I don't have strong opinions about how the issue is phrased
> or
> >          > > implemented. Something like [t, n] = dim(x) might even be
> >         more clear.
> >          > > It's important though that assignment remains by position,
> >          > > so even if some output gets thrown away that should also be
> >         positional.
> >          > >
> >          > >>  A <- 0  > [A, B = A + 10] <- list(1, A = 2)
> >          > >
> >          > > I also fail to see the use of allowing this. something like
> >         this is an
> >          > > error.
> >          > >
> >          > >> A = 2
> >          > >> (B = A + 1) <- 1
> >          > > Error in (B = A + 1) <- 1 : could not find function "(<-"
> >          > >
> >          > > Regarding the practical implementation, I think
> >         `collapse::%=%` is a
> >          > > good starting point. It could be introduced in R as a
> >         separate function,
> >          > > or `=` could be modified to accommodate its capability. It
> >         should be
> >          > > clear that
> >          > > with more than one LHS variables the assignment is an
> >         environment level
> >          > > operation and the results can only be used in computations
> >         once assigned
> >          > > to the environment, e.g. as in 1 + (c("A", "C") %=%
> >         init_matrices()),
> >          > > A and C are not available for the addition in this
> >         statement. The
> >          > > interpretor then needs to be modified to read something
> >         like nr, nc =
> >          > > dim(x) or [nr, nc] = dim(x). as an environment-level
> >         multiple assignment
> >          > > operation with no
> >          > > immediate value. Appears very feasible to my limited
> >         understanding, but
> >          > > I guess there are other things to consider still.
> >         Definitely appreciate
> >          > > the responses so far though.
> >          > >
> >          > > Best regards,
> >          > >
> >          > > Sebastian
> >          > >
> >          > >
> >          > >
> >          > >
> >          > >
> >          > > On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch
> >         <murdoch.duncan using gmail.com <mailto:murdoch.duncan using gmail.com>
> >          > > <mailto:murdoch.duncan using gmail.com
> >         <mailto:murdoch.duncan using gmail.com>>> wrote:
> >          > >
> >          > >     On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
> >          > >      > On Sat, 11 Mar 2023 11:11:06 -0500
> >          > >      > Duncan Murdoch <murdoch.duncan using gmail.com
> >         <mailto:murdoch.duncan using gmail.com>
> >          > >     <mailto:murdoch.duncan using gmail.com
> >         <mailto:murdoch.duncan using gmail.com>>> wrote:
> >          > >      >
> >          > >      >> That's clear, but your proposal violates a very
> >         basic property
> >          > >     of the
> >          > >      >> language, i.e. that all statements are expressions
> >         and have a value.
> >          > >      >
> >          > >      > How about reframing this feature request from
> >         multiple assignment
> >          > >      > (which does go contrary to "everything has only one
> >         value, even
> >          > >     if it's
> >          > >      > sometimes invisible(NULL)") to "structured binding"
> >         / "destructuring
> >          > >      > assignment" [*], which takes this single single
> >         value returned by the
> >          > >      > expression and subsets it subject to certain rules?
> >         It may be
> >          > >     easier to
> >          > >      > make a decision on the semantics for destructuring
> >         assignment (e.g.
> >          > >      > languages which have this feature typically allow
> >         throwing unneeded
> >          > >      > parts of the return value away), and it doesn't seem
> >         to break as much
> >          > >      > of the rest of the language if implemented.
> >          > >      >
> >          > >      > I see you've already mentioned it
> >         ("JavaScript-like"). I think it
> >          > >     would
> >          > >      > fulfil Sebastian's requirements too, as long as it
> >         is considered
> >          > >     "true
> >          > >      > assignment" by the rest of the language.
> >          > >      >
> >          > >      > The hard part is to propose the actual grammar of
> >         the new feature (in
> >          > >      > terms of src/main/gram.y, preferably without
> introducing
> >          > >     conflicts) and
> >          > >      > its semantics (including the corner cases, some of
> >         which you have
> >          > >      > already mentioned). I'm not sure I'm up to the task.
> >          > >      >
> >          > >
> >          > >     If I were doing it, here's what I'd propose:
> >          > >
> >          > >         '[' formlist ']' LEFT_ASSIGN expr
> >          > >         '[' formlist ']' EQ_ASSIGN expr
> >          > >         expr RIGHT_ASSIGN  '[' formlist ']'
> >          > >
> >          > >     where `formlist` has the syntax of the formals list for
> >         a function
> >          > >     definition.  This would have the following semantics:
> >          > >
> >          > >          {
> >          > >            *tmp* <- expr
> >          > >
> >          > >            # For arguments with no "default" expression,
> >          > >
> >          > >            argname1 <- *tmp*[[1]]
> >          > >            argname2 <- *tmp*[[2]]
> >          > >            ...
> >          > >
> >          > >            # For arguments with a default listed
> >          > >
> >          > >            argname3 <- with(*tmp*, default3)
> >          > >          }
> >          > >
> >          > >
> >          > >     The value of the whole thing would therefore be
> >         (invisibly) the
> >          > >     value of
> >          > >     the last item in the assignment.
> >          > >
> >          > >     Two examples:
> >          > >
> >          > >         [A, B, C] <- expr   # assign the first three
> >         elements of expr to A,
> >          > >     B, and C
> >          > >
> >          > >         [A, B, C = a + b] <- expr  # assign the first two
> >         elements of expr
> >          > >                                    # to A and B,
> >          > >                                    # assign with(expr, a +
> >         b) to C.
> >          > >
> >          > >     Unfortunately, I don't think this could be done
> entirely by
> >          > >     transforming
> >          > >     the expression (which is the way |> was done), and that
> >         makes it a lot
> >          > >     harder to write and to reason about.  E.g. what does
> >         this do?
> >          > >
> >          > >         A <- 0
> >          > >         [A, B = A + 10] <- list(1, A = 2)
> >          > >
> >          > >     According to the recipe above, I think it sets A to 1
> >         and B to 12, but
> >          > >     maybe a user would expect B to be 10 or 11.  And
> >         according to that
> >          > >     recipe this is an error:
> >          > >
> >          > >         [A, B = A + 10] <- c(1, A = 2)
> >          > >
> >          > >     which probably isn't what a user would expect, given
> >         that this is fine:
> >          > >
> >          > >         [A, B] <- c(1, 2)
> >          > >
> >          > >     Duncan Murdoch
> >          > >
> >          >
> >          > ______________________________________________
> >          > R-devel using r-project.org <mailto:R-devel using r-project.org> mailing
> list
> >          > https://stat.ethz.ch/mailman/listinfo/r-devel
> >         <https://stat.ethz.ch/mailman/listinfo/r-devel>
> >
>
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list