[Rd] Multiple Assignment built into the R Interpreter?

Sebastian Martin Krantz @eb@@t|@n@kr@ntz @end|ng |rom gr@du@te|n@t|tute@ch
Sun Mar 12 11:07:22 CET 2023


Thinking more about this, and seeing Kevins examples at
https://github.com/kevinushey/dotty, I think this is the most R-like way of
doing it,
with an additional benefit as it would allow to introduce the useful
data.table semantics DT[, .(a = b, c, d)] to more general R. So I would
propose to
introduce a new primitive function . <- function(...) .Primitive(".") in R
with an assignment method and the following features:

   - Positional assignment e.g. .[nr, nc] <- dim(x), and named assignment
   e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All the
   functionality proposed by Kevin at https://github.com/kevinushey/dotty
   is useful, unambiguous and feasible.
   - Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <- mtcars.
   - Mixing of positional and named assignment e.g .[mpg_new, carb_new =
   carb, cyl_new] <- mtcars. The inputs not assigned by name are simply the
   elements of RHS in the order they occur, regardless of whether they have
   been used previously e.g. .[mpg_new, cyl_new = cyl, log_cyl = log(cyl),
   cyl_new2] <- mtcars is feasible. RHS here could be any named vector type.
   - Conventional use of the function as lazy version of of list(), as in
   data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D). This
   would also be useful, allowing more parsimonious code, and avoid the need
   to assign names to all return values in a function return, e.g. if I
   already have matrices A, C, Q and R as internal objects in my function, I
   can simply end by return(.(A, C, Q, R)) instead of return(list(A = A, C =
   C, Q = Q, R = R)) if I wanted the list to be named with the object names.

The implementation of this in R and C should be pretty straightforward. It
would just require a modification to R CMD Check to recognize .[<- as
assignment.

Best regards,

Sebastian
-
2.)

On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz <
sebastian.krantz using graduateinstitute.ch> wrote:

> Thanks Gabriel and Kevin for your inputs,
>
> regarding your points Gabriel, I think Python and Julia do allow multiple
> sub-assignment, but in-line with my earlier suggestion in response to
> Duncan to make multiple assignment an environment-level operation (like
> collapse::%=% currently works),  this would not be possible in R.
>
> Regarding the [a] <- coolest_function() syntax, yeah it would mean do
> multiple assignment and set a equal to the first element dropping all other
> elements. Multiple assignment should be positional loke in other languages,
> enabling flexible renaming of objects on the fly. So it should be
> irrelevant whether the function returns a named or unnamed list or vector.
>
> Thanks also Kevin for this contribution. I think it’s a remarkable effort,
> and I wouldn’t mind such semantics e.g. making it a function call to ‘.[‘
> or any other one-letter function, as long as it’s coded in C and recognized
> by the interpreter as an assignment operation.
>
> Best regards,
>
> Sebastian
>
>
>
>
>
> On Sun 12. Mar 2023 at 01:00, Kevin Ushey <kevinushey using gmail.com> wrote:
>
>> FWIW, it's possible to get fairly close to your proposed semantics
>> using the existing metaprogramming facilities in R. I put together a
>> prototype package here to demonstrate:
>>
>>     https://github.com/kevinushey/dotty
>>
>> The package exports an object called `.`, with a special `[<-.dot` S3
>> method which enables destructuring assignments. This means you can
>> write code like:
>>
>>     .[nr, nc] <- dim(mtcars)
>>
>> and that will define 'nr' and 'nc' as you expect.
>>
>> As for R CMD check warnings, you can suppress those through the use of
>> globalVariables(), and that can also be automated within the package.
>> The 'dotty' package includes a function 'dotify()' which automates
>> looking for such usages in your package, and calling globalVariables()
>> so that R CMD check doesn't warn. In theory, a similar technique would
>> be applicable to other packages defining similar operators (zeallot,
>> collapse).
>>
>> Obviously, globalVariables() is a very heavy hammer to swing for this
>> issue, but you might consider the benefits worth the tradeoffs.
>>
>> Best,
>> Kevin
>>
>> On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch <murdoch.duncan using gmail.com>
>> wrote:
>> >
>> > On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
>> > > Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can
>> > > follow all aspects you raised, but to give my limited take on a few:
>> > >
>> > >> your proposal violates a very basic property of the  language, i.e.
>> that all statements are expressions and have a value.  > What's the value
>> of 1 + (A, C = init_matrices()).
>> > >
>> > > I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars);
>> nr
>> > > = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,
>> >
>> >
>> >    d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)
>> >
>> > is not a statement, it is a sequence of 4 statements.
>> >
>> > Duncan Murdoch
>> >
>> >   as the
>> > > above expression should. `%=%` assigns to
>> > > environments, so 1 + (c("A", "C") %=% init_matrices()) returns
>> > > numeric(0), with A and C having their values assigned.
>> > >
>> > >> suppose f() returns list(A = 1, B = 2) and I do  > B, A <- f() >
>> Should assignment be by position or by name?
>> > >
>> > > In other languages this is by position. The feature is not meant to
>> > > replace list2env(), and being able to rename objects in the assignment
>> > > is a vital feature of codes
>> > > using multi input and output functions e.g. in Matlab or Julia.
>> > >
>> > >> Honestly, given that this is simply syntactic sugar, I don't think I
>> would support it.
>> > >
>> > > You can call it that, but it would be used by almost every R user
>> almost
>> > > every day. Simple things like nr, nc = dim(x); values, vectors =
>> > > eigen(x) etc. where the creation of intermediate objects
>> > > is cumbersome and redundant.
>> > >
>> > >> I see you've already mentioned it ("JavaScript-like"). I think it
>> would  fulfil Sebastian's requirements too, as long as it is considered
>> "true assignment" by the rest of the language.
>> > >
>> > > I don't have strong opinions about how the issue is phrased or
>> > > implemented. Something like [t, n] = dim(x) might even be more clear.
>> > > It's important though that assignment remains by position,
>> > > so even if some output gets thrown away that should also be
>> positional.
>> > >
>> > >>  A <- 0  > [A, B = A + 10] <- list(1, A = 2)
>> > >
>> > > I also fail to see the use of allowing this. something like this is an
>> > > error.
>> > >
>> > >> A = 2
>> > >> (B = A + 1) <- 1
>> > > Error in (B = A + 1) <- 1 : could not find function "(<-"
>> > >
>> > > Regarding the practical implementation, I think `collapse::%=%` is a
>> > > good starting point. It could be introduced in R as a separate
>> function,
>> > > or `=` could be modified to accommodate its capability. It should be
>> > > clear that
>> > > with more than one LHS variables the assignment is an environment
>> level
>> > > operation and the results can only be used in computations once
>> assigned
>> > > to the environment, e.g. as in 1 + (c("A", "C") %=% init_matrices()),
>> > > A and C are not available for the addition in this statement. The
>> > > interpretor then needs to be modified to read something like nr, nc =
>> > > dim(x) or [nr, nc] = dim(x). as an environment-level multiple
>> assignment
>> > > operation with no
>> > > immediate value. Appears very feasible to my limited understanding,
>> but
>> > > I guess there are other things to consider still. Definitely
>> appreciate
>> > > the responses so far though.
>> > >
>> > > Best regards,
>> > >
>> > > Sebastian
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch <
>> murdoch.duncan using gmail.com
>> > > <mailto:murdoch.duncan using gmail.com>> wrote:
>> > >
>> > >     On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
>> > >      > On Sat, 11 Mar 2023 11:11:06 -0500
>> > >      > Duncan Murdoch <murdoch.duncan using gmail.com
>> > >     <mailto:murdoch.duncan using gmail.com>> wrote:
>> > >      >
>> > >      >> That's clear, but your proposal violates a very basic property
>> > >     of the
>> > >      >> language, i.e. that all statements are expressions and have a
>> value.
>> > >      >
>> > >      > How about reframing this feature request from multiple
>> assignment
>> > >      > (which does go contrary to "everything has only one value, even
>> > >     if it's
>> > >      > sometimes invisible(NULL)") to "structured binding" /
>> "destructuring
>> > >      > assignment" [*], which takes this single single value returned
>> by the
>> > >      > expression and subsets it subject to certain rules? It may be
>> > >     easier to
>> > >      > make a decision on the semantics for destructuring assignment
>> (e.g.
>> > >      > languages which have this feature typically allow throwing
>> unneeded
>> > >      > parts of the return value away), and it doesn't seem to break
>> as much
>> > >      > of the rest of the language if implemented.
>> > >      >
>> > >      > I see you've already mentioned it ("JavaScript-like"). I think
>> it
>> > >     would
>> > >      > fulfil Sebastian's requirements too, as long as it is
>> considered
>> > >     "true
>> > >      > assignment" by the rest of the language.
>> > >      >
>> > >      > The hard part is to propose the actual grammar of the new
>> feature (in
>> > >      > terms of src/main/gram.y, preferably without introducing
>> > >     conflicts) and
>> > >      > its semantics (including the corner cases, some of which you
>> have
>> > >      > already mentioned). I'm not sure I'm up to the task.
>> > >      >
>> > >
>> > >     If I were doing it, here's what I'd propose:
>> > >
>> > >         '[' formlist ']' LEFT_ASSIGN expr
>> > >         '[' formlist ']' EQ_ASSIGN expr
>> > >         expr RIGHT_ASSIGN  '[' formlist ']'
>> > >
>> > >     where `formlist` has the syntax of the formals list for a function
>> > >     definition.  This would have the following semantics:
>> > >
>> > >          {
>> > >            *tmp* <- expr
>> > >
>> > >            # For arguments with no "default" expression,
>> > >
>> > >            argname1 <- *tmp*[[1]]
>> > >            argname2 <- *tmp*[[2]]
>> > >            ...
>> > >
>> > >            # For arguments with a default listed
>> > >
>> > >            argname3 <- with(*tmp*, default3)
>> > >          }
>> > >
>> > >
>> > >     The value of the whole thing would therefore be (invisibly) the
>> > >     value of
>> > >     the last item in the assignment.
>> > >
>> > >     Two examples:
>> > >
>> > >         [A, B, C] <- expr   # assign the first three elements of expr
>> to A,
>> > >     B, and C
>> > >
>> > >         [A, B, C = a + b] <- expr  # assign the first two elements of
>> expr
>> > >                                    # to A and B,
>> > >                                    # assign with(expr, a + b) to C.
>> > >
>> > >     Unfortunately, I don't think this could be done entirely by
>> > >     transforming
>> > >     the expression (which is the way |> was done), and that makes it
>> a lot
>> > >     harder to write and to reason about.  E.g. what does this do?
>> > >
>> > >         A <- 0
>> > >         [A, B = A + 10] <- list(1, A = 2)
>> > >
>> > >     According to the recipe above, I think it sets A to 1 and B to
>> 12, but
>> > >     maybe a user would expect B to be 10 or 11.  And according to that
>> > >     recipe this is an error:
>> > >
>> > >         [A, B = A + 10] <- c(1, A = 2)
>> > >
>> > >     which probably isn't what a user would expect, given that this is
>> fine:
>> > >
>> > >         [A, B] <- c(1, 2)
>> > >
>> > >     Duncan Murdoch
>> > >
>> >
>> > ______________________________________________
>> > R-devel using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list