[Rd] [External] Warning with new placeholder piped to data.frame extractors `[` and `[[`.

luke-tierney using uiowa.edu
Thu Jul 21 03:16:58 CEST 2022


On Wed, 20 Jul 2022, Rui Barradas wrote:

> Hello,
>
> I agree with several points you've made.
>
> The code of the data.frame methods for `[` and `[[` is already complicated
> enough and a revision is probably not worth the effort, constructs like
> piping to `[` and `[[` are not furthering the cause of readability, and a new
> base R dplyr::pull-like function would put an extra development and
> maintenance burden on the R Core Team, to whom we are in great debt for their
> excellent and already difficult and time-consuming work developing,
> maintaining and making R evolve over the years.
>
> My question, if the named argument syntax is mandatory then it should not 
> throw a warning, seems to have raised a consensus that this use of the new 
> pipe operator and placeholder should be discouraged (Toby), considered a bug 
> (Gabriel) or maybe intentional (Duncan). Definitely an unclear idiom to be 
> avoided and not a priority.
>
> I still find it strange but if R is telling the programmer to write better 
> code then follow the advice.
>
> (As a side note, all of the following work as expected:
>
> 1:6 |> `[`(x = _, 2)
> 1:6 |> `[[`(x = _, 2)

Depends on what you expect. This is probably not what you expect:

     > `[`(2, x = 1:6)
     [1]  2 NA NA NA NA NA

For performance reasons, many primitives were implemented to
not do argument matching on named arguments but to accept arguments by
position. This is particularly true for syntactically special
functions like arithmetic and extraction operators. You can use named
arguments in these, but the names are ignored by the default methods,
which just go by position. S3 methods implemented as R functions
usually will handle the named arguments in the usual way, but can
choose not to, as the data.frame extraction methods do.
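
To make the positional behaviour concrete, here is a small sketch. The
helper name extract_closure is made up for illustration and is not part
of base R; it is an ordinary R function, so its arguments are matched by
name, whereas the default method of the primitive `[` ignores the names:

```r
# Primitive default method: argument names are ignored and arguments are
# taken by position, so this is evaluated as 2[1:6].
prim <- `[`(2, x = 1:6)

# Hypothetical closure wrapper (illustration only): the usual argument
# matching applies, so x = 1:6 is the object and 2 is the index.
extract_closure <- function(x, i) x[i]
clos <- extract_closure(2, x = 1:6)

print(prim)  # [1]  2 NA NA NA NA NA
print(clos)  # [1] 2
```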

Arguably the performance issue is now moot as almost all
performance-critical code will be byte compiled. But adding argument
matching in all primitives is not something I can see getting high
priority at the moment.

As far as I can see, it looks like dropping the warning for a named
'x' argument in the S3 extraction methods for data.frame would be
fairly straightforward and shouldn't cause any disruption. But this
wouldn't make it into a release until the placeholder is allowed at
the head of an extraction chain, assuming we go there.
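
A rough sketch of how one might check what a given R version does with a
named 'x', together with two base R spellings that avoid naming 'x' at
all. Whether the warning is raised depends on the R version, so the code
records it instead of assuming it; the variable names are illustrative
only:

```r
df1 <- data.frame(y = 1:10, f = rep(c("a", "b"), each = 5))
agg <- aggregate(y ~ f, df1, mean)

# Collect warnings raised by the data.frame method rather than printing
# them; msgs stays empty if this R version does not warn.
msgs <- character()
res <- withCallingHandlers(
  `[[`(x = agg, "y"),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Alternatives that extract the same column without a named 'x'.
alt1 <- getElement(agg, "y")
alt2 <- (function(d) d[["y"]])(agg)

print(res)
print(msgs)
```

Both alternatives can be used on the right-hand side of a pipe without a
placeholder, since the piped value becomes their first argument.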

Best,

luke
>
> matrix(1:6, nrow = 3) |> `[`(x = _, 2, 2)
> matrix(1:6, nrow = 3) |> `[`(x = _, 2, )
> matrix(1:6, nrow = 3) |> `[`(x = _, , 2)
>
> list(1:6, b = 7:10) |> `[`(x = _, 2)
> list(1:6, b = 7:10) |> `[[`(x = _, 2)
> list(1:6, b = 7:10) |> `$`(x = _, 'b')
>
> So this is specific to the data.frame methods.)
>
> Hope this helps,
>
> Rui Barradas
>
> Às 23:44 de 18/07/2022, luke-tierney using uiowa.edu escreveu:
>> On Sat, 16 Jul 2022, Rui Barradas wrote:
>> 
>>> Hello,
>>> 
>>> When piping to any of `[.data.frame` or `[[.data.frame`, the placeholder
>>> is mandatory.
>>> 
>>> 
>>> df1 <- data.frame(y = 1:10, f = rep(c("a", "b"), each = 5))
>>> 
>>> aggregate(y ~ f, df1, mean) |> `[`('y')
>>> # Error: function '[' not supported in RHS call of a pipe
>>> 
>>> aggregate(y ~ f, df1, mean) |> `[[`('y')
>>> # Error: function '[' not supported in RHS call of a pipe
>>> 
>>> 
>>> 
>>> But if used it throws a warning.
>>> 
>>> 
>>> 
>>> aggregate(y ~ f, df1, mean) |> `[`(x = _, 'y')
>>> #  Warning in `[.data.frame`(x = aggregate(y ~ f, df1, mean), "y"): named
>>> #  arguments other than 'drop' are discouraged
>>> #    y
>>> #  1 3
>>> #  2 8
>>> 
>>> aggregate(y ~ f, df1, mean) |> `[[`(x = _, 'y')
>>> #  Warning in `[[.data.frame`(x = aggregate(y ~ f, df1, mean), "y"): named
>>> #  arguments other than 'exact' are discouraged
>>> #  [1] 3 8
>>> 
>> 
>> The pipe syntax requires that the placeholder be used as a named
>> argument.  If you do that, then the syntax is legal and parses
>> successfully.
>> 
>>> Hasn't this become inconsistent behavior?
>>> The named argument is not merely allowed but mandatory, so it shouldn't
>>> give warnings.
>> 
>> Any R function can decide whether it wants to allow explicitly named
>> arguments.  Disallowing or discouraging using explicitly named
>> arguments requires some work and is usually not a good idea. In the
>> case of the data.frame methods for [ and [[ the decision was made to
>> discourage using named arguments other than 'drop' (for [) and 'exact'
>> (for [[). This seems to have been to allow a more expedient way to
>> implement these functions. This could be revisited, but I doubt it is
>> worth the effort.
>> 
>> For me the main reason for using pipes is to make code more
>> readable. Using `[` and such constructs is not furthering that
>> cause. When I use pipes I am almost always using tidyverse
>> features, so I have dplyr::pull available, which is more readable,
>> to me at least. Arguably, base R could have a similar function,
>> but again I doubt this would be a good investment of time.
>> 
>> An option that we have experimented with is to allow the placeholder
>> at the head of an extraction chain. This is supported in the
>> experimental branch at
>> https://svn.r-project.org/R/branches/R-syntax. So for example:
>>
>>      > mtcars |> _$cyl[1]
>>      [1] 6
>> 
>> This may make it into R-devel for the next release, but it still needs
>> more testing.
>> 
>> Best,
>> 
>> luke
>> 
>>> 
>>> Hope this helps,
>>> 
>>> Rui Barradas
>>> 
>>> 
>> 
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

