[Rd] transform.data.frame() ignores unnamed arguments when no named argument is provided

Gabriel Becker g@bembecker @end|ng |rom gm@||@com
Thu Mar 2 23:37:18 CET 2023


On Thu, Mar 2, 2023 at 2:02 PM Antoine Fabri <antoine.fabri using gmail.com>
wrote:

> Thanks and good point about unspecified behavior. The way it behaves now
> (when it doesn't ignore) is more consistent with data.frame() though so I
> prefer that to a "warn and ignore" behaviour:
>
> data.frame(a = 1, b = 2, 3)
> #>   a b X3
> #> 1 1 2  3
>
> data.frame(a = 1, 2, 3)
> #>   a X2 X3
> #> 1 1  2  3
>
> (and in general warnings make for unpleasant debugging so I prefer when we
> don't add new ones if avoidable)
>

I find silence to be much more unpleasant in practice when debugging,
myself, but that may be a personal preference.


>
>
> playing a bit more with it, it would make sense to me that the following
> have the same output:
>
> coefficient <- 3
>
> data.frame(value1 = 5) |> transform(coefficient, value2 = coefficient *
> value1)
> #>   value1 X3 value2
> #> 1      5  3     15
>
> data.frame(value1 = 5, coefficient) |> transform(value2 = coefficient *
> value1)
> #>   value1 coefficient value2
> #> 1      5           3     15
>
I'm not so sure. data.frame() is doing some substitute magic to get the
column name coefficient there.

> coefficient = 3
> data.frame(value1 = 5, coefficient)
  value1 coefficient
1      5           3
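
A minimal sketch of that naming behaviour, for anyone who wants to poke at it. The object names (by_symbol, by_value, by_docall) are just illustrative. The do.call() line is my understanding of why transform() ends up with X3: it hands data.frame() an already evaluated value, so there is no symbol left to deparse.

```r
coefficient <- 3

# Unnamed symbol: data.frame() deparses the argument to get the column name.
by_symbol <- data.frame(value1 = 5, coefficient)

# Unnamed literal: the deparsed "3" is run through make.names(), giving "X3".
by_value <- data.frame(value1 = 5, 3)

# Passing the evaluated value through do.call() loses the symbol, so the
# column is named as if the literal had been typed.
by_docall <- do.call(data.frame, list(value1 = 5, coefficient))

names(by_symbol)
names(by_value)
names(by_docall)
```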

Beyond that these two pieces of code are doing subtly but crucially
different things; in the latter, coefficient is a variable in the
data.frame, and when transform resolves that symbol during calculation of
value2, it *gets the column in the incoming data.frame*.

In the former case, coefficient does not exist in the data.frame, so the
symbol is being resolved somewhere else in the scope chain (in this case,
the global environment).

These happen to be the same, except for the column name, but we can see
the difference if we change the code to

> coefficient <- 3
> data.frame(value1 = 5, coefficient = 4) |> transform(value2 = value1 * coefficient)
  value1 coefficient value2
1      5           4     20

> data.frame(value1 = 5) |> transform(coefficient = 4, value2 = value1 * coefficient)
  value1 coefficient *value2*
1      5           4     *15*
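
The lookup order behind those two results can be sketched directly with eval(), using the data.frame as the evaluation environment and the calling environment as the fallback. This is only an illustration of the scoping, not a copy of what transform() does internally; with_col, without_col and expr are names I made up for the example.

```r
coefficient <- 3

with_col    <- data.frame(value1 = 5, coefficient = 4)
without_col <- data.frame(value1 = 5)

expr <- quote(value1 * coefficient)

# Columns are searched first; the enclosing environment is only the fallback.
res_with    <- eval(expr, with_col,    environment())  # uses the column (4)
res_without <- eval(expr, without_col, environment())  # uses the variable (3)

res_with
res_without
```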

Please note that another way this difference could rear its head is if
these aren't directly one after each other in a pipe:

> coefficient <- 3
> df1 <- data.frame(value1 = 5, coefficient)
> coefficient <- 4
> df2 <- data.frame(value1 = 5)
> df1 |> transform(value2 = value1 * coefficient)
  value1 coefficient value2
1      5           3     15

> df2 |> transform(coefficient, value2 = value1 * coefficient)
  value1 X4 value2
1      5  4     20


Because you know that someday the place where you do that transform and the
place where coefficient is initially set are going to be far away from each
other, so whether or not you put coefficient into the incoming data will
matter.
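
For what it's worth, here is a small sketch of one way to sidestep the X4 name when the value has to come from outside the data: name the argument explicitly. This is only a suggestion, not a proposal for how transform() should behave. Note that both uses of coefficient are still resolved outside df2, so the result still depends on whatever coefficient is at the time of the call.

```r
coefficient <- 3
df2 <- data.frame(value1 = 5)
coefficient <- 4  # reassigned somewhere far from the transform() call

# Naming the argument gives a predictable column name instead of X4.
# Both arguments are evaluated against df2 first, then the calling
# environment, so coefficient here is the current outside value (4).
res <- transform(df2, coefficient = coefficient, value2 = value1 * coefficient)

res
```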


Best,
~G

        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
