[Rd] transform
peter dalgaard
pdalgd at gmail.com
Tue Aug 27 11:55:16 CEST 2024
Yes. A quirk, rather than a bug I'd say. One issue is that the internal logic of transform() relies on
e <- eval(substitute(list(...)), `_data`, parent.frame())
tags <- names(e)
so untagged entries in ... will not be included. The other part is a direct consequence of a quirk in data.frame:
> data.frame(head(airquality), y=data.frame(x=rnorm(6)))
  Ozone Solar.R Wind Temp Month Day          x
1    41     190  7.4   67     5   1  0.3075402
2    36     118  8.0   72     5   2  0.7765265
3    12     149 12.6   74     5   3  0.3909341
4    18     313 11.5   62     5   4  0.4733170
5    NA      NA 14.3   56     5   5 -0.6947709
6    28      NA 14.9   66     5   6  0.1126040
whereas (the wisdom of this escapes me)
> data.frame(head(airquality), y=data.frame(x=rnorm(6),z=rnorm(6)))
  Ozone Solar.R Wind Temp Month Day        y.x        y.z
1    41     190  7.4   67     5   1 -0.9250228 0.46483406
2    36     118  8.0   72     5   2 -0.5035793 0.28822668
...
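The two naming behaviours can be compared without the random numbers by looking only at the resulting column names. This is a minimal sketch; `aq`, `one` and `two` are illustrative names and not part of the original example.

```r
# Compare column names only, so the result does not depend on rnorm().
aq  <- head(airquality)
one <- data.frame(x = 1:6)            # one-column data frame (illustrative)
two <- data.frame(x = 1:6, z = 6:1)   # two-column data frame (illustrative)

# Single column: the tag 'y' is dropped and the inner name 'x' is kept.
names_one <- names(data.frame(aq, y = one))

# Two columns: the tag 'y' is used as a prefix, giving 'y.x' and 'y.z'.
names_two <- names(data.frame(aq, y = two))

print(names_one)
print(names_two)
```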
On the whole, I think that transform was never designed (nor documented) to take data frame arguments, so caveat emptor.
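The first point, that untagged entries carry no names, can be checked in isolation. The sketch below uses a hypothetical helper, `tags_of()`, that only reproduces the two lines of transform() quoted above. The closing comments on how an empty set of tags leads to the data being returned unchanged are an assumption about the rest of transform.data.frame, not something shown in this thread.

```r
# Hypothetical helper: mirrors only the two quoted lines of transform().
tags_of <- function(`_data`, ...) {
  e <- eval(substitute(list(...)), `_data`, parent.frame())
  names(e)
}

untagged <- tags_of(BOD, data.frame(y = 1:6))      # no tag supplied
tagged   <- tags_of(BOD, x = data.frame(y = 1:6))  # tag 'x' supplied

print(untagged)   # NULL: there is nothing to match or append
print(tagged)     # "x"

# Assumption: transform() then matches the tags against names(`_data`).
# With NULL tags the match is empty, so no column is replaced or added.
matched <- !is.na(match(untagged, names(BOD)))
print(length(matched))   # 0
```

If the aim is simply to append the columns of a data frame under their own names, cbind(BOD, data.frame(y = 1:6, z = 6:1)) does that without involving transform() at all.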
- Peter
> On 24 Aug 2024, at 16:41 , Gabor Grothendieck <ggrothendieck using gmail.com> wrote:
>
> One oddity in transform that I recently noticed. It seems that to include
> a one-column data frame in the arguments one must name it even though the
> name is ignored. If the data frame has more than one column then it must
> also be named but in that case it is not ignored and the names are made up of
> a combination of that name and the data frame's names. I would have thought
> that if we did not want a combination of names we would just not name the
> argument.
>
> # ignores second argument returning BOD unchanged
> transform(BOD, data.frame(y = 1:6)) |> names()
> ## [1] "Time" "demand"
>
> # ignores second argument returning BOD unchanged
> transform(BOD, data.frame(y = 1:6, z = 6:1)) |> names()
> ## [1] "Time" "demand"
>
> # with one column in data frame it adds the column and names it y ignoring x
> transform(BOD, x = data.frame(y = 1:6)) |> names()
> ## [1] "Time" "demand" "y"
>
> # with multiple columns in data frame it uses x.y and x.z as names
> transform(BOD, x = data.frame(y = 1:6, z = 6:1)) |> names()
> ## [1] "Time" "demand" "x.y" "x.z"
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com