[Rd] transform

Peter Dalgaard pdalgd at gmail.com
Tue Aug 27 11:55:16 CEST 2024


Yes. A quirk rather than a bug, I'd say. One issue is that the internal logic of transform() relies on

    e <- eval(substitute(list(...)), `_data`, parent.frame())
    tags <- names(e)

so untagged entries in ... will not be included. The other part is a direct consequence of a quirk in data.frame:

> data.frame(head(airquality), y=data.frame(x=rnorm(6)))
  Ozone Solar.R Wind Temp Month Day          x
1    41     190  7.4   67     5   1  0.3075402
2    36     118  8.0   72     5   2  0.7765265
3    12     149 12.6   74     5   3  0.3909341
4    18     313 11.5   62     5   4  0.4733170
5    NA      NA 14.3   56     5   5 -0.6947709
6    28      NA 14.9   66     5   6  0.1126040

whereas (the wisdom of this escapes me)

> data.frame(head(airquality), y=data.frame(x=rnorm(6),z=rnorm(6)))
  Ozone Solar.R Wind Temp Month Day        y.x         y.z
1    41     190  7.4   67     5   1 -0.9250228  0.46483406
2    36     118  8.0   72     5   2 -0.5035793  0.28822668
...

On the whole, I think that transform was never designed (nor documented) to take data frame arguments, so caveat emptor.
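
Both points above can be checked directly. The following is a minimal sketch, not the actual transform() source: it uses plain list() calls to stand in for the evaluated argument list, and the object names (untagged, tagged, one_col, two_col) are made up for illustration. BOD is the built-in 6-row data set used in the quoted message.

```r
# Part 1: an untagged argument yields a list with no names, so there is
# no tag for transform() to work with.
untagged <- list(data.frame(y = 1:6))      # like transform(BOD, data.frame(y = 1:6))
tagged   <- list(x = data.frame(y = 1:6))  # like transform(BOD, x = data.frame(y = 1:6))
is.null(names(untagged))
names(tagged)

# Part 2: data.frame() drops the tag for a one-column data frame argument
# but uses it as a prefix when the argument has several columns.
one_col <- data.frame(BOD, x = data.frame(y = 1:6))
two_col <- data.frame(BOD, x = data.frame(y = 1:6, z = 6:1))
names(one_col)
names(two_col)
```

The second part reproduces the column names reported in the quoted message without going through transform() at all, which is consistent with the naming being a property of data.frame().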

- Peter


> On 24 Aug 2024, at 16:41, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> 
> One oddity in transform that I recently noticed.  It seems that to include
> a one-column data frame in the arguments one must name it even though the
> name is ignored.  If the data frame has more than one column then it must
> also be named but in that case it is not ignored and the names are made up of
> a combination of that name and the data frame's names.  I would have thought
> that if we did not want a combination of names we would just not name the
> argument.
> 
>  # ignores second argument returning BOD unchanged
>  transform(BOD, data.frame(y = 1:6)) |> names()
>  ## [1] "Time"   "demand"
> 
>  # ignores second argument returning BOD unchanged
>  transform(BOD, data.frame(y = 1:6, z = 6:1)) |> names()
>  ## [1] "Time"   "demand"
> 
>  # with one column in data frame it adds the column and names it y ignoring x
>  transform(BOD, x = data.frame(y = 1:6)) |> names()
>  ## [1] "Time"   "demand" "y"
> 
>  # with multiple columns in data frame it uses x.y and x.z as names
>  transform(BOD, x = data.frame(y = 1:6, z = 6:1)) |> names()
>  ## [1] "Time"   "demand" "x.y"    "x.z"
> 
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


