[Rd] data.frame weirdness

Gabor Grothendieck ggrothend|eck @end|ng |rom gm@||@com
Tue Nov 14 16:00:24 CET 2023


Seems like a leaky abstraction.  If both representations are supposed
to be outwardly the same to the user then they should act the same and
if not then identical should not be TRUE.

On Tue, Nov 14, 2023 at 9:56 AM Deepayan Sarkar
<deepayan.sarkar using gmail.com> wrote:
>
> On Tue, 14 Nov 2023 at 09:41, Gabor Grothendieck
> <ggrothendieck using gmail.com> wrote:
> >
> > Also why should that difference result in different behavior?
>
> That's justifiable, I think; consider:
>
> > d1 = data.frame(a = 1:4)
> > d2 = d3 = data.frame(b = 1:2)
> > row.names(d3) = c("a", "b")
> > data.frame(d1, d2)
>   a b
> 1 1 1
> 2 2 2
> 3 3 1
> 4 4 2
> > data.frame(d1, d2)
>   a b
> 1 1 1
> 2 2 2
> 3 3 1
> 4 4 2
> > data.frame(d1, d3)
>   a b
> 1 1 1
> 2 2 2
> 3 3 1
> 4 4 2
> Warning message:
> In data.frame(d1, d3) :
>   row names were found from a short variable and have been discarded
> > data.frame(d2, d3)
>   b b.1
> a 1   1
> b 2   2
>
>
> > On Tue, Nov 14, 2023 at 9:38 AM Gabor Grothendieck
> > <ggrothendieck using gmail.com> wrote:
> > >
> > > In that case identical should be FALSE but  it is TRUE
>
> Yes, or at least both cases should warn (or not warn). Certainly not
> ideal, but one of the inevitable side effects of having two different
> ways of storing row names that R tries to pretend should be
> exchangeable, but are not (and some code not having caught up).
>
> Part of the problem, I think, is that it's not clear what the ideal
> behaviour should be in such cases (to warn or not to warn).
>
> Best,
> -Deepayan
>
> > > identical(a1, a2)
> > > ## [1] TRUE
> > >
> > >
> > > On Tue, Nov 14, 2023 at 8:58 AM Deepayan Sarkar
> > > <deepayan.sarkar using gmail.com> wrote:
> > > >
> > > > They differ in whether the row names are "automatic":
> > > >
> > > > > .row_names_info(a1)
> > > > [1] -3
> > > > > .row_names_info(a2)
> > > > [1] 3
> > > >
> > > > Best,
> > > > -Deepayan
> > > >
> > > > On Tue, 14 Nov 2023 at 08:23, Gabor Grothendieck
> > > > <ggrothendieck using gmail.com> wrote:
> > > > >
> > > > > What is going on here?  In the lines ending in #### the inputs and outputs
> > > > > are identical yet one gives a warning and the other does  not.
> > > > >
> > > > > a1 <- `rownames<-`(anscombe[1:3, ],  NULL)
> > > > > a2 <- anscombe[1:3, ]
> > > > >
> > > > > ix <- 5:8
> > > > >
> > > > > # input arguments to #### are identical in both cases
> > > > >
> > > > > identical(stack(a1[ix]), stack(a2[ix]))
> > > > > ## [1] TRUE
> > > > > identical(a1[-ix], a2[-ix])
> > > > > ## [1] TRUE
> > > > >
> > > > >
> > > > > res1 <- data.frame(stack(a1[ix]), a1[-ix]) ####
> > > > > res2 <- data.frame(stack(a2[ix]), a2[-ix]) ####
> > > > > ## Warning message:
> > > > > ## In data.frame(stack(a2[ix]), a2[-ix]) :
> > > > > ##   row names were found from a short variable and have been discarded
> > > > >
> > > > > # results are identical
> > > > > identical(res1, res2)
> > > > > ## [1] TRUE
> > > > >
> > > > >
> > > > > --
> > > > > Statistics & Software Consulting
> > > > > GKX Group, GKX Associates Inc.
> > > > > tel: 1-877-GKX-GROUP
> > > > > email: ggrothendieck at gmail.com
> > > > >
> > > > > ______________________________________________
> > > > > R-devel using r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> > >
> > >
> > > --
> > > Statistics & Software Consulting
> > > GKX Group, GKX Associates Inc.
> > > tel: 1-877-GKX-GROUP
> > > email: ggrothendieck at gmail.com
> >
> >
> >
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-devel mailing list