[R] Bug in print for data frames?
peter dalgaard
pdalgd using gmail.com
Fri Nov 3 13:12:47 CET 2023
It's still kind of weird; embedded 2-column data frames print differently than 1-column ones:
> d <- data.frame(a=1, b=I(data.frame(d=1,e=2)))
> d
  a b.d b.e
1 1   1   2
> str(d)
'data.frame': 1 obs. of 2 variables:
$ a: num 1
$ b:Classes 'AsIs' and 'data.frame': 1 obs. of 2 variables:
..$ d: num 1
..$ e: num 2
> names(d)
[1] "a" "b"
> d <- data.frame(a=1, b=I(data.frame(d=1)))
> d
  a d
1 1 1
> str(d)
'data.frame': 1 obs. of 2 variables:
$ a: num 1
$ b:Classes 'AsIs' and 'data.frame': 1 obs. of 1 variable:
..$ d: num 1
> names(d)
[1] "a" "b"
It is happening inside format.data.frame() or as.data.frame.list() but I can't figure out the logic at this point.
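
For comparison, a minimal sketch of the naming rule that data.frame() itself applies to data-frame arguments when they are not wrapped in I(). Whether this same rule is what produces the printed headers above is an assumption here, not something traced through the source; the object names one_col, two_col and kept are made up for illustration.

```r
# Without I(), data.frame() splices the inner data frame into the result.
one_col <- data.frame(a = 1, b = data.frame(d = 1))
two_col <- data.frame(a = 1, b = data.frame(d = 1, e = 2))

names(one_col)  # single inner column: the inner name replaces "b"
names(two_col)  # several inner columns: "b" is used as a prefix

# With I(), the stored names are "a" and "b"; compare them with the
# names of the formatted copy that print() works from.
kept <- data.frame(a = 1, b = I(data.frame(d = 1)))
names(kept)
names(format(kept))
```

If the last two calls disagree, the renaming happens while the formatted copy is being built, not in the stored object.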
-pd
> On 26 Oct 2023, at 10:55 , Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
>
> On 25/10/2023 2:18 a.m., Christian Asseburg wrote:
>> Hi! I came across this unexpected behaviour in R. First I thought it was a bug in the assignment operator <- but now I think it's maybe a bug in the way data frames are being printed. What do you think?
>> Using R 4.3.1:
>>> x <- data.frame(A = 1, B = 2, C = 3)
>>> y <- data.frame(A = 1)
>>> x
>>   A B C
>> 1 1 2 3
>>> x$B <- y$A # works as expected
>>> x
>>   A B C
>> 1 1 1 3
>>> x$C <- y[1] # makes C disappear
>>> x
>>   A B A
>> 1 1 1 1
>>> str(x)
>> 'data.frame': 1 obs. of 3 variables:
>> $ A: num 1
>> $ B: num 1
>> $ C:'data.frame': 1 obs. of 1 variable:
>> ..$ A: num 1
>> Why does the print(x) not show "C" as the name of the third element? I did mess up the data frame (and this was a mistake on my part), but finding the bug was harder because print(x) didn't show the C any longer.
>
> y[1] is a dataframe with one column, i.e. it is identical to y. To get the result you expected, you should have used y[[1]], to extract column 1.
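
The difference between the two kinds of indexing can be checked directly. This is a small sketch using the x and y from the original post:

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

class(y[1])    # single brackets keep the data frame wrapper
class(y[[1]])  # double brackets extract the column vector

x$C <- y[[1]]  # assigns a plain numeric vector to column C
names(x)
is.data.frame(x$C)
```

With y[[1]] the column C is an ordinary numeric vector, so the printed header and names(x) agree.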
>
> Since dataframes are lists, you can assign them as columns of other dataframes, and you'll create a single column in the result whose rows are the columns of the dataframe you're assigning. This means that
>
> x$C <- y[1]
>
> replaces the C column of x with a dataframe. It retains the name C (you can see this if you print names(x) ), but since the column contains a dataframe, it chooses to use the column name of y when printing.
>
> If you try
>
> x$D <- x
>
> you'll see it generate new names when printing, but the names within x remain as A, B, C, D.
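
Since the printed header can hide this, one way to spot such columns is to test each column with is.data.frame(). A minimal sketch; the variable name nested is just for illustration:

```r
x <- data.frame(A = 1, B = 2, C = 3)
x$D <- x   # column D now holds a three-column data frame

names(x)   # the top-level names are unchanged by printing

# Flag the columns that are themselves data frames
nested <- vapply(x, is.data.frame, logical(1))
names(x)[nested]
```

Running a check like this before printing would have pointed at the affected column in the original example as well.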
>
> This is a situation where tibbles do a better job than dataframes: if you created x and y as tibbles instead of dataframes and executed your code, you'd see this:
>
> library(tibble)
> x <- tibble(A = 1, B = 2, C = 3)
> y <- tibble(A = 1)
> x$C <- y[1]
> x
> #> # A tibble: 1 × 3
> #>       A     B   C$A
> #>   <dbl> <dbl> <dbl>
> #> 1     1     2     1
>
> Duncan Murdoch
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk Priv: PDalgd using gmail.com