[R] Bug in print for data frames?
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Thu Oct 26 10:55:26 CEST 2023
On 25/10/2023 2:18 a.m., Christian Asseburg wrote:
> Hi! I came across this unexpected behaviour in R. First I thought it was a bug in the assignment operator <- but now I think it's maybe a bug in the way data frames are being printed. What do you think?
>
> Using R 4.3.1:
>
>> x <- data.frame(A = 1, B = 2, C = 3)
>> y <- data.frame(A = 1)
>> x
>   A B C
> 1 1 2 3
>> x$B <- y$A # works as expected
>> x
>   A B C
> 1 1 1 3
>> x$C <- y[1] # makes C disappear
>> x
>   A B A
> 1 1 1 1
>> str(x)
> 'data.frame': 1 obs. of 3 variables:
>  $ A: num 1
>  $ B: num 1
>  $ C:'data.frame': 1 obs. of 1 variable:
>   ..$ A: num 1
>
> Why does the print(x) not show "C" as the name of the third element? I did mess up the data frame (and this was a mistake on my part), but finding the bug was harder because print(x) didn't show the C any longer.
y[1] is a dataframe with one column, i.e. it is identical to y. To get
the result you expected, you should have used y[[1]] (or y$A), which
extracts column 1 as a vector.
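The difference between the two indexing forms can be checked directly. This is a minimal sketch using the same y as in the original post; nothing beyond base R is assumed.

```r
y <- data.frame(A = 1)

# Single brackets keep the dataframe wrapper.
class(y[1])             # "data.frame"

# Double brackets (or $) extract the column itself.
class(y[[1]])           # "numeric"

# Because y has only one column, y[1] is the same object as y.
identical(y[1], y)

# y[[1]] and y$A both return the underlying vector.
identical(y[[1]], y$A)  # TRUE
```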
Since dataframes are lists, you can assign one as a column of another
dataframe. The result is a single column that is itself a dataframe,
with each of its rows holding the corresponding row of the dataframe
you're assigning. This means that
    x$C <- y[1]
replaces the C column of x with a dataframe. The column keeps the name C
(you can see this if you print names(x)), but because the column contains
a dataframe, print() shows the column name from y instead.
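To see that the name C is still there, you can compare the outer and inner names. This is a small sketch built on the code from the original post; the variable names outer_names, inner_names and c_is_df are only for illustration.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

x$C <- y[1]                    # the assignment from the original post

outer_names <- names(x)        # names of x itself
inner_names <- names(x$C)      # names inside the nested dataframe
c_is_df <- is.data.frame(x$C)  # is the column itself a dataframe?

print(outer_names)             # "A" "B" "C"
print(inner_names)             # "A"
print(c_is_df)                 # TRUE

x$C <- y[[1]]                  # the fix: assign the vector, not the dataframe
str(x)
```

After the last assignment x$C is an ordinary numeric column again.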
If you try
    x$D <- x
you'll see print() generate new names for the nested columns, but
names(x) remains A, B, C, D.
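When the printed header is misleading like this, one way to find the affected columns is to test each column with is.data.frame(). This is a sketch, not something from the original exchange; the variable name nested is made up for the example.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

x$C <- y[1]   # C becomes a nested dataframe
x$D <- x      # D becomes a nested copy of x

names(x)      # "A" "B" "C" "D"

# TRUE for every column of x that is itself a dataframe.
nested <- vapply(x, is.data.frame, logical(1))
names(x)[nested]   # "C" "D"
```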
This is a situation where tibbles do a better job than dataframes: if
you created x and y as tibbles instead of dataframes and executed your
code, you'd see this:
library(tibble)
x <- tibble(A = 1, B = 2, C = 3)
y <- tibble(A = 1)
x$C <- y[1]
x
#> # A tibble: 1 × 3
#>       A     B   C$A
#>   <dbl> <dbl> <dbl>
#> 1     1     2     1
Duncan Murdoch