[R] Bug in print for data frames?

peter dalgaard pdalgd using gmail.com
Fri Nov 3 13:12:47 CET 2023


It's still kind of weird; embedded 2-column data frames print differently than 1-column ones:

> d <- data.frame(a=1, b=I(data.frame(d=1,e=2)))
> d
  a b.d b.e
1 1   1   2
> str(d)
'data.frame':	1 obs. of  2 variables:
 $ a: num 1
 $ b:Classes 'AsIs' and 'data.frame':	1 obs. of  2 variables:
  ..$ d: num 1
  ..$ e: num 2
> names(d)
[1] "a" "b"
> d <- data.frame(a=1, b=I(data.frame(d=1)))
> d
  a d
1 1 1
> str(d)
'data.frame':	1 obs. of  2 variables:
 $ a: num 1
 $ b:Classes 'AsIs' and 'data.frame':	1 obs. of  1 variable:
  ..$ d: num 1
> names(d)
[1] "a" "b"

It happens somewhere inside format.data.frame() or as.data.frame.list(), but I can't figure out the logic at this point.
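A quick way to see the discrepancy is to compare names(), which reports the real top-level columns, with the structure of the embedded column. This is a minimal base-R sketch; the object names d1 and d2 are only for illustration.

```r
# One-column and two-column data frames embedded as a single column via I()
d1 <- data.frame(a = 1, b = I(data.frame(d = 1)))
d2 <- data.frame(a = 1, b = I(data.frame(d = 1, e = 2)))

# The stored top-level names are the same in both cases
names(d1)  # "a" "b"
names(d2)  # "a" "b"

# In both cases column b holds a whole data frame
is.data.frame(d1$b)  # TRUE
ncol(d1$b)           # 1
ncol(d2$b)           # 2

# Only the printed headers differ ("d" versus "b.d" "b.e")
print(d1)
print(d2)
```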

-pd


> On 26 Oct 2023, at 10:55 , Duncan Murdoch <murdoch.duncan using gmail.com> wrote:
> 
> On 25/10/2023 2:18 a.m., Christian Asseburg wrote:
>> Hi! I came across this unexpected behaviour in R. At first I thought it was a bug in the assignment operator <-, but now I think it may be a bug in the way data frames are printed. What do you think?
>> Using R 4.3.1:
>>> x <- data.frame(A = 1, B = 2, C = 3)
>>> y <- data.frame(A = 1)
>>> x
>>   A B C
>> 1 1 2 3
>>> x$B <- y$A # works as expected
>>> x
>>   A B C
>> 1 1 1 3
>>> x$C <- y[1] # makes C disappear
>>> x
>>   A B A
>> 1 1 1 1
>>> str(x)
>> 'data.frame':   1 obs. of  3 variables:
>>  $ A: num 1
>>  $ B: num 1
>>  $ C:'data.frame':      1 obs. of  1 variable:
>>   ..$ A: num 1
>> Why does print(x) not show "C" as the name of the third element? I did mess up the data frame (and that was a mistake on my part), but finding the bug was harder because print(x) no longer showed the C.
> 
> y[1] is a dataframe with one column, i.e. it is identical to y.  To get the result you expected, you should have used y[[1]], to extract column 1.
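The difference between single and double brackets can be checked directly. This is a small base-R sketch of the original example using [[ instead of [.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

class(y[1])    # "data.frame": single brackets keep the data frame
class(y[[1]])  # "numeric": double brackets extract the column vector

x$C <- y[[1]]  # assign the plain numeric vector, not a data frame
str(x)         # C is now an ordinary numeric column
```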
> 
> Since dataframes are lists, you can assign them as columns of other dataframes, and you'll create a single column in the result that holds the entire dataframe you're assigning, with all of its columns.  This means that
> 
> x$C <- y[1]
> 
> replaces the C column of x with a dataframe.  It retains the name C (you can see this if you print names(x)), but since the column contains a dataframe, print() uses the column name from y instead.
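The following base-R sketch confirms that the stored name is still C while the nested data frame keeps y's column name. The helper find_df_columns() is hypothetical (not part of base R); it is just one way to spot this kind of accident before printing hides it.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)

is.list(y)     # TRUE: a data frame is a list of columns
x$C <- y[1]    # the whole one-column data frame becomes column C

length(x)      # 3: x still has three top-level columns
names(x)       # "A" "B" "C": the name C is retained
names(x$C)     # "A": the nested data frame keeps y's column name

# Hypothetical helper: list the top-level columns that are themselves
# data frames (the case where print() shows inner names instead)
find_df_columns <- function(df) {
  stopifnot(is.data.frame(df))
  names(df)[vapply(df, is.data.frame, logical(1))]
}
find_df_columns(x)  # "C"
```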
> 
> If you try
> 
> x$D <- x
> 
> you'll see it generate new names when printing, but the names within x remain as A, B, C, D.
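A short base-R sketch of that experiment, checking that the stored names are unaffected by what print() displays:

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1)
x$C <- y[1]

x$D <- x   # embed a copy of x (including the nested C) as column D

names(x)   # "A" "B" "C" "D": the stored names are unchanged
names(x$D) # "A" "B" "C": D holds the earlier copy of x
print(x)   # the header line shows generated names for the nested columns
```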
> 
> This is a situation where tibbles do a better job than dataframes:  if you created x and y as tibbles instead of dataframes and executed your code, you'd see this:
> 
>  library(tibble)
>  x <- tibble(A = 1, B = 2, C = 3)
>  y <- tibble(A = 1)
>  x$C <- y[1]
>  x
>  #> # A tibble: 1 × 3
>  #>       A     B   C$A
>  #>   <dbl> <dbl> <dbl>
>  #> 1     1     2     1
> 
> Duncan Murdoch
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com


