[R] Bug in print for data frames?
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Thu Oct 26 12:42:49 CEST 2023
At 07:18 on 25/10/2023, Christian Asseburg wrote:
> Hi! I came across this unexpected behaviour in R. First I thought it was a bug in the assignment operator <- but now I think it's maybe a bug in the way data frames are being printed. What do you think?
>
> Using R 4.3.1:
>
>> x <- data.frame(A = 1, B = 2, C = 3)
>> y <- data.frame(A = 1)
>> x
> A B C
> 1 1 2 3
>> x$B <- y$A # works as expected
>> x
> A B C
> 1 1 1 3
>> x$C <- y[1] # makes C disappear
>> x
> A B A
> 1 1 1 1
>> str(x)
> 'data.frame': 1 obs. of 3 variables:
> $ A: num 1
> $ B: num 1
> $ C:'data.frame': 1 obs. of 1 variable:
> ..$ A: num 1
>
> Why does the print(x) not show "C" as the name of the third element? I did mess up the data frame (and this was a mistake on my part), but finding the bug was harder because print(x) didn't show the C any longer.
>
> Thanks. With best wishes -
>
> . . . Christian
Hello,

To expand on the good answers already given, here are two other
example data sets.

Example 1. Imagine that instead of assigning just one column from y to
x$C you assign two columns. The result is a data.frame column. See what
is displayed as the column names.

And unlike what happens with `[`, the operator `[[` doesn't work when
assigning columns 1:2: with an index vector of length greater than one,
`[[` indexes recursively instead of selecting several columns. You will
have to extract the columns y$A and y$B one by one.

x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)
str(y)
#> 'data.frame': 1 obs. of 2 variables:
#> $ A: num 1
#> $ B: num 4
x$C <- y[1:2]
x
#>   A B C.A C.B
#> 1 1 2   1   4
str(x)
#> 'data.frame': 1 obs. of 3 variables:
#> $ A: num 1
#> $ B: num 2
#> $ C:'data.frame': 1 obs. of 2 variables:
#> ..$ A: num 1
#> ..$ B: num 4
y[[1:2]] # doesn't work: `[[` cannot select two columns at once
#> Error in .subset2(x, i, exact = exact): subscript out of bounds
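Extracting the columns one by one, as mentioned above, could look like the sketch below. The column name D is made up for this illustration; it is not in the original example.

```r
x <- data.frame(A = 1, B = 2, C = 3)
y <- data.frame(A = 1, B = 4)

# `[[` with a single name returns the column itself (a plain vector),
# so x$C stays an ordinary numeric column
x$C <- y[["A"]]

# the second column needs its own assignment; "D" is a hypothetical name
x$D <- y[["B"]]

names(x)
#> [1] "A" "B" "C" "D"
is.data.frame(x$C)
#> [1] FALSE
```

This way print(x) and str(x) agree on the column names, because no column holds a nested data.frame.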
Example 2. Sometimes it is useful to get a result like this first and
then correct the resulting df, for instance when computing more than
one summary statistic.

str(agg) below shows that the summary statistics come back as a matrix,
so you have a matrix column. And once again the displayed names reflect
that.

The trick to make the result a df is to extract all but the last column
as a sub-df, extract the last column's values as a matrix (which it is)
and then cbind the two together.

cbind is a generic function. Since the first argument to cbind is a
sub-df, the method called is cbind.data.frame and the result is a df.

df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)
# the anonymous function computes more than one summary statistic
# note that it returns a named vector
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
agg
#>   A    X.Mean       X.S
#> 1 a 14.500000  9.082951
#> 2 b 15.500000  9.082951
#> 3 c 16.500000  9.082951
# similar effect as in the OP; the difference is that the last
# column is a matrix, not a data.frame
str(agg)
#> 'data.frame': 3 obs. of 2 variables:
#> $ A: chr "a" "b" "c"
#> $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:2] "Mean" "S"
# nc is just a convenience, avoids repeated calls to ncol
nc <- ncol(agg)
cbind(agg[-nc], agg[[nc]])
#>   A Mean        S
#> 1 a 14.5 9.082951
#> 2 b 15.5 9.082951
#> 3 c 16.5 9.082951
# all is well
cbind(agg[-nc], agg[[nc]]) |> str()
#> 'data.frame': 3 obs. of 3 variables:
#> $ A : chr "a" "b" "c"
#> $ Mean: num 14.5 15.5 16.5
#> $ S : num 9.08 9.08 9.08
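To see why the cbind trick works, it may help to look at the two pieces separately. This is only a sketch; the names left, right and res are mine, not part of the example above.

```r
df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)
agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
nc <- ncol(agg)

left  <- agg[-nc]    # `[` keeps the data.frame class (a sub-df)
right <- agg[[nc]]   # `[[` returns the column itself, here a matrix

is.data.frame(left)
#> [1] TRUE
is.matrix(right)
#> [1] TRUE
colnames(right)
#> [1] "Mean" "S"

# with a data.frame among the arguments the result is a data.frame,
# and the matrix columns become ordinary columns
res <- cbind(left, right)
class(res)
#> [1] "data.frame"
names(res)
#> [1] "A"    "Mean" "S"
```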
If the anonymous function hadn't returned a named vector, the new column
names would have been "1", "2". Try it.
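For those who want to try the unnamed case, here is a minimal sketch. The object names agg2, res2 and nm_before are mine, and renaming afterwards is just one possible repair.

```r
df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)

# same aggregation, but the function returns an unnamed vector
agg2 <- aggregate(X ~ A, df1, \(x) c(mean(x), sd(x)))
nc <- ncol(agg2)

res2 <- cbind(agg2[-nc], agg2[[nc]])
nm_before <- names(res2)
nm_before
#> [1] "A" "1" "2"

# one way to repair the names after the fact
names(res2)[-1] <- c("Mean", "S")
names(res2)
#> [1] "A"    "Mean" "S"
```

Returning a named vector from the function, as in the example above, avoids the extra renaming step.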
Hope this helps,
Rui Barradas