[R-pkg-devel] tibbles are not data frames
Patrick Perry
pperry at stern.nyu.edu
Tue Sep 26 19:15:04 CEST 2017
Pro of tibbles ignoring drop = TRUE in x[,1,drop=TRUE]:
(1) it forces users to write consistent code for extracting a vector
from a data frame
Con:
(1) functions that accept both matrices and data frames might break
(x[[j]][i] doesn't work for a matrix)
(2) functions that use the access pattern x[i,j,drop = TRUE] will break
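To make Con (1) concrete, here is a minimal base R sketch (the object names m and d are just for illustration). On a matrix, x[[j]] picks a single element in column-major order rather than column j, so x[[j]][i] does not fail loudly; it returns the wrong thing:

```r
m <- matrix(1:6, nrow = 3)   # columns are 1:3 and 4:6
d <- as.data.frame(m)        # same data as a data frame

# Data frame: d[[2]] is the whole second column, then rows 1 and 3.
from_df <- d[[2]][c(1, 3)]

# Matrix: m[[2]] is the single element m[2, 1], so indexing it with
# c(1, 3) yields that element followed by NA.
from_mat <- m[[2]][c(1, 3)]

# Matrix-style indexing gives what was intended.
from_mat_ij <- m[c(1, 3), 2]
```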
Most of the breakages for Con (2) can be fixed by changing to x[[j]][i],
but not all of them:
> x <- data.frame(V=1:26, row.names = letters)
> x[c("a","e","i","o","u"), "V", drop = TRUE]
[1] 1 5 9 15 21
> x[["V"]][c("a","e","i","o","u")]
[1] NA NA NA NA NA
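One possible workaround for the row-name case, sketched here as an assumption rather than something proposed in this thread, is to translate the row names to positions with match() before indexing the column vector. The names vowels, pos and by_position are only illustrative:

```r
x <- data.frame(V = 1:26, row.names = letters)
vowels <- c("a", "e", "i", "o", "u")

# Look up the position of each requested row name, then index the
# column vector by position instead of by name.
pos <- match(vowels, rownames(x))
by_position <- x[["V"]][pos]

# Reference result from the drop = TRUE access pattern.
by_drop <- x[vowels, "V", drop = TRUE]
```

This adds an extra step at every call site, which is part of why I count these breakages as a real cost.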
To me, the Cons outweigh the Pro, but I understand that the tidyverse
puts a heavy weight on "one way to do things".
Perhaps a bigger issue with tibbles is that they don't let you index
with row names:
> y <- tibble(x = letters)
> rownames(y)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
> y[rownames(y)[c(1,5,9,15,21)],]
# A tibble: 5 x 1
x
<chr>
1 <NA>
2 <NA>
3 <NA>
4 <NA>
5 <NA>
If you want to write code that supports both tibbles and data frames,
then you either have to avoid row names and drop = TRUE, or else you
have to call `as.data.frame` on the input. This goes the other way, too.
If you want to write a tidyverse function that also accepts data.frames,
then you should call as_tibble on the input; otherwise your function
will break when you index the input like x[,1], which returns a vector
for a data.frame but a one-column tibble for a tibble.
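As a rough sketch of the as.data.frame approach, a function can normalize its input once at the top and then use data-frame semantics throughout. The helper name extract_column is hypothetical, and the example below only exercises it with a data frame and a matrix so that it runs in base R without tibble installed:

```r
# Hypothetical helper: return one column as a plain vector, selecting
# rows by name or position.
extract_column <- function(x, rows, col) {
  x <- as.data.frame(x)        # coerce tibbles and matrices to a data frame
  x[rows, col, drop = TRUE]    # drop = TRUE is honored for data frames
}

x <- data.frame(V = 1:26, row.names = letters)
vowel_values <- extract_column(x, c("a", "e", "i", "o", "u"), "V")

m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("A", "B")))
cell <- extract_column(m, "r2", "B")
```

Since a tibble has no row names of its own, after coercion only the default names "1", "2", ... are available for indexing.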
Patrick
> Hadley Wickham <mailto:h.wickham at gmail.com>
> September 26, 2017 at 11:29 AM
> On Tue, Sep 26, 2017 at 9:22 AM, Patrick Perry<patperry at gmail.com> wrote:
>> Would it be possible to change tibbles so that
>>
>> x[,1,drop=TRUE]
>>
>> returns a vector, not a data frame? I certainly find it surprising that
>> tibbles ignore
>> the drop argument. If tibbles respected the drop argument, then package
>> developers could rely on
>>
>> x[,1,drop=FALSE]
>>
>> or
>>
>> x[,1,drop=TRUE]
>>
>> behaving consistently, regardless of whether the argument is a tibble or a
>> data.frame.
>
> They can currently rely on x[[1]] always returning a vector and x[, 1,
> drop = FALSE] always returning a data frame whether x is a tibble or a
> data frame. I personally don't believe that an additional approach
> would help.