[R-pkg-devel] tibbles are not data frames

Patrick Perry pperry at stern.nyu.edu
Tue Sep 26 19:15:04 CEST 2017


Pro of tibbles ignoring drop in x[,1,drop=TRUE]:
(1) it forces users to write consistent code for extracting a vector 
from a data frame

Con:
(1) functions that accept both matrices and data frames might break 
(x[[j]][i] doesn't work for a matrix)
(2) functions that use the access pattern x[i,j,drop = TRUE] will break
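
As a small illustration of Con (1), here is a sketch using a 3 x 2 integer matrix and its data frame counterpart (the names m and df are made up for this example). On a matrix, `[[` picks a single element in column-major order rather than a column, so the x[[j]][i] idiom quietly returns the wrong thing:

```r
# Hypothetical objects for illustration only.
m  <- matrix(1:6, nrow = 3)   # columns are 1:3 and 4:6
df <- as.data.frame(m)        # same values, columns V1 and V2

# Data frame: [[2]] is the second column, so [3] picks row 3 of it.
from_df <- df[[2]][3]

# Matrix: [[2]] is the second element overall (a length-1 vector),
# so indexing it with [3] runs off the end and yields NA.
from_matrix <- m[[2]][3]

# The [i, j] pattern agrees for matrices and base data frames.
from_matrix_ij <- m[3, 2]
from_df_ij     <- df[3, 2]
```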

Most of the breakages for Con (2) can be fixed by changing to x[[j]][i], 
but not all of them:

 > x <- data.frame(V=1:26, row.names = letters)
 > x[c("a","e","i","o","u"), "V", drop = TRUE]
[1]  1  5  9 15 21
 > x[["V"]][c("a","e","i","o","u")]
[1] NA NA NA NA NA
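
For this case the x[[j]][i] form needs an extra lookup step. One possible workaround, sketched with base R only, is to translate the row names to integer positions with match() and then index the column by position. This assumes the row names are meaningful in the first place:

```r
x <- data.frame(V = 1:26, row.names = letters)
vowels <- c("a", "e", "i", "o", "u")

# Look up the positions of the requested row names, then index by position.
pos <- match(vowels, rownames(x))
by_position <- x[["V"]][pos]

# Same values as the drop = TRUE access pattern on a base data frame.
by_drop <- x[vowels, "V", drop = TRUE]
```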

To me, the Cons outweigh the Pro, but I understand that the tidyverse 
puts a heavy weight on "one way to do things".

Perhaps a bigger issue with tibbles is that they don't let you index 
with row names:

 > y <- tibble(x = letters)
 > rownames(y)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
 > y[rownames(y)[c(1,5,9,15,21)],]
# A tibble: 5 x 1
       x
<chr>
1 <NA>
2 <NA>
3 <NA>
4 <NA>
5 <NA>

If you want to write code that supports both tibbles and data frames, 
then you either have to avoid row names and drop = TRUE, or else you 
have to call `as.data.frame` on the input. This goes the other way, too. 
If you want to write a tidyverse function that also accepts data.frames, 
then you should call as_tibble on the input, otherwise your function 
will break when you index the input like x[,1].
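
A minimal sketch of the as.data.frame option, assuming a hypothetical helper named column_values (not part of any package): coerce the input up front so that row names and drop = TRUE behave the base R way. Only base R is used here, so the example does not exercise a tibble directly; for tibble input, the as.data.frame call is the step that would restore base indexing.

```r
# Hypothetical helper: extract one column for the given rows as a vector.
column_values <- function(x, rows, col) {
  x <- as.data.frame(x)          # effectively a no-op for a base data frame
  x[rows, col, drop = TRUE]
}

x <- data.frame(V = 1:26, row.names = letters)
vowel_values <- column_values(x, c("a", "e", "i", "o", "u"), "V")

# A matrix with dimnames goes through the same code path.
m <- matrix(1:6, nrow = 3, dimnames = list(c("a", "b", "c"), c("V", "W")))
matrix_values <- column_values(m, c("a", "c"), "W")
```

The cost of this design is a coercion on every call, which may matter for large inputs.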


Patrick
> Hadley Wickham <mailto:h.wickham at gmail.com>
> September 26, 2017 at 11:29 AM
> On Tue, Sep 26, 2017 at 9:22 AM, Patrick Perry <patperry at gmail.com> wrote:
>> Would it be possible to change tibbles so that
>>
>> x[,1,drop=TRUE]
>>
>> returns a vector, not a data frame? I certainly find it surprising that
>> tibbles ignore
>> the drop argument. If tibbles respected the drop argument, then package
>> developers could rely on
>>
>> x[,1,drop=FALSE]
>>
>> or
>>
>> x[,1,drop=TRUE]
>>
>> behaving consistently, regardless of whether the argument is a tibble or a
>> data.frame.
>
> They can currently rely on x[[1]] always returning a vector and x[, 1,
> drop = FALSE] always returning a data frame whether x is a tibble or a
> data frame. I personally don't believe that an additional approach
> would help.
