[R-pkg-devel] Summary: tibbles are not data frames

Göran Broström goran.brostrom at umu.se
Fri Sep 29 11:50:23 CEST 2017


Dear all,

I got an overwhelming number of responses to my question, and making a
complete summary is not possible. However, I learned that in my packages I
should change things like 'dat[5:8, 1]' to 'dat[[1]][5:8]', respecting the
fact that a data frame is a list. Incidentally, this also solves the problem
of using tibbles where data frames are expected. The classes "tbl" and
"tbl_df" will be preserved on output; there is no need to use
'as.data.frame', and the simple check 'is.data.frame' is enough.
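A minimal R sketch of the change described above. The toy data frame 'dat' and the two helpers old_style() and new_style() are hypothetical, written only for illustration; the tibble part runs only if the 'tibble' package happens to be installed.

```r
# Toy data; 'dat', old_style() and new_style() are hypothetical names.
dat <- data.frame(time = 1:10, event = rep(c(0, 1), 5))

# Matrix-style indexing: for a data.frame, a single selected column
# is dropped to a vector; a tibble keeps it as a one-column tibble.
old_style <- function(d) d[5:8, 1]

# List-style indexing: take the first column (a vector), then subset it.
new_style <- function(d) d[[1]][5:8]

print(old_style(dat))  # [1] 5 6 7 8
print(new_style(dat))  # [1] 5 6 7 8

# Only runs if the 'tibble' package is installed.
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(dat)
  print(is.data.frame(tb))     # TRUE: a tibble inherits from data.frame
  print(class(old_style(tb)))  # "tbl_df" "tbl" "data.frame", not a vector
  print(new_style(tb))         # [1] 5 6 7 8
}
```

With a plain data frame both helpers return the same integer vector; only the list-style version gives the same answer for a tibble.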

Thanks for a very interesting and enlightening discussion! A special 
thanks to Hadley for sharing great packages with us; I could only wish 
they were easier to use in my own packages;)

Göran

On 2017-09-26 15:37, Hadley Wickham wrote:
> On Tue, Sep 26, 2017 at 2:30 AM, Göran Broström
> <goran.brostrom at umu.se> wrote:
>> I am beginning to get complaints from users of my CRAN packages
>> (especially 'eha') to the effect that they get error messages like
>> "Error: Unsupported use of matrix or array for column indexing".
>> 
>> It turns out that they are sticking tibbles into functions that
>> expect data frames as input. And I am using the kind of subsetting
>> that Hadley dislikes (eha is an old package, much older than
>> tibbles). It is of course a simple matter to change the code so it
>> handles both data frames and tibbles correctly, but this affects
>> many functions, and it will take some time. And when the next guy
>> introduces 'troubles' as an improvement of 'tibbles', I will have
>> to rewrite the code again.
> 
> Changing df[, x] to df[[x]] is not very hard and makes your code 
> easier to understand because it more clearly conveys the intent that 
> you want a single column.
> 
>> While I like Hadley's way of doing it, I think it is a mistake to
>> let a tibble also be of class data frame. To me it is a matter of
>> inheritance and backwards compatibility: a tibble should add nice
>> things to a data frame, not change basic behaviour, in order to
>> call itself a data frame.
>> 
>> Is it correct to let a tibble be of class "data.frame"?
> 
> If it did not inherit from data frame, it would not work with the 99%
> of functions that work with data frames and don't deliberately take
> advantage of the dropping behaviour of '['. In other words, it would
> be pointless.
> 
> I decided to make [.tibble type-stable (i.e. always return a data 
> frame) because this behaviour causes substantial problems in real
> data analysis code. I did it understanding that it would cause some
> package developers frustration, but I think it's better for a handful
> of package maintainers to be frustrated than hundreds of users
> creating dangerous code.
> 
> Hadley
>
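Hadley's suggestion of replacing df[, x] by df[[x]] can be sketched in base R as follows; the data frame 'df' and the column name held in 'x' are made-up examples. The point is that df[[x]] always extracts one column as a vector, while df[, x] only does so because the base '[' drops a one-column result, which the tibble method does not.

```r
# Made-up example data; 'x' holds a column name chosen at run time.
df <- data.frame(age = c(30, 45, 52), sex = c("m", "f", "f"))
x <- "age"

# Matrix-style: returns a vector here only because '[' drops the
# one-column result for a base data.frame (not for a tibble).
v1 <- df[, x]

# List-style: always extracts exactly one column as a vector,
# for a data.frame and a tibble alike.
v2 <- df[[x]]

stopifnot(identical(v1, v2))

# When a one-column data frame is what you actually want, say so:
sub <- df[, x, drop = FALSE]
stopifnot(is.data.frame(sub), ncol(sub) == 1)
```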

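The type-stability argument in the thread can be illustrated with base data frames alone. In the sketch below, col_means() and col_means_stable() are hypothetical helpers; the first relies on the default drop = TRUE, so the class of d[, cols] depends on how many columns are requested, and it fails when only one column is asked for.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Hypothetical helper: colMeans() needs an object with at least two dimensions.
col_means <- function(d, cols) colMeans(d[, cols])

# Two columns: d[, cols] is a data frame, so this works.
print(col_means(df, c("a", "b")))  # a = 2, b = 5

# One column: d[, "a"] silently drops to an integer vector, and colMeans() errors.
res <- tryCatch(col_means(df, "a"), error = function(e) conditionMessage(e))
print(res)

# drop = FALSE keeps the result a data frame whatever the number of columns,
# which is the behaviour the tibble '[' method provides by default.
col_means_stable <- function(d, cols) colMeans(d[, cols, drop = FALSE])
print(col_means_stable(df, "a"))  # a = 2
```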

