[R-pkg-devel] Summary: tibbles are not data frames

William Dunlap wdunlap at tibco.com
Wed Oct 4 19:52:08 CEST 2017


> However, I learned that I should change things like
> 'dat[5:8, 1]' to 'dat[[1]][5:8]', respecting the fact that a data frame
> is a list

In the old days we encouraged the dat[i,j] usage because it worked
on both matrices and data.frames.  Even now, the transformation to
dat[[j]][i] only works when length(j)==1.
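
A minimal base-R sketch of the point above. The objects `dat`, `m`, `i` and
`j` are made up for illustration and do not come from the thread; no tibble
is involved, so this only shows the base behaviour.

```r
# Hypothetical illustration objects (not from the original discussion).
dat <- data.frame(x = 1:10, z = 11:20)
m   <- cbind(x = 1:10, z = 11:20)   # an integer matrix with the same columns
i   <- 5:8

# The dat[i, j] form reads the same for a data frame and a matrix.
from_df     <- dat[i, 1]
from_matrix <- m[i, 1]

# For a single column, the list-style form gives the same vector.
from_list <- dat[[1]][i]

# With length(j) > 1, dat[[j]] does not select several columns, so
# there is no dat[[j]][i] equivalent. Keep the two-index form and ask
# explicitly for a data frame, or subset each column in turn.
j <- 1:2
sub_df   <- dat[i, j, drop = FALSE]
sub_list <- lapply(dat[j], function(col) col[i])
```

Here `from_df`, `from_matrix` and `from_list` all hold the same four values,
while `sub_df` is a 4 x 2 data frame and `sub_list` a list of two vectors.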


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Sep 29, 2017 at 2:50 AM, Göran Broström <goran.brostrom at umu.se>
wrote:

> Dear all,
>
> I got an overwhelming amount of response to my question, and making a
> complete summary is not possible. However, I learned that I should change
> things like 'dat[5:8, 1]' to 'dat[[1]][5:8]',
> respecting the fact that a data frame is a list, in my packages.
> Incidentally, this also solves the problem of using tibbles where data
> frames are expected. The classes "tbl" and "tbl_df" will be preserved on
> output; no need to use 'as.data.frame', the simple check 'is.data.frame' is
> enough.
>
> Thanks for a very interesting and enlightening discussion! A special
> thanks to Hadley for sharing great packages with us; I could only wish they
> were easier to use in my own packages;)
>
> Göran
>
> On 2017-09-26 15:37, Hadley Wickham wrote:
>
>> On Tue, Sep 26, 2017 at 2:30 AM, Göran Broström
>> <goran.brostrom at umu.se> wrote:
>>
>>> I am beginning to get complaints from users of my CRAN packages
>>> (especially 'eha') to the effect that they get error messages like
>>> "Error: Unsupported use of matrix or array for column indexing".
>>>
>>> It turns out that they are passing tibbles to functions that
>>> expect data frames as input. And I am using the kind of subsetting
>>> that Hadley dislikes (eha is an old package, much older than
>>> tibbles). It is of course a simple matter to change the code so it
>>> handles both data frames and tibbles correctly, but this affects
>>> many functions, and it will take some time. And when the next guy
>>> introduces 'troubles' as an improvement of 'tibbles', I will have
>>> to rewrite the code again.
>>>
>>
>> Changing df[, x] to df[[x]] is not very hard and makes your code easier
>> to understand because it more clearly conveys the intent that you want a
>> single column.
>>
>>> While I like Hadley's way of doing it, I think it is a mistake to
>>> let a tibble also be of class data frame. To me it is a matter of
>>> inheritance and backwards compatibility: A tibble should add nice
>>> things to a data frame, not change basic behaviour, in order to
>>> call itself a data frame.
>>>
>>> Is it correct to let a tibble be of class "data.frame"?
>>>
>>
>> If it did not inherit from data frame, it would not work with the 99% of
>> functions that work with data frames and don't deliberately take advantage
>> of the dropping behaviour of [. In other words, it would
>> be pointless.
>>
>> I decided to make [.tibble type-stable (i.e. always return a data frame)
>> because this behaviour causes substantial problems in real
>> data analysis code. I did it understanding that it would cause some
>> package developers frustration, but I think it's better for a handful
>> of package maintainers to be frustrated than hundreds of users
>> creating dangerous code.
>>
>> Hadley
>>
>>
> ______________________________________________
> R-package-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
