[R-pkg-devel] tibbles are not data frames

Gábor Csárdi csardi.gabor at gmail.com
Tue Sep 26 14:17:49 CEST 2017


Yes, basically tibbles violate the substitution principle. A lot of
other packages do too, probably base R as well, although that is
sometimes hard to say because there is no clear object hierarchy.
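To make the substitution problem concrete without depending on tibble itself, here is a minimal sketch using a hypothetical subclass, strict_df, whose [ method never drops a single column to a vector. This roughly mimics the tibble behaviour discussed in this thread; strict_df, as_strict_df() and first_col() are made-up names for illustration only. Code written for plain data frames silently returns a different type when it is handed the subclass.

```r
# Hypothetical data.frame subclass used only for illustration; it is NOT
# tibble, it just mimics tibble's "never drop to a vector" behaviour.
as_strict_df <- function(x) {
  x <- as.data.frame(x)
  class(x) <- c("strict_df", "data.frame")
  x
}

# Override `[` so that x[, j] always returns a (strict_df) data frame.
`[.strict_df` <- function(x, i, j, drop = FALSE) {
  plain <- x
  class(plain) <- "data.frame"                # fall back to base behaviour
  if (missing(i)) i <- seq_len(nrow(plain))   # all rows
  if (missing(j)) j <- seq_along(plain)       # all columns
  out <- plain[i, j, drop = FALSE]            # the caller's drop is ignored
  class(out) <- c("strict_df", "data.frame")
  out
}

# Code written with plain data frames in mind.
first_col <- function(df) df[, 1]

plain  <- data.frame(a = 1:3, b = c("x", "y", "z"))
strict <- as_strict_df(plain)

r1 <- first_col(plain)   # an integer vector
r2 <- first_col(strict)  # a one-column strict_df, not a vector
is.data.frame(strict)    # TRUE, so an is.data.frame() guard lets it through
```

Passing r2 on to code that expects a numeric vector, e.g. mean(r2), would then warn and return NA instead of 2.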

Let's take a step back and see how you can check a data frame argument.

1. Weak check.

is.data.frame(arg)

This essentially means that you trust subclasses of data.frame to
adhere to the substitution principle. While this is nice in theory, a
lot of packages (including both major packages implementing subclasses
of data.frame!) do not always adhere to it. So this is not really a
safe solution.

Base R sometimes does this as well, e.g. aggregate.data.frame has:

    if (!is.data.frame(x))
        x <- as.data.frame(x)

which is essentially equivalent to the weak check, since it leaves
data.frame subclasses untouched.
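A minimal sketch of the weak check, using the coerce-only-if-needed idiom quoted from aggregate.data.frame. The subclass is just a hypothetical class vector c("strict_df", "data.frame") built with structure(), and weak_check() is an illustrative name; no subclass methods are needed to see that the guard passes the subclass through unchanged, so any overridden methods stay in effect.

```r
# Hypothetical subclass: just a class vector with "data.frame" last.
strict <- structure(data.frame(a = 1:3),
                    class = c("strict_df", "data.frame"))

# Weak check in the aggregate.data.frame style: coerce only when the input
# is not already a data frame (or a subclass of one).
weak_check <- function(x) {
  if (!is.data.frame(x))
    x <- as.data.frame(x)
  x
}

class(weak_check(strict))           # "strict_df" "data.frame": subclass kept
class(weak_check(matrix(1:4, 2)))   # "data.frame": other input is converted
```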

2. Strong "check".

arg <- as.data.frame(arg)

This is safer, because it does not rely on subclass implementors. It
also has the additional benefit that your code is polymorphic: it
works with any input, as long as it can be converted to a data frame.

Base R also uses this often, e.g. in merge.data.frame:

    nx <- nrow(x <- as.data.frame(x))
    ny <- nrow(y <- as.data.frame(y))
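A minimal sketch of the strong check in the merge.data.frame style. The [.strict_df method is a deliberately hostile, hypothetical override standing in for any subclass behaviour you do not control, and first_col() is an illustrative name. For a subclass without its own as.data.frame() method, the inherited data.frame method drops the classes listed before "data.frame", so first_col() never reaches the override. Input that can merely be converted (a list, a matrix) works too, while something that cannot be converted, such as a function, makes as.data.frame() signal an error.

```r
# Hypothetical subclass with a hostile `[` method, standing in for any
# subclass override whose behaviour we do not want to depend on.
`[.strict_df` <- function(x, ...) stop("strict_df `[` method was called")
strict <- structure(data.frame(a = 1:3, b = 4:6),
                    class = c("strict_df", "data.frame"))

# Strong check, merge.data.frame style: coerce on entry, then rely only on
# plain data.frame semantics (here: x[, 1] drops to a vector).
first_col <- function(x) {
  x <- as.data.frame(x)
  x[, 1]
}

class(as.data.frame(strict))   # "data.frame": the subclass is stripped
first_col(strict)              # 1:3, the override is never reached
first_col(list(a = 1:3))       # 1:3, lists are converted too
first_col(matrix(1:6, 3))      # 1:3, and so are matrices

# Non-convertible input fails loudly instead of misbehaving later.
err <- tryCatch(first_col(function(x) x), error = conditionMessage)
```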

Gabor

Disclaimer: I do not represent the tibble authors in any way.

On Tue, Sep 26, 2017 at 11:21 AM, David Hugh-Jones
<davidhughjones at gmail.com> wrote:
> These replies seem to be missing the point, which is that old code has to be
> rewritten because tibbles don't behave like data frames.
>
> It is true that subclasses can override behaviour, but there is an implicit
> contract that the same methods should do the same things.
>
> The as.xxx pattern seems weird to me, though I see it a lot. What is the
> point of inheritance if you always have to convert an object upwards before
> you can treat it as a member of the superclass?
>
> I can see this argument will run...
>
> David
>
> On 26 September 2017 at 11:15, Gábor Csárdi <csardi.gabor at gmail.com> wrote:
>>
>> What is the benefit here, compared to just calling as.data.frame() on it?
>>
>> Gabor
>>
>> On Tue, Sep 26, 2017 at 11:11 AM, Daniel Lüdecke <d.luedecke at uke.de>
>> wrote:
>> > Since tibbles put their own classes before "data.frame" in the class
>> > attribute, you could use:
>> >
>> > tb <- tibble(a = 5)
>> > inherits(tb, "data.frame", which = TRUE) == 1
>> >
>> > If "tb" is a plain data frame, TRUE is returned; for a tibble, FALSE.
>> > You could then coerce it to a data frame: as.data.frame(tb)
>> >
>> > -----Original Message-----
>> > From: R-package-devel [mailto:r-package-devel-bounces at r-project.org] On
>> > Behalf Of Göran Broström
>> > Sent: Tuesday, 26 September 2017 12:09
>> > To: r-package-devel at r-project.org
>> > Subject: Re: [R-pkg-devel] tibbles are not data frames
>> >
>> >
>> >
>> > On 2017-09-26 11:56, Gábor Csárdi wrote:
>> >> On Tue, Sep 26, 2017 at 10:35 AM, Joris Meys <Joris.Meys at ugent.be>
>> >> wrote:
>> >>> I don't like the dropping of dimensions either. That doesn't change
>> >>> the fact that a tibble behaves differently from a data.frame. So tibbles
>> >>> do not inherit correctly from the class data.frame, and it can thus
>> >>> be argued that it's against OOP paradigms to pretend tibbles inherit
>> >>> from the class data.frame.
>> >>
>> >> I have yet to see an OOP system in which a subclass cannot override
>> >> the methods of its superclass. Not only is this in line with OOP
>> >> paradigms, it is actually one of the essential OOP features.
>> >>
>> >> To be more constructive, if you have a function that only works with
>> >> data frame inputs, then it is good practice to check that the supplied
>> >> input is indeed a data frame. This is independent of tibbles.
>> >
>> > It is not. I check input for being a data frame, but tibbles pass that
>> > test. That's the essence of the problem.
>> >
>> >> In practice it seems to me that an easy fix is to just call
>> >> as.data.frame on the input. This should either convert it to a data
>> >> frame, or throw an error.
>> >
>> > Sure, but I still need to rewrite the package.
>> >
>> > Göran
>> >
>> >> For tibbles it
>> >> drops the tbl* classes.
>> >>
>> >> Gabor
>> >>
>> >>> Defensive coding techniques would check if it's a tibble and return
>> >>> an error saying a data.frame is expected, unless tibbles inherit
>> >>> correctly from data.frame.
>> >>>
>> >>> I have nothing against tibbles. But calling them "data.frame" raises
>> >>> expectations that can't be fulfilled.
>> >>
>> >> [...]
>> >>
>> >> ______________________________________________
>> >> R-package-devel at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>> >>
>> >
>
>
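Daniel Lüdecke's inherits() test and the defensive approach Joris Meys describes can be combined into a small sketch. It relies on the assumption, stated in the thread, that tibbles put their own classes before "data.frame"; to avoid depending on the tibble package, fake_tbl only mimics a tibble's class vector, and is_plain_df() and check_plain_df() are hypothetical helper names.

```r
# TRUE only when "data.frame" is the first entry of the class vector,
# i.e. the inherits(x, "data.frame", which = TRUE) == 1 test.
is_plain_df <- function(x) {
  isTRUE(inherits(x, "data.frame", which = TRUE) == 1L)
}

# Defensive variant: refuse anything that is not a plain data frame.
check_plain_df <- function(x) {
  if (!is_plain_df(x)) {
    stop("a plain data.frame is expected, got class ",
         paste(class(x), collapse = "/"), call. = FALSE)
  }
  invisible(x)
}

# Mimics the class vector of a tibble without loading the tibble package.
fake_tbl <- structure(data.frame(a = 5),
                      class = c("tbl_df", "tbl", "data.frame"))

is_plain_df(data.frame(a = 5))   # TRUE
is_plain_df(fake_tbl)            # FALSE ("data.frame" is in position 3)
is_plain_df(list(a = 5))         # FALSE (not a data frame at all)
```

The trade-off is that this rejects every subclass, including well-behaved ones, whereas calling as.data.frame() accepts them and still gives the function plain data.frame semantics.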


