[R-pkg-devel] tibbles are not data frames

Göran Broström goran.brostrom at umu.se
Tue Sep 26 12:02:37 CEST 2017


On 2017-09-26 11:35, Joris Meys wrote:
> I don't like the dropping of dimensions either. That doesn't change the 
> fact that a tibble behaves differently from a data.frame. So tibbles do not 
> inherit correctly from the class data.frame, and it can thus be argued 
> that it's against OOP paradigms to pretend tibbles inherit from the 
> class data.frame. Defensive coding techniques would check if it's a 
> tibble and return an error saying a data.frame is expected, unless 
> tibbles inherit correctly from data.frame.

The correct and logical way (which I use in 'eha') is to check if input 
is a data frame, and if not, throw an error. Checking for every other 
possible class would soon become overwhelming.
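A minimal R sketch of the two checks contrasted in this exchange, assuming the tibble package is installed; `check_df()` and `check_df_strict()` are hypothetical helpers, not code from 'eha'. Because a tibble's class vector ends in "data.frame", it passes `is.data.frame()`, so the data frame check accepts tibbles and only an explicit test such as `inherits(x, "tbl_df")` rejects them.

```r
library(tibble)  # assumption: the tibble package is installed

atib <- tibble(a = 1:5, b = letters[5:1])

# The class vector of a tibble ends in "data.frame", so it passes is.data.frame()
print(class(atib))

# Hypothetical check: accept anything that is a data frame (tibbles included)
check_df <- function(data) {
  if (!is.data.frame(data)) stop("'data' must be a data frame")
  invisible(TRUE)
}

# Hypothetical stricter check: additionally reject tibbles explicitly
check_df_strict <- function(data) {
  if (!is.data.frame(data) || inherits(data, "tbl_df"))
    stop("'data' must be a plain data.frame, not a tibble")
  invisible(TRUE)
}

check_df(atib)                  # passes: a tibble counts as a data frame here
check_df(as.data.frame(atib))   # passes
res <- tryCatch(check_df_strict(atib), error = function(e) conditionMessage(e))
print(res)
```

The first check is the cheaper one to maintain; the second trades convenience for predictable behaviour inside the function.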

> 
> I have nothing against tibbles. But calling them "data.frame" raises 
> expectations that can't be fulfilled.

Exactly what I think. I wouldn't object to changing base data frames to 
behave like tibbles (with a few exceptions).

Göran

> 
> On Tue, Sep 26, 2017 at 11:23 AM, Stefan McKinnon Høj-Edwards 
> <sme at iysik.com> wrote:
> 
>     Thanks for the examples. Personally, I have been bitten multiple
>     times by data frames dropping dimensions, so I have a distaste for
>     this dropping behaviour.
> 
>     Personally, I prefer data frames *not* to drop dimensions. They are
>     not arrays, where dropping a dimension on slicing makes sense because
>     all entries are of the same data type.
>     You can pull out a column in vector form from both tibbles and data
>     frames with the $ operator; subsetting a row from a data frame and
>     forcing it into an atomic vector requires casting all columns to the
>     lowest common denominator, often character.
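A short R sketch of the three points above, assuming the tibble package is installed: slicing a matrix row drops a dimension, `$` returns the same atomic vector from a tibble and from a data frame, and flattening a mixed-type row with `unlist()` coerces every entry to character.

```r
library(tibble)  # assumption: the tibble package is installed

# Arrays: all entries share one type, so dropping a dimension on slicing is natural
m <- matrix(1:6, nrow = 2)
row1 <- m[1, ]              # plain integer vector, no dim attribute

atib <- tibble(a = 1:5, b = letters[5:1])
adf  <- as.data.frame(atib)

# `$` extracts the column as a plain vector from both classes
col_tib <- atib$a
col_df  <- adf$a

# Forcing a mixed-type row into an atomic vector casts to the common type
row_df  <- unlist(adf[3, ])
row_tib <- unlist(atib[3, ])
print(row_df)               # named character vector: a = "3", b = "c"
```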
> 
>     So I would argue that yes, tibbles are data frames with extra bells
>     and whistles, even if I do not understand the use of list columns.
> 
>     I suggest a defensive coding technique: if you need a data frame
>     subset to really be a vector, cast it as a vector. Users *will*
>     attempt to throw unexpected structures at your methods. When your
>     method fails in mysterious ways because it didn't extract a vector,
>     users will be stupefied. A failure in `as.vector` will indicate why.
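A hedged R sketch of the defensive idea above, assuming the tibble package is installed. It uses `[[`, which returns the column itself rather than a one-column data frame for both classes, and validates the result up front so that a bad input fails with a clear message instead of later. `get_column()` is a hypothetical helper, not part of any package.

```r
library(tibble)  # assumption: the tibble package is installed

# Hypothetical helper: extract one column as a plain atomic vector,
# failing early with an informative message
get_column <- function(data, name) {
  if (!is.data.frame(data)) stop("'data' must be a data frame")
  if (!name %in% names(data)) stop("column '", name, "' not found")
  v <- data[[name]]   # the column itself, never a one-column data frame
  if (!is.atomic(v)) stop("column '", name, "' is not an atomic vector")
  v
}

atib <- tibble(a = 1:5, b = letters[5:1])
x1 <- get_column(atib, "a")
x2 <- get_column(as.data.frame(atib), "a")
err <- tryCatch(get_column(atib, "z"), error = function(e) conditionMessage(e))
```

The `is.atomic()` check also rejects list columns, which a function written for classic data frames is unlikely to handle.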
> 
>     Kindly,
>     Stefan
> 
>     Stefan McKinnon Høj-Edwards
>     ph.d. Genetics
>     +44 (0)776 231 2464
>     +45 2888 6598
>     Skype: stefan_edwards
> 
>     2017-09-26 10:05 GMT+01:00 Joris Meys <Joris.Meys at ugent.be>:
> 
>         Here's one difference:
> 
>         library(tibble)
>         atib <- tibble(a = 1:5, b = letters[5:1])
>         atib[3, "a"]
>         as.data.frame(atib)[3, "a"]
> 
>         atib[3, "a"] returns a 1 x 1 tibble (dimensions are not dropped),
>         whereas as.data.frame(atib)[3, "a"] returns the integer 3
>         (dimensions are dropped). That is a huge difference if you use
>         [ , aColumn] to select a vector from a data frame.
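A small R sketch that checks the difference described above, assuming the tibble package is installed, and shows two ways of reaching the same element that return the same integer for both classes: `[[` followed by element indexing, and `$` followed by element indexing.

```r
library(tibble)  # assumption: the tibble package is installed

atib <- tibble(a = 1:5, b = letters[5:1])
adf  <- as.data.frame(atib)

# The same call gives different kinds of result
r_tib <- atib[3, "a"]   # a 1 x 1 tibble: dimensions are kept
r_df  <- adf[3, "a"]    # the integer 3: dimensions are dropped

# Portable forms: extract the column first, then index the element
p1_tib <- atib[["a"]][3]
p1_df  <- adf[["a"]][3]
p2_tib <- atib$a[3]
p2_df  <- adf$a[3]
```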
> 
>         Cheers
>         Joris
> 
>         On Tue, Sep 26, 2017 at 10:57 AM, Stefan McKinnon Høj-Edwards
>         <sme at iysik.com> wrote:
> 
>             Hi Göran,
> 
>             Could you please elaborate on which kind of subsetting
>             Hadley dislikes? I have yet to encounter operations on data
>             frames that are not possible on tibbles.
> 
>             Kindly,
>             Stefan McKinnon Hoj-Edwards
> 
>             2017-09-26 8:30 GMT+01:00 Göran Broström
>             <goran.brostrom at umu.se>:
> 
>              > I am beginning to get complaints from users of my CRAN
>              > packages (especially 'eha') to the effect that they get
>              > error messages like "Error: Unsupported use of matrix or
>              > array for column indexing".
>              >
>              > It turns out that they are passing tibbles to functions
>              > that expect data frames as input. And I am using the kind
>              > of subsetting that Hadley dislikes (eha is an old package,
>              > much older than tibbles). It is of course a simple matter
>              > to change the code so it handles both data frames and
>              > tibbles correctly, but this affects many functions, and it
>              > will take some time. And when the next guy introduces
>              > 'troubles' as an improvement of 'tibbles', I will have to
>              > rewrite the code again.
>              >
>              > While I like Hadley's way of doing it, I think it is a
>              > mistake to let a tibble also be of class data frame. To me
>              > it is a matter of inheritance and backwards compatibility:
>              > in order to call itself a data frame, a tibble should add
>              > nice things to a data frame, not change basic behaviour.
>              >
>              > Is it correct to let a tibble be of class "data.frame"?
>              >
>              > Göran Broström
>              >
>              > ______________________________________________
>              > R-package-devel at r-project.org mailing list
>              > https://stat.ethz.ch/mailman/listinfo/r-package-devel
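One hedged way to make an older function accept both classes with a single change is to coerce the input once at entry, so that the existing indexing code only ever sees a plain data frame. The R sketch below assumes the tibble package is installed; `legacy_mean()` is a hypothetical stand-in for an old-style function, not code from 'eha'.

```r
library(tibble)  # assumption: the tibble package is installed

# Hypothetical legacy-style function that relies on data[, name]
# dropping to a vector, as it does for base data frames
legacy_mean <- function(data, name) {
  if (!is.data.frame(data)) stop("'data' must be a data frame")
  data <- as.data.frame(data)  # a tibble becomes a plain "data.frame"
  x <- data[, name]            # a single column now drops to a vector
  mean(x)
}

atib <- tibble(a = 1:5, b = letters[5:1])
m_tib <- legacy_mean(atib, "a")
m_df  <- legacy_mean(as.data.frame(atib), "a")
```

This touches one line per exported function rather than every subsetting call, at the cost of silently discarding whatever extra behaviour the input class carried.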
> 
> 
>         -- 
>         Joris Meys
>         Statistical consultant
> 
>         Ghent University
>         Faculty of Bioscience Engineering
>         Department of Mathematical Modelling, Statistics and Bio-Informatics
> 
>         tel : +32 9 264 59 87
>         Joris.Meys at Ugent.be
>         -------------------------------
>         Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> 
