[R-pkg-devel] tibbles are not data frames
Göran Broström
goran.brostrom at umu.se
Tue Sep 26 12:02:37 CEST 2017
On 2017-09-26 11:35, Joris Meys wrote:
> I don't like the dropping of dimensions either. That doesn't change the
> fact that a tibble behaves differently from a data.frame. So tibbles do
> not inherit correctly from the class data.frame, and it can thus be
> argued that it's against OOP paradigms to pretend they do. Defensive
> coding would check whether the input is a tibble and return an error
> saying a data.frame is expected, unless tibbles inherit correctly from
> data.frame.
The correct and logical way (which I use in 'eha') is to check if input
is a data frame, and if not, throw an error. Checking for other things
would soon be too overwhelming.
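A minimal sketch of such a check, assuming "data frame" is taken strictly to mean an object whose class is exactly "data.frame". Note that is.data.frame() on its own returns TRUE for a tibble, since it only tests inheritance. The helper name check_plain_df is hypothetical, and this is not the code used in 'eha'.

```r
# Hypothetical helper (not from 'eha'): accept only plain data frames.
check_plain_df <- function(x) {
  # is.data.frame() is TRUE for any object inheriting from "data.frame",
  # so compare the full class vector to reject subclasses such as tibbles.
  if (!identical(class(x), "data.frame")) {
    stop("a plain data.frame is expected, got an object of class ",
         paste(class(x), collapse = "/"), call. = FALSE)
  }
  invisible(x)
}

df <- data.frame(a = 1:5, b = letters[5:1])
check_plain_df(df)                      # passes silently

# Stand-in for a tibble: the same class vector that tibble objects carry,
# built here without requiring the tibble package.
tb <- structure(df, class = c("tbl_df", "tbl", "data.frame"))
is.data.frame(tb)                       # TRUE: inheritance alone is not enough
res <- try(check_plain_df(tb), silent = TRUE)
inherits(res, "try-error")              # TRUE: the subclass is rejected
```

Whether rejecting every subclass is desirable is a design choice; it is the strictest reading of "a data.frame is expected".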
>
> I have nothing against tibbles. But calling them "data.frame" raises
> expectations that can't be fulfilled.
Exactly what I think. I wouldn't object to changing base data frames to
behave like tibbles (with a few exceptions).
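For reference, a small base-R sketch of the behaviour under discussion, using the data from Joris's example: drop = FALSE makes `[` keep a one-column data frame (what a tibble does by default), and `[[` extracts a column as a vector from a data frame and from a tibble alike.

```r
df <- data.frame(a = 1:5, b = letters[5:1])

df[3, "a"]                  # integer vector: the dimension is dropped
df[3, "a", drop = FALSE]    # 1 x 1 data frame: the tibble-like behaviour
df[["a"]]                   # whole column as a vector
df[["a"]][3]                # a single element of that column
```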
Göran
>
> On Tue, Sep 26, 2017 at 11:23 AM, Stefan McKinnon Høj-Edwards
> <sme at iysik.com> wrote:
>
> Thanks for the examples. Personally, I have been caught out multiple
> times by data frames dropping dimensions, so I have a distaste for
> this dropping behaviour.
>
> I prefer data frames *not* to drop dimensions. They are not arrays,
> where dropping a dimension when slicing makes sense because all
> entries have the same data type.
> You can pull out a column in vector form from both tibbles and data
> frames with the $ operator; subsetting a row from a data frame and
> forcing it into an atomic vector requires casting all columns to the
> lowest common denominator, often character.
>
> So I would argue that yes, tibbles are data frames with extra bells
> and whistles, even if I do not understand the use of list columns.
>
> I suggest a defensive coding technique: if you need a data frame
> subset to really be a vector, cast it to a vector. Users *will*
> throw unexpected structures at your methods. When your methods fail
> in mysterious ways because they didn't extract a vector, users will
> be stupefied. A failure at `as.vector` will indicate why.
>
> Kindly,
> Stefan
>
> Stefan McKinnon Høj-Edwards
> ph.d. Genetics
> +44 (0)776 231 2464
> +45 2888 6598
> Skype: stefan_edwards
>
> 2017-09-26 10:05 GMT+01:00 Joris Meys <Joris.Meys at ugent.be>:
>
> Here's one difference:
>
> library(tibble)
> atib <- tibble(a = 1:5, b = letters[5:1])
> atib[3,"a"]                  # tibble subset
> as.data.frame(atib)[3,"a"]   # data frame subset
>
> The tibble subset returns a 1 x 1 tibble (no dropping of dimensions);
> the data frame subset returns a vector (dimensions dropped). Huge
> difference if you use [ , aColumn] to select a vector from a data frame.
>
> Cheers
> Joris
>
> On Tue, Sep 26, 2017 at 10:57 AM, Stefan McKinnon Høj-Edwards
> <sme at iysik.com> wrote:
>
> Hi Göran,
>
> Could you please elaborate on which kind of subsetting
> Hadley dislikes?
> I have yet to encounter operations on data frames that are not
> possible on tibbles.
>
> Kindly,
> Stefan McKinnon Hoj-Edwards
>
> 2017-09-26 8:30 GMT+01:00 Göran Broström <goran.brostrom at umu.se>:
>
> > I am beginning to get complaints from users of my CRAN packages
> > (especially 'eha') to the effect that they get error messages like
> > "Error: Unsupported use of matrix or array for column indexing".
> >
> > It turns out that they are passing tibbles to functions that expect
> > data frames as input. And I am using the kind of subsetting that
> > Hadley dislikes (eha is an old package, much older than tibbles).
> > It is of course a simple matter to change the code so it handles
> > both data frames and tibbles correctly, but this affects many
> > functions, and it will take some time. And when the next guy
> > introduces 'troubles' as an improvement of 'tibbles', I will have
> > to rewrite the code again.
> >
> > While I like Hadley's way of doing it, I think it is a mistake to
> > let a tibble also be of class data frame. To me it is a matter of
> > inheritance and backwards compatibility: A tibble should add nice
> > things to a data frame, not change basic behaviour, in order to
> > call itself a data frame.
> >
> > Is it correct to let a tibble be of class "data.frame"?
> >
> > Göran Broström
> >
> > ______________________________________________
> > R-package-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel : +32 9 264 59 87
> Joris.Meys at Ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php