[Bioc-devel] Interoperability between DataFrame and dplyr?

Jim Hester james.f.hester at gmail.com
Fri Apr 24 16:42:54 CEST 2015


dplyr internally converts all `data.frame` objects to its `tbl_df` class,
and most dplyr methods operate on the `tbl` superclass (see
https://github.com/hadley/dplyr/blob/master/R/tbl-df.r and
https://github.com/hadley/dplyr/blob/master/R/tbl.r).

The most direct route to getting `DataFrame` objects working would be to
provide a method that converts the `DataFrame` object to a `data.frame`,
then calls `tbl_df()` on the result.
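
A minimal sketch of that conversion route, assuming S4Vectors and dplyr are
both installed. The helper name `as_tbl_via_data_frame` and the example
columns are hypothetical, made up for illustration; neither package provides
them.

```r
library(S4Vectors)
library(dplyr)

# Hypothetical helper (not part of dplyr or S4Vectors): coerce a DataFrame to
# a plain data.frame, then wrap the result for dplyr. This copies the data and
# is only sensible when every column coerces cleanly to an ordinary vector.
as_tbl_via_data_frame <- function(x) {
  stopifnot(is(x, "DataFrame"))
  # tbl_df() is the wrapper named above; newer dplyr releases deprecate it
  # in favour of as_tibble().
  dplyr::tbl_df(as.data.frame(x))
}

# Made-up example data with simple atomic columns.
DF <- DataFrame(gene = c("a", "b", "c"), score = c(1.5, 2.5, 3.5))

# dplyr verbs can then be applied to the converted object.
dplyr::filter(as_tbl_via_data_frame(DF), score > 2)
```

Columns holding more complex Vector-derived objects would need their own
coercion step before this could work.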

However, this would copy the data multiple times, so the best option is
probably to create a new `tbl_DF` class that handles `DataFrame` objects
directly. The various tbl-*.r files under
https://github.com/hadley/dplyr/blob/master/R/ show which methods would
need to be implemented.

On Fri, Apr 24, 2015 at 10:16 AM, Michael Lawrence
<lawrence.michael at gene.com> wrote:

> Sure, but the way DataFrame is flexible is by relying on two abstractions
> in base R. Just length() and '['. If dplyr does the same thing, which seems
> totally reasonable, everything should work the same.
>
> On Thu, Apr 23, 2015 at 4:32 PM, Vincent Carey
> <stvjc at channing.harvard.edu> wrote:
>
> > Seems to me that DataFrame is too flexible -- you can have very complex
> > objects in the columns (anything that inherits from Vector) with which,
> > in its current state, dplyr would not work too naturally.  You would wind
> > up doing a fair amount of coercion of such entities, so it seems to me
> > that arranging a coercion of DataFrames satisfying specific conditions to
> > data.frame would be a path of low resistance.
> >
> > Ready to be corrected of course.
> >
> >
> > On Thu, Apr 23, 2015 at 7:06 PM, Ryan C. Thompson <rct at thompsonclan.org>
> > wrote:
> >
> > > Hi all,
> > >
> > > So, dplyr is a pretty cool thing, but it currently works with
> > > data.frame and data.table, but not S4Vectors::DataFrame. I'd like to
> > > change that if possible, and I assume that this would "simply" involve
> > > writing some glue code. However, I'm not really sure where to start,
> > > and I expect things might be complicated because dplyr uses S3 and
> > > S4Vectors uses S4. Can anyone offer any pointers?
> > >
> > > -Ryan
> > >
> > > _______________________________________________
> > > Bioc-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel