[Bioc-devel] Interoperability between DataFrame and dplyr?

Michael Lawrence lawrence.michael at gene.com
Fri Apr 24 17:07:14 CEST 2015


On Fri, Apr 24, 2015 at 7:42 AM, Jim Hester <james.f.hester at gmail.com>
wrote:

> dplyr internally converts all `data.frame` objects to its `tbl_df` class
> and most dplyr methods operate on the `tbl` superclass; see
> https://github.com/hadley/dplyr/blob/master/R/tbl-df.r and
> https://github.com/hadley/dplyr/blob/master/R/tbl.r.
>
>
I hope you're speaking only of the data frame implementation here.


> The most direct route to getting DataFrame objects working would be to
> provide a method that converts `DataFrame` objects to `data.frame` and
> then call `tbl_df()` on that.
>
>
That coercion already exists, of course, via the S3 as.data.frame() method,
so it should work already.
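For illustration, here is a minimal sketch of that coercion route, assuming the S4Vectors and dplyr packages are installed; the `DF` object and its columns are made up. It uses only the existing as.data.frame() coercion and then applies ordinary dplyr verbs to the resulting plain data.frame.

```r
# Minimal sketch, assuming S4Vectors and dplyr are installed.
suppressPackageStartupMessages(library(S4Vectors))

# A small illustrative DataFrame (hypothetical column names and values).
DF <- DataFrame(gene = c("a", "b", "c"), score = c(2.5, 0.3, 1.7))

# Coerce through the existing S3 as.data.frame() method for DataFrame.
df <- as.data.frame(DF)

# Ordinary dplyr verbs then operate on the plain data.frame.
res <- dplyr::filter(df, score > 1)
res <- dplyr::arrange(res, dplyr::desc(score))
```

Calling the verbs with the `dplyr::` prefix sidesteps masking between attached packages (for example, stats::filter).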


> However, this would copy the data multiple times, so the best option would
> probably be to create a new `tbl_DF` class to handle `DataFrame`
> objects directly.
>

It doesn't copy the data, beyond the list of column pointers (so it's pretty
much instantaneous), but yeah, I agree a new implementation is the way to go.


> You can look at the various tbl-*.r files in
> https://github.com/hadley/dplyr/blob/master/R/ to see which methods
> need to be implemented.
>
> On Fri, Apr 24, 2015 at 10:16 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
>> Sure, but DataFrame gets its flexibility by relying on just two base R
>> abstractions: length() and '['. If dplyr does the same, which seems
>> totally reasonable, everything should work the same way.
>>
>> On Thu, Apr 23, 2015 at 4:32 PM, Vincent Carey
>> <stvjc at channing.harvard.edu> wrote:
>>
>> > Seems to me that DataFrame is too flexible: you can have very complex
>> > objects in the columns (anything that inherits from Vector), with which
>> > dplyr, in its current state, would not work very naturally. You would end
>> > up doing a fair amount of coercion of such entities, so it seems to me
>> > that arranging for DataFrames that satisfy specific conditions to be
>> > coerced to data.frame would be the path of least resistance.
>> >
>> > Ready to be corrected, of course.
>> >
>> >
>> > On Thu, Apr 23, 2015 at 7:06 PM, Ryan C. Thompson
>> > <rct at thompsonclan.org> wrote:
>> >
>> > > Hi all,
>> > >
>> > > So, dplyr is a pretty cool thing, but currently it works with data.frame
>> > > and data.table, not S4Vectors::DataFrame. I'd like to change that if
>> > > possible, and I assume that this would "simply" involve writing some
>> > > glue code. However, I'm not really sure where to start, and I expect
>> > > things might be complicated because dplyr uses S3 and S4Vectors uses
>> > > S4. Can anyone offer any pointers?
>> > >
>> > > -Ryan
>> > >
>> > > _______________________________________________
>> > > Bioc-devel at r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> > >
>> >
>>
>
>



