[Bioc-devel] merging DFrames

Pages, Herve hpages at fredhutch.org
Wed Oct 21 18:36:41 CEST 2020


Hi Laurent,

I think the current implementation was just an expedient to have 
something that works (in most cases). I don't know if a proper 
implementation that doesn't go through data.frame is on the TODO list. Michael?

I suggest you open an issue on GitHub under S4Vectors.
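
In the meantime, here is a minimal sketch of a workaround that avoids the
round trip through data.frame. It is not how S4Vectors implements merge();
merge_keep_lists() is a hypothetical helper, and it assumes an inner join on
a single key column whose values are unique within each DataFrame. Unlike
merge(), it does not sort the result by key. I use the NumericList()
constructor from IRanges to build the columns explicitly:

```r
library(S4Vectors)
library(IRanges)  # provides NumericList()

# Hypothetical helper (not part of S4Vectors): inner join on one key column,
# assuming the key values are unique within each DataFrame.
merge_keep_lists <- function(x, y, by) {
    idx <- match(x[[by]], y[[by]])   # row of y matching each row of x
    keep <- !is.na(idx)              # drop rows of x with no partner in y
    # Subset and cbind the DataFrames directly, so the columns are never
    # coerced to ordinary lists.
    cbind(x[keep, , drop = FALSE],
          y[idx[keep], setdiff(colnames(y), by), drop = FALSE])
}

d1 <- DataFrame(x = letters[1:3], y = NumericList(1, 1:2, 1:3))
d2 <- DataFrame(x = letters[1:3], z = NumericList(1:3, 1:2, 1))

merged <- merge_keep_lists(d1, d2, by = "x")
```

Because the rows are only subsetted and bound column-wise, y and z should
still be NumericList columns in the result. Duplicated keys, multiple key
columns and outer joins would need more work.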

Cheers,
H.

PS: Note that you can pass the list elements directly to the List() 
constructor, no need to construct an ordinary list first:

   List(1, 1:2, 1:3)  # same as List(list(1, 1:2, 1:3))


On 10/21/20 08:35, Laurent Gatto wrote:
> When merging DFrame instances, the *List types are lost:
> 
> The following two instances have NumericList columns (y and z):
> d1 <- DataFrame(x = letters[1:3], y = List(list(1, 1:2, 1:3)))
> d2 <- DataFrame(x = letters[1:3], z = List(list(1:3, 1:2, 1)))
> 
> d1
> ## DataFrame with 3 rows and 2 columns
> ##             x             y
> ##   <character> <NumericList>
> ## 1           a             1
> ## 2           b           1,2
> ## 3           c         1,2,3
> 
> These are, however, converted to list when merged:
> 
> merge(d1, d2, by = "x")
> ## DataFrame with 3 rows and 3 columns
> ##             x      y      z
> ##   <character> <list> <list>
> ## 1           a      1  1,2,3
> ## 2           b    1,2    1,2
> ## 3           c  1,2,3      1
> 
> Looking at merge,DataTable,DataTable (from which merge,DFrame,DFrame inherits), this makes sense given that they are converted to data.frames, merged with merge,data.frame,data.frame, and the result is coerced back to DFrame:
> 
>> getMethod("merge", c("DataTable", "DataTable"))
> Method Definition:
> 
> function (x, y, ...)
> {
>      .local <- function (x, y, by, ...)
>      {
>          if (is(by, "Hits")) {
>              return(.mergeByHits(x, y, by, ...))
>          }
>          as(merge(as(x, "data.frame"), as(y, "data.frame"), by,
>              ...), class(x))
>      }
>      .local(x, y, ...)
> }
> <bytecode: 0x556dd0032ca8>
> <environment: namespace:S4Vectors>
> 
> Signatures:
>          x           y
> target  "DataTable" "DataTable"
> defined "DataTable" "DataTable"
> 
> I would like not to lose the *List classes in the individual DFrames.
>   
> Am I missing something? Is this something that is on the todo list, or that I could help with?
> 
> Best wishes,
> 
> Laurent

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

