[Bioc-devel] merging DFrames

Laurent Gatto |@urent@g@tto @end|ng |rom uc|ouv@|n@be
Wed Oct 21 17:35:09 CEST 2020


When merging DFrame instances, the *List types are lost:

The following two instances each have a NumericList column (y and z, respectively):
library(S4Vectors)
library(IRanges)  ## NumericList is defined in IRanges
d1 <- DataFrame(x = letters[1:3], y = List(list(1, 1:2, 1:3)))
d2 <- DataFrame(x = letters[1:3], z = List(list(1:3, 1:2, 1)))

d1
## DataFrame with 3 rows and 2 columns
##             x             y
##   <character> <NumericList>
## 1           a             1
## 2           b           1,2
## 3           c         1,2,3

These columns are, however, converted to plain lists when the two instances are merged:

merge(d1, d2, by = "x")
## DataFrame with 3 rows and 3 columns
##             x      y      z
##   <character> <list> <list>
## 1           a      1  1,2,3
## 2           b    1,2    1,2
## 3           c  1,2,3      1

Looking at merge,DataTable,DataTable (from which merge,DFrame,DFrame inherits), this makes sense: both inputs are converted to data.frames, merged with merge,data.frame,data.frame, and the result is coerced back to DFrame:

> getMethod("merge", c("DataTable", "DataTable"))
Method Definition:

function (x, y, ...) 
{
    .local <- function (x, y, by, ...) 
    {
        if (is(by, "Hits")) {
            return(.mergeByHits(x, y, by, ...))
        }
        as(merge(as(x, "data.frame"), as(y, "data.frame"), by, 
            ...), class(x))
    }
    .local(x, y, ...)
}
<bytecode: 0x556dd0032ca8>
<environment: namespace:S4Vectors>

Signatures:
        x           y          
target  "DataTable" "DataTable"
defined "DataTable" "DataTable"
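
The round trip performed by this method can be reproduced in isolation, without any merging, to see where the column class changes. This is only a sketch: it assumes S4Vectors and IRanges are installed, and the variable names df and back are introduced here for illustration.

```r
library(S4Vectors)
library(IRanges)  # NumericList is defined in IRanges

d1 <- DataFrame(x = letters[1:3], y = List(list(1, 1:2, 1:3)))

# Step 1: DFrame -> data.frame, as done on each input inside the method
df <- as(d1, "data.frame")

# Step 2: data.frame -> DFrame, as done on the merged result
back <- as(df, class(d1))

# Compare the class of column y before and after the round trip
class(d1$y)
class(back$y)
```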

I would like not to lose the *List classes in the individual DFrames.
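
As a stopgap, a join that never goes through data.frame might look like the sketch below. merge_keep_lists is a hypothetical helper, not part of S4Vectors, and it is much narrower than merge(): it assumes a single key column, unique keys in y, and distinct non-key column names, and it performs an inner join that keeps the row order of x rather than sorting by key.

```r
library(S4Vectors)
library(IRanges)  # NumericList is defined in IRanges

d1 <- DataFrame(x = letters[1:3], y = List(list(1, 1:2, 1:3)))
d2 <- DataFrame(x = letters[1:3], z = List(list(1:3, 1:2, 1)))

# Hypothetical helper (not an S4Vectors function): inner join on one key
# column, operating on the DFrame objects directly.
# Assumptions: keys in y are unique; non-key column names do not clash.
merge_keep_lists <- function(x, y, by) {
    idx <- match(x[[by]], y[[by]])  # row of y matching each row of x, NA if none
    keep <- !is.na(idx)             # rows of x that have a partner in y
    out <- x[keep, , drop = FALSE]
    for (nm in setdiff(colnames(y), by)) {
        # Subset each column of y in place, so its class is left untouched
        out[[nm]] <- y[[nm]][idx[keep]]
    }
    out
}

res <- merge_keep_lists(d1, d2, by = "x")
class(res$y)
class(res$z)
```

Because the columns are only subset and never coerced, whatever class they have in d1 and d2 is carried over to the result.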
 
Am I missing something? Is this something that is on the todo list, or that I could help with?

Best wishes,

Laurent



