[Bioc-devel] 'GRangesList' does not keep metadata of items

Hervé Pagès hpages at fhcrc.org
Wed Sep 4 00:03:09 CEST 2013


Related to the storage of a list inside a DataFrame (as a column),
I found 2 issues:

   df <- DataFrame(A=I(list(a=1:3, b="BB")))

1. The name of the column is not the one specified ("X" instead of "A"):

     > df
     DataFrame with 2 rows and 1 column
              X
         <list>
     1 ########
     2 ########

2. rbind() doesn't stack the rows as expected; it spreads the list
   elements across columns instead:

     > rbind(df, df)
     DataFrame with 3 rows and 4 columns
             X.a         X.b     X.a.1       X.b.1
       <integer> <character> <integer> <character>
     1         1          BB         1          BB
     2         2          BB         2          BB
     3         3          BB         3          BB

   or it can break:

     > df <- DataFrame(A=I(list(a=1:3, b=character(0))))
     > rbind(df, df)
     Error in DataFrame(cols) : cannot coerce class "list" to a DataFrame

This last issue will break c() on GRangesList objects that have mcols
of the kind I showed previously.
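
A minimal sketch for checking whether the two issues above reproduce on
a given installation. It assumes that library(GenomicRanges) attaches the
package providing DataFrame(); the names 'name_ok', 'rbind_ok' and
'checks' are only illustrative, and the result depends on the installed
versions:

```r
# Sketch only: reports whether the two issues reproduce on this installation.
library(GenomicRanges)  # assumed to attach the package that provides DataFrame()

df <- DataFrame(A = I(list(a = 1:3, b = "BB")))

# Issue 1: the column should be named "A", not "X".
name_ok <- identical(colnames(df), "A")

# Issue 2: rbind() should stack the rows (2 + 2 = 4 rows, still 1 column).
# tryCatch() turns an error such as "cannot coerce class ..." into FALSE.
rbind_ok <- tryCatch({
  res <- rbind(df, df)
  nrow(res) == 4L && ncol(res) == 1L
}, error = function(e) FALSE)

checks <- c(name_ok = name_ok, rbind_ok = rbind_ok)
print(checks)
```

A FALSE entry means the corresponding issue is still present.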

Cheers,
H.


 > sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicRanges_1.13.39 XVector_0.1.0         IRanges_1.19.28
[4] BiocGenerics_0.7.4

loaded via a namespace (and not attached):
[1] stats4_3.0.1 tools_3.0.1

On 09/03/2013 02:40 PM, Hervé Pagès wrote:
> Hi Julian, Michael,
>
> Alternatively a trick is to use the outer mcols of the GRangesList
> object. If the experimental metadata of each GRanges has the same
> structure/fields, and those fields contain single values:
>
>    library(GenomicRanges)
>    gr1 <- GRanges()
>    metadata(gr1) = list(a="1", b="hello")
>    gr2 <- GRanges()
>    metadata(gr2) = list(a="2", b="world")
>
>    grl <- GRangesList(gr1, gr2)
>    mcols(grl) <- DataFrame(a=c(metadata(gr1)$a, metadata(gr2)$a),
>                            b=c(metadata(gr1)$b, metadata(gr2)$b))
>
> Then:
>
>    > mcols(grl)
>    DataFrame with 2 rows and 2 columns
>                a           b
>      <character> <character>
>    1           1       hello
>    2           2       world
>
> If the experimental metadata fields are going to be completely
> arbitrary:
>
>    metadata(gr1) = list(a="1", b="hello")
>    metadata(gr2) = list(a=c("2", "3"), z="foo", y=letters[1:3])
>
>    grl <- GRangesList(gr1, gr2)
>    mcols(grl) <- DataFrame(metadata=I(list(metadata(gr1), metadata(gr2))))
>
> Then:
>
>    > mcols(grl)
>    DataFrame with 2 rows and 1 column
>      metadata
>        <list>
>    1 ########
>    2 ########
>
> 'mcols(grl)$metadata' is a list of lists:
>
>    > mcols(grl)$metadata
>    [[1]]
>    [[1]]$a
>    [1] "1"
>
>    [[1]]$b
>    [1] "hello"
>
>
>    [[2]]
>    [[2]]$a
>    [1] "2" "3"
>
>    [[2]]$z
>    [1] "foo"
>
>    [[2]]$y
>    [1] "a" "b" "c"
>
> Cheers,
> H.
>
>
> On 09/03/2013 06:47 AM, Julian Gehring wrote:
>> Hi Michael,
>>
>> Thanks, using 'GenomicRangesList' instead of 'GRangesList' essentially
>> solves my issues.  Could you please add a small note to the
>> documentation that mentions the different behaviors for the two classes?
>>
>> Best wishes
>> Julian
>>
>>
>> On 09/03/2013 03:34 PM, Michael Lawrence wrote:
>>> If the number of GRanges is small (not thousands), and you don't need
>>> the semantics of treating each GRanges as a "compound range", then use
>>> GenomicRangesList(). It's a SimpleList, so metadata should be preserved.
>>> It's the data structure for storing per-sample GRanges.
>>>
>>> Michael
>>>
>>>
>>> On Tue, Sep 3, 2013 at 2:39 AM, Julian Gehring
>>> <julian.gehring at embl.de> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> The use case is storing experimental metadata together with a GRanges
>>>> object, where the metadata does not fit the tabular structure of a
>>>> GRanges.  And at a later stage, storing multiple of these annotated
>>>> GRanges objects together as a list/GRangesList.
>>>>
>>>> Best wishes
>>>> Julian
>>>>
>>>>
>>>>
>>>>> This second case is exactly what happens to the individual GRanges
>>>>> that constitute the list. They are concatenated to form a single
>>>>> GRanges, which is stored alongside a partitioning that defines the
>>>>> individual elements. There are no longer two separate GRanges
>>>>> objects, so there is no easy way to keep the metadata around. It's
>>>>> unfortunate that an implementation detail is exposed in this way,
>>>>> but it would take some effort to support this feature. This is a
>>>>> property of all CompressedList derivatives. What's the use case?
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
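
A minimal sketch of the difference discussed in the quoted messages,
comparing an uncompressed list with a GRangesList. It uses SimpleList()
as a stand-in for GenomicRangesList() (which, per the message above, is
a SimpleList); that substitution and the names 'sl', 'kept' and
'kept_compressed' are assumptions made for illustration:

```r
# Sketch only: compares per-element metadata() in the two kinds of list.
library(GenomicRanges)

gr1 <- GRanges()
metadata(gr1) <- list(a = "1", b = "hello")
gr2 <- GRanges()
metadata(gr2) <- list(a = "2", b = "world")

# Uncompressed list: each element is stored as-is, so its metadata() survives.
sl <- SimpleList(gr1, gr2)
kept <- identical(metadata(sl[[1]]), metadata(gr1))

# GRangesList: per the thread, the elements are concatenated into a single
# GRanges and re-extracted on access, so compare rather than assume.
grl <- GRangesList(gr1, gr2)
kept_compressed <- identical(metadata(grl[[1]]), metadata(gr1))

print(c(SimpleList = kept, GRangesList = kept_compressed))
```

The second value is deliberately computed rather than stated, since it
depends on the installed version.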

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
