[Bioc-devel] 'GRangesList' does not keep metadata of items

Hervé Pagès hpages at fhcrc.org
Tue Sep 3 23:40:14 CEST 2013


Hi Julian, Michael,

Alternatively, one trick is to store the metadata in the outer mcols of
the GRangesList object. This works directly if the experimental metadata
of each GRanges has the same structure/fields, and those fields contain
single values:

   library(GenomicRanges)
   gr1 <- GRanges()
   metadata(gr1) = list(a="1", b="hello")
   gr2 <- GRanges()
   metadata(gr2) = list(a="2", b="world")

   grl <- GRangesList(gr1, gr2)
   mcols(grl) <- DataFrame(a=c(metadata(gr1)$a, metadata(gr2)$a),
                           b=c(metadata(gr1)$b, metadata(gr2)$b))

Then:

   > mcols(grl)
   DataFrame with 2 rows and 2 columns
               a           b
     <character> <character>
   1           1       hello
   2           2       world
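
With more than a couple of GRanges objects, the same outer mcols can be
built in a loop rather than by hand. This is only a rough sketch: 'grs'
and 'md' are names made up for illustration, and it assumes every
metadata list has the single-string fields 'a' and 'b' used above.

```r
library(GenomicRanges)

gr1 <- GRanges()
metadata(gr1) <- list(a="1", b="hello")
gr2 <- GRanges()
metadata(gr2) <- list(a="2", b="world")

## 'grs' is a hypothetical plain list holding the annotated GRanges.
grs <- list(gr1, gr2)

## Pull the metadata out *before* building the GRangesList, since the
## per-element metadata is not kept once the objects are combined.
md <- lapply(grs, metadata)

grl <- do.call(GRangesList, grs)

## vapply() with character(1) stops with an error if a field is missing
## or is not a single string, which enforces the assumption stated above.
mcols(grl) <- DataFrame(a=vapply(md, function(m) m$a, character(1)),
                        b=vapply(md, function(m) m$b, character(1)))
```

This should give the same 2-row DataFrame as shown above.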

If the experimental metadata fields are going to be completely
arbitrary, store each metadata list as one element of a list column
instead:

   metadata(gr1) = list(a="1", b="hello")
   metadata(gr2) = list(a=c("2", "3"), z="foo", y=letters[1:3])

   grl <- GRangesList(gr1, gr2)
   mcols(grl) <- DataFrame(metadata=I(list(metadata(gr1), metadata(gr2))))

Then:

   > mcols(grl)
   DataFrame with 2 rows and 1 column
     metadata
       <list>
   1 ########
   2 ########

'mcols(grl)$metadata' is a list of lists:

   > mcols(grl)$metadata
   [[1]]
   [[1]]$a
   [1] "1"

   [[1]]$b
   [1] "hello"


   [[2]]
   [[2]]$a
   [1] "2" "3"

   [[2]]$z
   [1] "foo"

   [[2]]$y
   [1] "a" "b" "c"
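
To get an annotated GRanges back out, the stored list can be put back
onto the extracted element. A minimal sketch, where
'getWithMetadata' is a hypothetical helper (not part of GenomicRanges)
and the list column is assumed to be named 'metadata' as above:

```r
library(GenomicRanges)

gr1 <- GRanges()
metadata(gr1) <- list(a="1", b="hello")
gr2 <- GRanges()
metadata(gr2) <- list(a=c("2", "3"), z="foo", y=letters[1:3])

grl <- GRangesList(gr1, gr2)
mcols(grl) <- DataFrame(metadata=I(list(metadata(gr1), metadata(gr2))))

## Hypothetical helper: extract the i-th GRanges and re-attach the
## metadata list that was stored in the outer mcols.
getWithMetadata <- function(grl, i) {
    gr <- grl[[i]]
    metadata(gr) <- mcols(grl)$metadata[[i]]
    gr
}

gr2_again <- getWithMetadata(grl, 2)
```

After this, 'metadata(gr2_again)' should hold the same list that was set
on 'gr2'.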

Cheers,
H.


On 09/03/2013 06:47 AM, Julian Gehring wrote:
> Hi Michael,
>
> Thanks, using 'GenomicRangesList' instead of 'GRangesList' essentially
> solves my issues.  Could you please add a small note to the
> documentation that mentions the different behaviors for the two classes?
>
> Best wishes
> Julian
>
>
> On 09/03/2013 03:34 PM, Michael Lawrence wrote:
>> If the number of GRanges is small (not thousands), and you don't need the
>> semantic of treating each GRanges as a "compound range", then use
>> GenomicRangesList(). It's a SimpleList, so metadata should be preserved.
>> It's the data structure for storing per-sample GRanges.
>>
>> Michael
>>
>>
>> On Tue, Sep 3, 2013 at 2:39 AM, Julian Gehring
>> <julian.gehring at embl.de> wrote:
>>
>>> Hi Michael,
>>>
>>> The use case is storing experimental metadata together with a GRanges
>>> object that does not fit the tabular structure of a GRanges.  And at a
>>> later stage, storing multiple of these annotated GRanges objects
>>> together as a list/GRangesList.
>>>
>>> Best wishes
>>> Julian
>>>
>>>
>>>
>>>> This second case is exactly what happens to the individual GRanges that
>>>> constitute the list. They are concatenated to form a single GRanges,
>>>> which is stored alongside a partitioning that defines the individual
>>>> elements. There are no longer two separate GRanges objects, so there is
>>>> no easy way to keep the metadata around. It's unfortunate that an
>>>> implementation detail is exposed in this way, but it would take some
>>>> effort to support this feature. This is a property of all
>>>> CompressedList derivatives. What's the use case?
>>>>
>>>
>>>
>>>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


