[Bioc-devel] 'GRangesList' does not keep metadata of items
Hervé Pagès
hpages at fhcrc.org
Wed Sep 4 00:03:09 CEST 2013
Related to the storage of a list inside a DataFrame (as a column),
I found 2 issues:
df <- DataFrame(A=I(list(a=1:3, b="BB")))
1. The column does not get the name I specified (A); it shows up as X instead:
> df
DataFrame with 2 rows and 1 column
X
<list>
1 ########
2 ########
2. rbind() doesn't work as expected. Instead of 4 rows and 1 column, the list column gets expanded into one column per list element:
> rbind(df, df)
DataFrame with 3 rows and 4 columns
X.a X.b X.a.1 X.b.1
<integer> <character> <integer> <character>
1 1 BB 1 BB
2 2 BB 2 BB
3 3 BB 3 BB
or it can break:
> df <- DataFrame(A=I(list(a=1:3, b=character(0))))
> rbind(df, df)
Error in DataFrame(cols) : cannot coerce class "list" to a DataFrame
This last issue will break c() on GRangesList objects that have mcols
of the kind I showed previously.
Cheers,
H.
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicRanges_1.13.39 XVector_0.1.0 IRanges_1.19.28
[4] BiocGenerics_0.7.4
loaded via a namespace (and not attached):
[1] stats4_3.0.1 tools_3.0.1
On 09/03/2013 02:40 PM, Hervé Pagès wrote:
> Hi Julian, Michael,
>
> Alternatively, a trick is to use the outer mcols of the GRangesList
> object (one row per list element). If the experimental metadata of each
> GRanges has the same structure/fields, and those fields contain single values:
>
> library(GenomicRanges)
> gr1 <- GRanges()
> metadata(gr1) <- list(a="1", b="hello")
> gr2 <- GRanges()
> metadata(gr2) <- list(a="2", b="world")
>
> grl <- GRangesList(gr1, gr2)
> mcols(grl) <- DataFrame(a=c(metadata(gr1)$a, metadata(gr2)$a),
>                         b=c(metadata(gr1)$b, metadata(gr2)$b))
>
> Then:
>
> > mcols(grl)
> DataFrame with 2 rows and 2 columns
> a b
> <character> <character>
> 1 1 hello
> 2 2 world
>
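The manual construction above can be generalized to any number of list elements. What follows is a minimal sketch, not code from this thread. It assumes the list is non-empty, that every metadata list has the same fields in the same order, and that each field holds a single value. It uses base R only: a plain list stands in for the 'metadata()' of each GRanges, and 'data.frame' stands in for 'DataFrame'. The helper name 'metadata_to_table' is hypothetical.

```r
# Hypothetical helper: turn a list of per-element metadata lists into a
# table with one row per element and one column per field.
# Assumes a non-empty input, identical field names in every element, and
# a single value per field.
metadata_to_table <- function(md_list) {
    stopifnot(length(md_list) > 0L)
    fields <- names(md_list[[1L]])
    # Every element must have exactly the same fields, in the same order.
    stopifnot(all(vapply(md_list, function(m) identical(names(m), fields),
                         logical(1L))))
    cols <- lapply(fields, function(f) {
        v <- lapply(md_list, `[[`, f)
        stopifnot(all(lengths(v) == 1L))  # scalar values only
        unlist(v, use.names = FALSE)
    })
    names(cols) <- fields
    as.data.frame(cols, stringsAsFactors = FALSE)
}

md <- list(list(a = "1", b = "hello"),
           list(a = "2", b = "world"))
tbl <- metadata_to_table(md)
```

With S4Vectors attached, the result could then be coerced with as(tbl, "DataFrame") and assigned to 'mcols(grl)'.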
> If the experimental metadata fields are going to be completely
> arbitrary:
>
> metadata(gr1) <- list(a="1", b="hello")
> metadata(gr2) <- list(a=c("2", "3"), z="foo", y=letters[1:3])
>
> grl <- GRangesList(gr1, gr2)
> mcols(grl) <- DataFrame(metadata=I(list(metadata(gr1), metadata(gr2))))
>
> Then:
>
> > mcols(grl)
> DataFrame with 2 rows and 1 column
> metadata
> <list>
> 1 ########
> 2 ########
>
> 'mcols(grl)$metadata' is a list of lists:
>
> > mcols(grl)$metadata
> [[1]]
> [[1]]$a
> [1] "1"
>
> [[1]]$b
> [1] "hello"
>
>
> [[2]]
> [[2]]$a
> [1] "2" "3"
>
> [[2]]$z
> [1] "foo"
>
> [[2]]$y
> [1] "a" "b" "c"
>
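Each element of this list-of-lists can define different fields, so looking up one field across all elements has to tolerate absent entries. What follows is a minimal base-R sketch. A plain list stands in for 'mcols(grl)$metadata'. The helper name 'get_field' is hypothetical, and so is its choice to return NA for a missing field.

```r
# Stand-in for mcols(grl)$metadata: one metadata list per GRanges element.
md <- list(list(a = "1", b = "hello"),
           list(a = c("2", "3"), z = "foo", y = letters[1:3]))

# Hypothetical helper: extract field 'name' from every element, returning
# NA where an element does not define that field.
get_field <- function(md_list, name) {
    lapply(md_list, function(m) if (name %in% names(m)) m[[name]] else NA)
}

get_field(md, "a")  # list("1", c("2", "3"))
get_field(md, "z")  # list(NA, "foo")
```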
> Cheers,
> H.
>
>
> On 09/03/2013 06:47 AM, Julian Gehring wrote:
>> Hi Michael,
>>
>> Thanks, using 'GenomicRangesList' instead of 'GRangesList' essentially
>> solves my issues. Could you please add a small note to the
>> documentation that mentions the different behaviors for the two classes?
>>
>> Best wishes
>> Julian
>>
>>
>> On 09/03/2013 03:34 PM, Michael Lawrence wrote:
>>> If the number of GRanges is small (not thousands), and you don't need
>>> the semantics of treating each GRanges as a "compound range", then use
>>> GenomicRangesList(). It's a SimpleList, so metadata should be preserved.
>>> It's the data structure for storing per-sample GRanges.
>>>
>>> Michael
>>>
>>>
>>> On Tue, Sep 3, 2013 at 2:39 AM, Julian Gehring
>>> <julian.gehring at embl.de> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> The use case is storing experimental metadata that does not fit the
>>>> tabular structure of a GRanges together with the GRanges object. At a
>>>> later stage, several of these annotated GRanges objects are stored
>>>> together as a list/GRangesList.
>>>>
>>>> Best wishes
>>>> Julian
>>>>
>>>>
>>>>
>>>>> This second case is exactly what happens to the individual GRanges that
>>>>> constitute the list. They are concatenated to form a single GRanges,
>>>>> which is stored alongside a partitioning that defines the individual
>>>>> elements. There are no longer two separate GRanges objects, so there is
>>>>> no easy way to keep their metadata around. It's unfortunate that an
>>>>> implementation detail is exposed in this way, but it would take some
>>>>> effort to support this feature. This is a property of all
>>>>> CompressedList derivatives. What's the use case?
>>>>>
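The difference between the compressed storage described above and the SimpleList-based GenomicRangesList suggested earlier in the thread can be illustrated without Bioconductor. This is a deliberately simplified model, not the actual IRanges/S4Vectors implementation. A "compressed" list is modelled as one concatenated vector plus the end offset of each element (the partitioning). A "simple" list keeps each element as its own object. Plain integer vectors stand in for GRanges, and per-element metadata is modelled as an R attribute named 'info'. The attribute name and the helpers 'compress' and 'get_element' are hypothetical.

```r
# Two elements, each carrying its own metadata as an attribute.
x1 <- structure(1:3, info = "sample 1")
x2 <- structure(4:5, info = "sample 2")

# Simplified "compressed" representation: concatenate the elements
# (attributes are dropped) and record where each one ends.
compress <- function(elts) {
    list(unlistData = unlist(lapply(elts, as.vector), use.names = FALSE),
         ends = cumsum(lengths(elts)))
}

# Recover element i from the concatenated data and the partitioning.
get_element <- function(cl, i) {
    start <- if (i == 1L) 1L else cl$ends[i - 1L] + 1L
    cl$unlistData[start:cl$ends[i]]
}

cl <- compress(list(x1, x2))
get_element(cl, 2L)                # 4 5
attr(get_element(cl, 2L), "info")  # NULL: the per-element metadata is gone

# Simplified "simple" representation: the elements are stored as-is.
sl <- list(x1, x2)
attr(sl[[2L]], "info")             # "sample 2"
```

In the compressed form, an element is rebuilt from a slice of the shared vector, so nothing element-specific survives unless it is stored separately, for example in the outer mcols.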
>>>>
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319