[BioC] Add extra columns to GRanges Metadata
Paul Leo
p.leo at uq.edu.au
Tue Feb 21 07:35:32 CET 2012
Well in the end I did this the way I always do it: extracted the
elementMetaData modified it and reassigned it in it's entirety.
It works fine but it seems a little ham fisted hence my original post.
I had to do this in the end ANYWAY because c(,) , it appears, requires
the elementMetadata of the GRanges in the same order. see example below.
Something I seemed to have avoided until now.
Given this I don't *think* that the other solns will work anyway. But
there are indeed other issues that I did not appreciate that you point
out. Many thanks .
I guess I'm after this functionality since for combining mutation data
from various sources/programs , SNP , indels, large indels, CNV etc, the
quality,genotype, annotation data I put in the elementMetaData. To take
along for the analysis and filtering.
I could have used GrangeList... but in the final disease modelling is
not convenient to use a Granges object
So the ordering thing is a little bit of a pain. But the problem is
solvable with the current data structures (which are just fab- thanks).
Though it would be handy to add extra metadata in a computationally
rapid way and not to worry about their order.
Cheers
Paul
PS: Grange combning when metaData in different orders example:
gr1 <-GRanges(seqnames =
Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges =
IRanges(1:10, end = 7:16, names = head(letters, 10)),
strand =
Rle(strand(c("-", "+", "*", "+", "-")),
c(1, 2, 2, 3, 2)),
score = 1:10,
GC = seq(1, 0, length=10))
gr2<-GRanges(seqnames =
Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges =
IRanges(1:10, end = 7:16, names = head(letters, 10)),
strand =
Rle(strand(c("-", "+", "*", "+", "-")),
c(1, 2, 2, 3, 2)),
GC = seq(1, 0, length=10),
score = 1:10
)
gr1
gr2
gr1 <-GRanges(seqnames =
Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges =
IRanges(1:10, end = 7:16, names = head(letters, 10)),
strand =
Rle(strand(c("-", "+", "*", "+", "-")),
c(1, 2, 2, 3, 2)),
score = 1:10,
GC = seq(1, 0, length=10))
gr2<-GRanges(seqnames =
Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges =
IRanges(1:10, end = 7:16, names = head(letters, 10)),
strand =
Rle(strand(c("-", "+", "*", "+", "-")),
c(1, 2, 2, 3, 2)),
GC = seq(1, 0, length=10),
score = 1:10
)
gr1
gr2
gr.all<-c(gr1,gr2)
Error in .Method(..., deparse.level = deparse.level) :
column names for arg 2 do not match those of first arg
gr2<-GRanges(seqnames =
Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
ranges =
IRanges(1:10, end = 7:16, names = head(letters, 10)),
strand =
Rle(strand(c("-", "+", "*", "+", "-")),
c(1, 2, 2, 3, 2)),
score = 1:10,GC = seq(1, 0, length=10)
)
gr.all<-c(gr1,gr2)
> sessionInfo()
R Under development (unstable) (2011-11-23 r57740)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
base
other attached packages:
[1] multicore_0.1-7 biomaRt_2.11.1
Rsamtools_1.7.23
[4] Biostrings_2.23.6 GenomicFeatures_1.7.10
AnnotationDbi_1.17.14
[7] Biobase_2.15.3 GenomicRanges_1.7.15
IRanges_1.13.20
[10] BiocGenerics_0.1.4
loaded via a namespace (and not attached):
[1] bitops_1.0-4.1 BSgenome_1.23.2 DBI_0.2-5
RCurl_1.9-5
[5] RSQLite_0.11.1 rtracklayer_1.15.6 tools_2.15.0
XML_3.9-2
[9] zlibbioc_1.1.0
>
-----Original Message-----
From: Michael Lawrence <lawrence.michael at gene.com>
To: Martin Morgan <mtmorgan at fhcrc.org>
Cc: Michael Lawrence <lawrence.michael at gene.com>, bioconductor
<bioconductor at r-project.org>
Subject: Re: [BioC] Add extra columns to GRanges Metadata
Date: Mon, 20 Feb 2012 16:38:13 -0800
On Mon, Feb 20, 2012 at 4:33 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 02/20/2012 11:50 AM, Tengfei Yin wrote:
>
>> +1 for Tim's methods.....especially the $ and $<- method, it's always in a
>> wishlist for me :)
>>
>> well, maybe "$ " could be sometimes confusing when people trying to treat
>> GRanges as a general format similar to data.frame, and do something like
>> gr$seqnames<- blabla, the good part about current API is that it remind me
>> the difference bettween elementMetadata and three required fields... but
>>
>
> GRanges extends Vector and
>
> > gr = GRanges("A", IRanges(1:10, 20), X=1:10, Y=1:10)
> > length(gr)
> [1] 10
> > length(values(gr))
> [1] 2
>
> shows the geometry -- one would expect gr$foo to access the range named
> 'foo' rather than the column of elementMetadata named 'foo'. I think this
> is the crux of the resistance.
>
>
Right, thanks for reiterating this. The question is whether the convenience
of the shortcut outweighs the conceptual inconsistency. Probably depends
heavily on one's perspective. I would favor the perspective of the frequent
user.
> When Tim says
>
>
> On 02/20/2012 11:45 AM, Tim Triche, Jr. wrote:
> > if these don't belong in BiocGenerics, I don't know what does :-D
>
> I'm not sure if the reference is to "intersect" (which is already in
> BiocGenerics), "%i%" etc., or to $ and $<-. For $ and $<-, they don't need
> to be in BiocGenerics because they are already made generics by the methods
> package -- BiocGenerics is meant to provide a place to define generics that
> would otherwise be defined independently in multiple packages.
>
> Martin
>
>
> it will be convenient if general $ method support replacement? even for
>> seqnames, strand and ranges part? with default checking?
>>
>> other methods mentioned by Tim are also handy if default of those function
>> is under assumption.
>>
>> On Mon, Feb 20, 2012 at 1:04 PM, Michael Lawrence<lawrence.michael@**
>> gene.com <lawrence.michael at gene.com>
>>
>>> wrote:
>>>
>>
>> On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou<
>>> mailinglist.honeypot at gmail.com**> wrote:
>>>
>>> Hi Tim,
>>>>
>>>> These functions are quite handy ... definitely going to poach them.
>>>>
>>>>
>>>
>>> I wonder if we should keep poaching these in the wild or if Bioc Core
>>> will
>>> ever bow to popular pressure and allow their inclusion in the base
>>> packages? ;-)
>>>
>>>
>>> Also ... you need an editor with better code folding support ;-)
>>>>
>>>> Pual,
>>>>
>>>> Aslo, to do what you want to do, namely:
>>>>
>>>>
>>>>>
>>>> extra.blank<-matrix(data=NA,**nrow=length(a.grs),ncol=**
>>> length(missing.meta.cols))
>>>
>>>> old.meta<-as.data.frame(**values(a.grs))
>>>>> new.meta<-cbind(old.meta,**extra.blank)
>>>>> values(a.grs)<-new.meta
>>>>>
>>>>
>>>> Using vanilla GenomicRanges functionality, you just do this:
>>>>
>>>> R> values(a.grs)<- cbind(values(a.grs), DataFrame(extra.blank))
>>>>
>>>> HTH,
>>>>
>>>> -steve
>>>>
>>>> --
>>>> Steve Lianoglou
>>>> Graduate Student: Computational Systems Biology
>>>> | Memorial Sloan-Kettering Cancer Center
>>>> | Weill Medical College of Cornell University
>>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http://cbio.mskcc.org/%7Elianos/contact>
>>>>
>>>> ______________________________**_________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.**science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>>
>>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________**_________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>> Search the archives:
>>> http://news.gmane.org/gmane.**science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>
>>>
>>
>>
>>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>
[[alternative HTML version deleted]]
_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list