[BioC] Add extra columns to GRanges Metadata

Paul Leo p.leo at uq.edu.au
Tue Feb 21 07:35:32 CET 2012


Well in the end I did this the way I always do it: extracted the
elementMetaData modified it and reassigned it in it's entirety.

It works fine but it seems a little ham fisted hence my original post. 

I had to do this in the end ANYWAY because  c(,) , it appears,  requires
the elementMetadata of the GRanges in the same order. see example below.
Something I seemed to have avoided until now.

Given this I don't *think*  that the other solns will work anyway. But
there are indeed other issues that I did not appreciate that you point
out. Many thanks .

I guess I'm after this  functionality since for combining  mutation data
from various sources/programs , SNP , indels, large indels, CNV etc, the
quality,genotype, annotation data I put in the elementMetaData. To take
along for the analysis and filtering.

I could have used GrangeList... but  in the final disease modelling  is
not convenient to use a Granges object


So the ordering thing is a little bit of a pain. But the problem is
solvable with the current data structures (which are just fab- thanks).

Though it would be handy to add extra metadata in a computationally
rapid way and not to worry about their order.

Cheers
Paul






PS: Grange combning when metaData in different orders example:

gr1 <-GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges =
          IRanges(1:10, end = 7:16, names = head(letters, 10)),
          strand =
          Rle(strand(c("-", "+", "*", "+", "-")),
              c(1, 2, 2, 3, 2)),
          score = 1:10,
          GC = seq(1, 0, length=10))
gr2<-GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges =
          IRanges(1:10, end = 7:16, names = head(letters, 10)),
          strand =
          Rle(strand(c("-", "+", "*", "+", "-")),
              c(1, 2, 2, 3, 2)),
          GC = seq(1, 0, length=10),
          score = 1:10
          )

gr1
gr2

gr1 <-GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges =
          IRanges(1:10, end = 7:16, names = head(letters, 10)),
          strand =
          Rle(strand(c("-", "+", "*", "+", "-")),
              c(1, 2, 2, 3, 2)),
          score = 1:10,
          GC = seq(1, 0, length=10))
gr2<-GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges =
          IRanges(1:10, end = 7:16, names = head(letters, 10)),
          strand =
          Rle(strand(c("-", "+", "*", "+", "-")),
              c(1, 2, 2, 3, 2)),
          GC = seq(1, 0, length=10),
          score = 1:10
          )
gr1
gr2

gr.all<-c(gr1,gr2)
Error in .Method(..., deparse.level = deparse.level) : 
  column names for arg 2 do not match those of first arg

gr2<-GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges =
          IRanges(1:10, end = 7:16, names = head(letters, 10)),
          strand =
          Rle(strand(c("-", "+", "*", "+", "-")),
              c(1, 2, 2, 3, 2)),
          score = 1:10,GC = seq(1, 0, length=10)
          )
gr.all<-c(gr1,gr2)


> sessionInfo()
R Under development (unstable) (2011-11-23 r57740)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
base     

other attached packages:
 [1] multicore_0.1-7        biomaRt_2.11.1
Rsamtools_1.7.23      
 [4] Biostrings_2.23.6      GenomicFeatures_1.7.10
AnnotationDbi_1.17.14 
 [7] Biobase_2.15.3         GenomicRanges_1.7.15
IRanges_1.13.20       
[10] BiocGenerics_0.1.4    

loaded via a namespace (and not attached):
[1] bitops_1.0-4.1     BSgenome_1.23.2    DBI_0.2-5
RCurl_1.9-5       
[5] RSQLite_0.11.1     rtracklayer_1.15.6 tools_2.15.0
XML_3.9-2         
[9] zlibbioc_1.1.0    
> 

-----Original Message-----
From: Michael Lawrence <lawrence.michael at gene.com>
To: Martin Morgan <mtmorgan at fhcrc.org>
Cc: Michael Lawrence <lawrence.michael at gene.com>, bioconductor
<bioconductor at r-project.org>
Subject: Re: [BioC] Add extra columns to GRanges Metadata
Date: Mon, 20 Feb 2012 16:38:13 -0800

On Mon, Feb 20, 2012 at 4:33 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:

> On 02/20/2012 11:50 AM, Tengfei Yin wrote:
>
>> +1 for Tim's methods.....especially the $ and $<- method, it's always in a
>> wishlist for me :)
>>
>> well, maybe "$ " could be sometimes confusing when people trying to treat
>> GRanges as a general format similar to data.frame, and do something like
>> gr$seqnames<- blabla, the good part about current API is that it remind me
>> the difference bettween  elementMetadata and three required fields... but
>>
>
> GRanges extends Vector and
>
> > gr = GRanges("A", IRanges(1:10, 20), X=1:10, Y=1:10)
> > length(gr)
> [1] 10
> > length(values(gr))
> [1] 2
>
> shows the geometry -- one would expect gr$foo to access the range named
> 'foo' rather than the column of elementMetadata named 'foo'. I think this
> is the crux of the resistance.
>
>
Right, thanks for reiterating this. The question is whether the convenience
of the shortcut outweighs the conceptual inconsistency. Probably depends
heavily on one's perspective. I would favor the perspective of the frequent
user.


> When Tim says
>
>
> On 02/20/2012 11:45 AM, Tim Triche, Jr. wrote:
> > if these don't belong in BiocGenerics, I don't know what does :-D
>
> I'm not sure if the reference is to "intersect" (which is already in
> BiocGenerics), "%i%" etc., or to $ and $<-. For $ and $<-, they don't need
> to be in BiocGenerics because they are already made generics by the methods
> package -- BiocGenerics is meant to provide a place to define generics that
> would otherwise be defined independently in multiple packages.
>
> Martin
>
>
>  it will be convenient if general $ method support replacement? even for
>> seqnames, strand and ranges part? with default checking?
>>
>> other methods mentioned by Tim are also handy if default of those function
>> is under assumption.
>>
>> On Mon, Feb 20, 2012 at 1:04 PM, Michael Lawrence<lawrence.michael@**
>> gene.com <lawrence.michael at gene.com>
>>
>>> wrote:
>>>
>>
>>  On Mon, Feb 20, 2012 at 10:36 AM, Steve Lianoglou<
>>> mailinglist.honeypot at gmail.com**>  wrote:
>>>
>>>  Hi Tim,
>>>>
>>>> These functions are quite handy ... definitely going to poach them.
>>>>
>>>>
>>>
>>> I wonder if we should keep poaching these in the wild or if Bioc Core
>>> will
>>> ever bow to popular pressure and allow their inclusion in the base
>>> packages? ;-)
>>>
>>>
>>>  Also ... you need an editor with better code folding support ;-)
>>>>
>>>> Pual,
>>>>
>>>> Aslo, to do what you want to do, namely:
>>>>
>>>>
>>>>>
>>>>  extra.blank<-matrix(data=NA,**nrow=length(a.grs),ncol=**
>>> length(missing.meta.cols))
>>>
>>>> old.meta<-as.data.frame(**values(a.grs))
>>>>> new.meta<-cbind(old.meta,**extra.blank)
>>>>> values(a.grs)<-new.meta
>>>>>
>>>>
>>>> Using vanilla GenomicRanges functionality, you just do this:
>>>>
>>>> R>  values(a.grs)<- cbind(values(a.grs), DataFrame(extra.blank))
>>>>
>>>> HTH,
>>>>
>>>> -steve
>>>>
>>>> --
>>>> Steve Lianoglou
>>>> Graduate Student: Computational Systems Biology
>>>>  | Memorial Sloan-Kettering Cancer Center
>>>>  | Weill Medical College of Cornell University
>>>> Contact Info: http://cbio.mskcc.org/~lianos/**contact<http://cbio.mskcc.org/%7Elianos/contact>
>>>>
>>>> ______________________________**_________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.**science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>>
>>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________**_________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>> Search the archives:
>>> http://news.gmane.org/gmane.**science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>>
>>>
>>
>>
>>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list