[BioC] subset GRanges object via ElementMetadata

Hervé Pagès hpages at fhcrc.org
Sat Feb 23 02:33:00 CET 2013


Hi Michael,

On 02/22/2013 12:56 PM, Michael Lawrence wrote:
> Btw, I hacked together a subset() method for GenomicRanges yesterday. It
> respects the metadata columns. Someone could probably come up with some
> reason why that violates the conceptual foundations of something, but I
> find it useful.
>
> So you could do:
> subset(gr, over == 2)

Sounds good to me. Hopefully you set the method on Vector objects,
rather than just GenomicRanges objects.
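
For the archive, here is a minimal sketch of the forms discussed in this
thread, assuming the GenomicRanges package is installed. The object is
rebuilt from the values in Hermann's dput() output below, with the
seqlevels reduced to chr1 for brevity (an assumption made only to keep
the example short). The subset() call assumes a GenomicRanges version
that includes the method Michael describes.

```r
library(GenomicRanges)

# Rebuild the example object from the thread (only chr1 kept as a seqlevel).
test.gr <- GRanges(
    seqnames = "chr1",
    ranges = IRanges(
        start = c(713844L, 762136L, 780124L, 780533L, 781104L, 793830L),
        width = c(644L, 1064L, 166L, 145L, 284L, 567L)
    ),
    strand = "*",
    edensity = c(1000L, 1000L, 519L, 516L, 601L, 610L),
    epeak = c(256L, 771L, 74L, 68L, 140L, 290L),
    over = c(1L, 2L, 0L, 0L, 0L, 0L)
)

# Logical indexing on a metadata column via the $ shorthand.
by_index <- test.gr[test.gr$over == 2]

# The same selection, with the column accessed through mcols().
by_mcols <- test.gr[mcols(test.gr)$over == 2]

# subset() evaluates the condition against the metadata columns.
by_subset <- subset(test.gr, over == 2)
```

All three calls should select the single range starting at 762136.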

Thanks,
H.

>
> Will commit shortly.
>
> Michael
>
>
>
>
>
> On Fri, Feb 22, 2013 at 10:10 AM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>
>> The shorthand method would be
>>
>> GR[ GR$over == 2 ]
>>
>> and in your example,
>>
>> R> test.gr
>> GRanges with 6 ranges and 3 metadata columns:
>>        seqnames           ranges strand |  edensity     epeak      over
>>           <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>>    [1]     chr1 [713844, 714487]      * |      1000       256         1
>>    [2]     chr1 [762136, 763199]      * |      1000       771         2
>>    [3]     chr1 [780124, 780289]      * |       519        74         0
>>    [4]     chr1 [780533, 780677]      * |       516        68         0
>>    [5]     chr1 [781104, 781387]      * |       601       140         0
>>    [6]     chr1 [793830, 794396]      * |       610       290         0
>>    ---
>>    seqlengths:
>>      chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
>>        NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
>> R> test.gr[ test.gr$over == 2 ]
>> GRanges with 1 range and 3 metadata columns:
>>        seqnames           ranges strand |  edensity     epeak      over
>>           <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>>    [1]     chr1 [762136, 763199]      * |      1000       771         2
>>    ---
>>    seqlengths:
>>      chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
>>        NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
>>
>>
>>
>>
>> On Fri, Feb 22, 2013 at 7:33 AM, Hermann Norpois <hnorpois at gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I am looking for a method to subset a GRanges object by the values of a
>>> metadata (ElementMetadata) column, for instance over == 2.
>>>
>>> How does it work?
>>>
>>> Thanks
>>> Hermann
>>>
>>>
>>> > test.gr
>>> GRanges with 6 ranges and 3 metadata columns:
>>>        seqnames           ranges strand |  edensity     epeak      over
>>>           <Rle>        <IRanges>  <Rle> | <integer> <integer> <integer>
>>>    [1]     chr1 [713844, 714487]      * |      1000       256         1
>>>    [2]     chr1 [762136, 763199]      * |      1000       771         2
>>>    [3]     chr1 [780124, 780289]      * |       519        74         0
>>>    [4]     chr1 [780533, 780677]      * |       516        68         0
>>>    [5]     chr1 [781104, 781387]      * |       601       140         0
>>>    [6]     chr1 [793830, 794396]      * |       610       290         0
>>>    ---
>>>    seqlengths:
>>>      chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX  chrY
>>>        NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
>>> > dput(test.gr)
>>> new("GRanges"
>>>      , seqnames = new("Rle"
>>>      , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12",
>>> "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2",
>>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7",
>>> "chr8", "chr9", "chrX", "chrY"), class = "factor")
>>>      , lengths = 6L
>>>      , elementMetadata = NULL
>>>      , metadata = list()
>>> )
>>>      , ranges = new("IRanges"
>>>      , start = c(713844L, 762136L, 780124L, 780533L, 781104L, 793830L)
>>>      , width = c(644L, 1064L, 166L, 145L, 284L, 567L)
>>>      , NAMES = NULL
>>>      , elementType = "integer"
>>>      , elementMetadata = NULL
>>>      , metadata = list()
>>> )
>>>      , strand = new("Rle"
>>>      , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
>>>      , lengths = 6L
>>>      , elementMetadata = NULL
>>>      , metadata = list()
>>> )
>>>      , elementMetadata = new("DataFrame"
>>>      , rownames = NULL
>>>      , nrows = 6L
>>>      , listData = structure(list(edensity = c(1000L, 1000L, 519L, 516L,
>>> 601L, 610L
>>> ), epeak = c(256L, 771L, 74L, 68L, 140L, 290L), over = c(1L,
>>> 2L, 0L, 0L, 0L, 0L)), .Names = c("edensity", "epeak", "over"))
>>>      , elementType = "ANY"
>>>      , elementMetadata = NULL
>>>      , metadata = list()
>>> )
>>>      , seqinfo = new("Seqinfo"
>>>      , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14",
>>> "chr15",
>>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
>>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
>>> "chrX", "chrY")
>>>      , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>>> NA_integer_,
>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>>> NA_integer_, NA_integer_, NA_integer_, NA_integer_)
>>>      , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>> NA, NA,
>>> NA, NA, NA, NA, NA, NA, NA, NA, NA)
>>>      , genome = c(NA_character_, NA_character_, NA_character_, NA_character_,
>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
>>> )
>>> )
>>>      , metadata = list()
>>> )
>>>
>>>          [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>>
>> --
>> *A model is a lie that helps you see the truth.*
>>
>> Howard Skipper
>> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


