[BioC] overlap regions between two GRanges (or GRangesList)

Hervé Pagès hpages at fhcrc.org
Fri Mar 14 23:19:21 CET 2014


Hi Zhao,

I'm going to try to rephrase what you want to do: seems like you want
to keep the ranges in 'g' that have an overlap with any of the ranges
in 'g2':

   > g
   GRanges with 4 ranges and 0 metadata columns:
         seqnames    ranges strand
            <Rle> <IRanges>  <Rle>
     [1]     chr1  [ 1, 12]      -
     [2]     chr2  [ 2, 15]      +
     [3]     chr2  [ 3, 20]      +
     [4]     chr3  [10, 14]      -
     ---
     seqlengths:
      chr1 chr2 chr3
        NA   NA   NA

   > g2
   GRanges with 2 ranges and 0 metadata columns:
         seqnames    ranges strand
            <Rle> <IRanges>  <Rle>
     [1]     chr1   [1,  8]      -
     [2]     chr2   [2, 30]      +
     ---
     seqlengths:
      chr1 chr2
        NA   NA

   > subsetByOverlaps(g, g2)
   GRanges with 3 ranges and 0 metadata columns:
         seqnames    ranges strand
            <Rle> <IRanges>  <Rle>
     [1]     chr1   [1, 12]      -
     [2]     chr2   [2, 15]      +
     [3]     chr2   [3, 20]      +
     ---
     seqlengths:
      chr1 chr2 chr3
        NA   NA   NA

However, in the desired output you show below, you omitted the
1st range (the range on chr1). So it's not clear to me what you
are really after. Did you omit it because it's not *within* the
overlapping range in 'g2'? If that's the case, then:

   > subsetByOverlaps(g, g2, type="within")
   GRanges with 2 ranges and 0 metadata columns:
         seqnames    ranges strand
            <Rle> <IRanges>  <Rle>
     [1]     chr2   [2, 15]      +
     [2]     chr2   [3, 20]      +
     ---
     seqlengths:
      chr1 chr2 chr3
        NA   NA   NA

HTH,
H.


On 03/13/2014 10:12 PM, Zhao, Shanrong [JRDUS] wrote:
> Seems intersect can do what I want, but not quietly exact.
>
> I would like to output two records (for b,c) below instead of unite them
> into  a single record [2, 20] when call *intersect(g,g2)**.*
>
>                  [2, 15]
>
>                  [3, 20]
>
> Thanks,
>
> Shanrong
>
> P.S
>
>> g
>
> GRanges with 4 ranges and 2 metadata columns:
>
>      seqnames    ranges strand |     score                GC
>
>         <Rle> <IRanges>  <Rle> | <integer>         <numeric>
>
>    a     chr1  [ 1, 12]      - |         1                 1
>
> b     chr2  [ 2, 15]      + |         2 0.888888888888889
>
>    c     chr2  [ 3, 20]      + |         3 0.777777777777778
>
>    j     chr3  [10, 14]      - |        10                 0
>
>    ---
>
>    seqlengths:
>
>     chr1 chr2 chr3
>
>       NA   NA   NA
>
>> g2
>
> GRanges with 2 ranges and 2 metadata columns:
>
>      seqnames    ranges strand |     score                GC
>
>         <Rle> <IRanges>  <Rle> | <integer>         <numeric>
>
>    a     chr1   [1,  8]      - |         1                 1
>
> *  b     chr2   [2, 30]      + |         2 0.888888888888889*
>
>    ---
>
>    seqlengths:
>
>     chr1 chr2 chr3
>
>       NA   NA   NA
>
>> intersect(g,g2)
>
> GRanges with 2 ranges and 0 metadata columns:
>
>        seqnames    ranges strand
>
>           <Rle> <IRanges>  <Rle>
>
>    [1]     chr1   [1,  8]      -
>
> [2]     chr2   [*2, 20*]      +
>
>    ---
>
>    seqlengths:
>
>     chr1 chr2 chr3
>
>       NA   NA   NA
>
> *From:*Zhao, Shanrong [JRDUS]
> *Sent:* Thursday, March 13, 2014 9:40 PM
> *To:* 'hpages at fhcrc.org'
> *Subject:* overlap regions between two GRanges (or GRangesList)
>
> Dear Dr. Pages,
>
> I am exploring Bioconductor packages to analyze DNA methylation data.
> One question I don’t know how to solve it. I have two GRanges (OR
> GRangesList),  Now I want to identify the common (*overlapping)
> regions*, not just overlapping or not--- *gc <- overlapRegions(ga,gb)*
>
> The other question I have: I am interested in all *cytosines* in
> promoters regions, I have already had promoter in GRange object. What is
> the most efficient way to identify the total number of Cs?   I plan to
> extract all DNA sequences corresponding to promoters, and then call
> letterFrequency (by set letters=”C”).
>
> Best regards,
>
> Shanrong
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list