[BioC] overlap regions between two GRanges (or GRangesList)

Cook, Malcolm MEC at stowers.org
Sun Mar 16 18:39:10 CET 2014


Zhao,

sounds like pintersect is your friend.

two steps:

1) use overlaps to find indices into g and g2

2) use pintersect to find the overlapping segments between g and g2

~ malcolm_cook at stowers.org

________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Zhao, Shanrong [JRDUS] [SZhao7 at its.jnj.com]
Sent: Friday, March 14, 2014 5:31 PM
To: Hervé Pagès
Cc: bioconductor at r-project.org
Subject: Re: [BioC] overlap regions between two GRanges (or GRangesList)

Thank you. subsetByOverlaps is not what I want.  I just want to keep the ranges in 'g' that overlaps with g2, but only keep "overlapping" regions (instead of the original ranges in 'g1').



Thanks again,

Shanrong



-----Original Message-----
From: Hervé Pagès [mailto:hpages at fhcrc.org]
Sent: Friday, March 14, 2014 3:19 PM
To: Zhao, Shanrong [JRDUS]
Cc: bioconductor at r-project.org
Subject: Re: overlap regions between two GRanges (or GRangesList)



Hi Zhao,



I'm going to try to rephrase what you want to do: seems like you want to keep the ranges in 'g' that have an overlap with any of the ranges in 'g2':



   > g

   GRanges with 4 ranges and 0 metadata columns:

         seqnames    ranges strand

            <Rle> <IRanges>  <Rle>

     [1]     chr1  [ 1, 12]      -

     [2]     chr2  [ 2, 15]      +

     [3]     chr2  [ 3, 20]      +

     [4]     chr3  [10, 14]      -

     ---

     seqlengths:

      chr1 chr2 chr3

        NA   NA   NA



   > g2

   GRanges with 2 ranges and 0 metadata columns:

         seqnames    ranges strand

            <Rle> <IRanges>  <Rle>

     [1]     chr1   [1,  8]      -

     [2]     chr2   [2, 30]      +

     ---

     seqlengths:

      chr1 chr2

        NA   NA



   > subsetByOverlaps(g, g2)

   GRanges with 3 ranges and 0 metadata columns:

         seqnames    ranges strand

            <Rle> <IRanges>  <Rle>

     [1]     chr1   [1, 12]      -

     [2]     chr2   [2, 15]      +

     [3]     chr2   [3, 20]      +

     ---

     seqlengths:

      chr1 chr2 chr3

        NA   NA   NA



However, in the desired output you show below, you omitted the 1st range (the range on chr1). So it's not clear to me what you are really after. Did you omit it because it's not *within* the overlapping range in 'g2'? If that's the case, then:



   > subsetByOverlaps(g, g2, type="within")

   GRanges with 2 ranges and 0 metadata columns:

         seqnames    ranges strand

            <Rle> <IRanges>  <Rle>

     [1]     chr2   [2, 15]      +

     [2]     chr2   [3, 20]      +

     ---

     seqlengths:

      chr1 chr2 chr3

        NA   NA   NA



HTH,

H.





On 03/13/2014 10:12 PM, Zhao, Shanrong [JRDUS] wrote:

> Seems intersect can do what I want, but not quietly exact.

>

> I would like to output two records (for b,c) below instead of unite

> them into  a single record [2, 20] when call *intersect(g,g2)**.*

>

>                  [2, 15]

>

>                  [3, 20]

>

> Thanks,

>

> Shanrong

>

> P.S

>

>> g

>

> GRanges with 4 ranges and 2 metadata columns:

>

>      seqnames    ranges strand |     score                GC

>

>         <Rle> <IRanges>  <Rle> | <integer>         <numeric>

>

>    a     chr1  [ 1, 12]      - |         1                 1

>

> b     chr2  [ 2, 15]      + |         2 0.888888888888889

>

>    c     chr2  [ 3, 20]      + |         3 0.777777777777778

>

>    j     chr3  [10, 14]      - |        10                 0

>

>    ---

>

>    seqlengths:

>

>     chr1 chr2 chr3

>

>       NA   NA   NA

>

>> g2

>

> GRanges with 2 ranges and 2 metadata columns:

>

>      seqnames    ranges strand |     score                GC

>

>         <Rle> <IRanges>  <Rle> | <integer>         <numeric>

>

>    a     chr1   [1,  8]      - |         1                 1

>

> *  b     chr2   [2, 30]      + |         2 0.888888888888889*

>

>    ---

>

>    seqlengths:

>

>     chr1 chr2 chr3

>

>       NA   NA   NA

>

>> intersect(g,g2)

>

> GRanges with 2 ranges and 0 metadata columns:

>

>        seqnames    ranges strand

>

>           <Rle> <IRanges>  <Rle>

>

>    [1]     chr1   [1,  8]      -

>

> [2]     chr2   [*2, 20*]      +

>

>    ---

>

>    seqlengths:

>

>     chr1 chr2 chr3

>

>       NA   NA   NA

>

> *From:*Zhao, Shanrong [JRDUS]

> *Sent:* Thursday, March 13, 2014 9:40 PM

> *To:* 'hpages at fhcrc.org'

> *Subject:* overlap regions between two GRanges (or GRangesList)

>

> Dear Dr. Pages,

>

> I am exploring Bioconductor packages to analyze DNA methylation data.

> One question I don't know how to solve it. I have two GRanges (OR

> GRangesList),  Now I want to identify the common (*overlapping)

> regions*, not just overlapping or not--- *gc <- overlapRegions(ga,gb)*

>

> The other question I have: I am interested in all *cytosines* in

> promoters regions, I have already had promoter in GRange object. What is

> the most efficient way to identify the total number of Cs?   I plan to

> extract all DNA sequences corresponding to promoters, and then call

> letterFrequency (by set letters="C").

>

> Best regards,

>

> Shanrong

>



--

Hervé Pagès



Program in Computational Biology

Division of Public Health Sciences

Fred Hutchinson Cancer Research Center

1100 Fairview Ave. N, M1-B514

P.O. Box 19024

Seattle, WA 98109-1024



E-mail: hpages at fhcrc.org<mailto:hpages at fhcrc.org>

Phone:  (206) 667-5791

Fax:    (206) 667-1319



        [[alternative HTML version deleted]]




More information about the Bioconductor mailing list