Thanks Martin, this sound like a useful approach. I will try to implement
it.
Malcolm -  I am not sure that I understood you're comments.
I do not "want explicitly to exclude overlaps with my cgs". Rather, given
that my particular problem invloves intersecting single nuclotide ranges,
 I do not think that the problem of overlaps is likely to arise.  Am I
missing something here?

Thanks agian for all of your for being so helpful

Dolev


On Thu, Aug 22, 2013 at 7:57 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote:

> On 08/21/2013 01:46 PM, Cook, Malcolm wrote:
>
>> Well, there are many ways to skin this kind of cat (apologies to the
>> internet).
>>
>> Indeed, there are many cats that are slight variations on this cat.
>>
>> re: "since I'm looking for regulatory elements I am not interested in
>> this type of associations anyhow"
>>
>> I think that you should probably be handling this else-how.  If what you
>> mean is that there should be some minimal distance between your cgs and
>> your genes, then, you should probably explicitly make this a requirement,
>> and not add it as an afterthought when faced with the possibilities.
>>
>> That said, if you are wed to the exact to the exact results you
>> specified, and you want explicitly to exclude overlaps with your cgs, I
>> think you will find that you will want to compose something along these
>> lines:
>>
>>
>> 1)      use precede and follow each twice to generate your two candidate
>> neighbors
>>
>
> Not sure if this is what Malcolm meant, or if I'm being a little
> simplistic, but maybe with
>
>     subj = GRanges("A", IRanges(seq(10, 50, 10), width=1), "+")
>     query = GRanges("A", IRanges(seq(31, 51, 10), width=1), "+")
>
> The first range that query follows is
>
>     f1 = follow(query, subj)
>
> and the second  is
>
>     f2 = follow(subj[f1], subj)
>
> giving the indexes into subj as
>
>     > f1
>     [1] 3 4 5
>     > f2
>     [1] 2 3 4
>
> I guess there are problems when there is no following range (so need to
> check for NA's before doing the second follow?
>
> Another strategy might simply
>
>     sort(subj)
>
> and then f2 = f1 - 1 (except when seqnames(subj)[f1] !=
> seqnames(subj)[f2]). Probably strand will confusing things!
>
> Martin
>
>
>> 2)      use the range or these candidates and search using findOverlaps
>> within them to see if there are others that were skipped
>>
>> Cheers,
>>
>> Malcolm
>>
>> From: d r [mailto:dolevrahat@gmail.com]
>> Sent: Wednesday, August 21, 2013 2:57 PM
>> To: Cook, Malcolm
>> Cc: Steve Lianoglou; bioconductor@r-project.org list
>> Subject: Re: [BioC] Using precede()/follow() to find two ranges
>>
>> Hi Malcolm
>> Thanks for your thoughtfulness.
>> For the specific problem I'm working on I intend to consider only the
>> TSSs of the genes, which I will compare against illumina 450k probes, which
>> are also single nucleotide ranges.
>> Therefore, I do not think that overlapping ranges should be an issue
>> here. Of course, it is theoretically possible that a probe may overlap a
>> TSS, but since I'm looking for regulatory elements I am not interested in
>> this type of associations anyhow.
>> So, any insight you may be able to offer on this problem will be highly
>> appreciated.
>>
>> However, I think that for my particular problem overlapping ranges should
>> not be an issue.
>>
>> On Wed, Aug 21, 2013 at 8:59 PM, Cook, Malcolm <MEC@stowers.org<mailto:
>> MEC@**stowers.org <MEC@stowers.org>>> wrote:
>> Dolev,
>>
>> Before chiming in on the problem as it is currently framed, I want to
>> make sure of something.
>>
>> The documentation for precede and follow read 'Overlapping ranges are
>> excluded.'.
>>
>> For instance if one of your cg ranges were to overlap one of your gene
>> ranges, any method based on precede/follow will not discover this
>> association.
>>
>> Is this really desirable for your application?
>>
>> -Malcolm
>>
>>
>>   >-----Original Message-----
>>   >From: bioconductor-bounces@r-**project.org<bioconductor-bounces@r-project.org>
>> <mailto:bioconducto**r-bounces@r-project.org<bioconductor-bounces@r-project.org>>
>> [mailto:bioconductor-bounces@**r-project.org<bioconductor-bounces@r-project.org>
>> <mailto:bioconduc**tor-bounces@r-project.org<bioconductor-bounces@r-project.org>>]
>> On Behalf Of d r
>>   >Sent: Wednesday, August 21, 2013 12:15 PM
>>   >To: Steve Lianoglou
>>   >Cc: bioconductor@r-project.org<**mailto:bioconductor@r-project.**org<bioconductor@r-project.org>>
>> list
>>   >Subject: Re: [BioC] Using precede()/follow() to find two ranges
>>   >
>>   >Hi Steve and all
>>   >
>>   >Thanks for your suggestion. I was in fact thinking in this direction.
>>   >However, by completly discarding the first set of hits before querying
>> the
>>   >second time I run the risk of missing ranges that may be potential
>> second
>>   >hits beacuse they were discarded after being a first hit for other
>> ranges.
>>   >For examle, if I have these two GRanges:
>>   >
>>   >gr1:
>>   >
>>   >GRanges with 2 ranges and 1 metadata column:
>>   >
>>   >       seqnames                 ranges strand |      names
>>   >
>>   >          <Rle>              <IRanges>  <Rle> |   <factor>
>>   >
>>   >   [1]     chr6 [125284212, 125284212]      * | cg00991794
>>   >
>>   >   [2]     chr6 [150465049, 150465049]      * | cg02250071
>>   >
>>   >
>>   >
>>   >
>>   >
>>   >gr2:
>>   >
>>   >GRanges with 4 ranges and 1 metadata column:
>>   >
>>   >       seqnames                 ranges strand |      gene
>>   >
>>   >          <Rle>              <IRanges>  <Rle> |  <factor>
>>   >
>>   >   [1]     chr6 [126284212, 124284212]      + |      PGM3
>>   >
>>   >   [2]     chr6 [160920998, 150920998]      + |   PLEKHG1
>>   >
>>   >   [3]     chr6 [ 83903012,  83903012]      + |      PGM3
>>   >
>>   >   [4]     chr6 [190102159, 170102159]       + |    WDR27
>>   >
>>   >The first hits will be gr2[1] and gr2[2], which will both be discarded.
>>   >
>>   >Now if I call precede() again the hits I will get will be gr2[3] and
>>   >gr[4], instead of getting again gr2[1] for gr1[2] and gr2[2] for
>>   >gr1[1].
>>   >
>>   >
>>   >I was thinking that applying a function that will do what Steve
>>   >suggested might do the trick if it can run on one range of gr1 at a
>>   >time without modifying gr2
>>   >
>>   >something along the lines of:
>>   >
>>   >two_hits_apply<-function(gr1,**gr2)
>>   >{
>>   >
>>   >p1<-precede(gr1,gr2)
>>   >
>>   >gr2.less<-gr2[-p1]
>>   >
>>   >p2<-precede(gr1,gr2.less)
>>   >
>>   >hits<-c(p1,p2)
>>   >
>>   >hits
>>   >
>>   >}
>>   >
>>   >now all I need is a way to apply a functuon over two GRanegs object.
>>   >If I get it right, I will need somehting like mapply(), but that can
>>   >actuaaly work on GRanges.
>>   >
>>   >Is such a function exists, or alternativly, is there a way to do this
>>   >with the convential apply functions that I miss?
>>   >
>>   >Many thanks in advance
>>   >Dolev
>>   >
>>   >
>>   >
>>   >
>>   >
>>   >
>>   >
>>   >On Wed, Aug 21, 2013 at 7:10 PM, Steve Lianoglou
>>   ><lianoglou.steve@gene.com<**mailto:lianoglou.steve@gene.**com<lianoglou.steve@gene.com>
>> >>wrote:
>>   >
>>   >> Hi,
>>
>>   >>
>>   >> On Wed, Aug 21, 2013 at 7:25 AM, d r <dolevrahat@gmail.com<mailto:d*
>> *olevrahat@gmail.com <dolevrahat@gmail.com>>> wrote:
>>   >> > Hello
>>   >> >
>>   >> > I am looking for a way to find the two preceiding/folliwng ranges
>> in one
>>   >> > GRanges object to each range in a second GRanges object.
>>   >> >
>>   >> > In other words, I am looking for a variation on precede() or
>> follow()
>>   >> that
>>   >> > will return for each range in x two ranges in subject instead of
>> one: the
>>   >> > nearest preceidng(following) range and the second nearest preceding
>>   >> > (following) range.
>>   >> >
>>   >> > Is there a way to do this? (sorry for not giving any more
>> constructive
>>   >> > suggestions, I know relativily  little on how GRanges works any
>> therefore
>>   >> > am at a loss as to how to proceed)
>>   >>
>>   >> A first draft way of doing that would be to simply use the results
>>   >> from the first precede to remove those elements from the second call?
>>   >>
>>   >> For instance, if you have two ranges you are querying, gr1 and gr2:
>>   >>
>>   >> To get the immediately preceding ranges:
>>   >>
>>   >> R> p1 <- precede(gr1, gr2)
>>   >>
>>   >> Then to get the ones immediately after that:
>>   >>
>>   >> R> gr2.less <- gr2[-p1]
>>   >> R> p2 <- precede(gr1, gr2.less)
>>   >>
>>   >> Then you can see who is who with `gr2[p1]` and the who-who is who
>> with
>>   >> `gr2.less[p2]` ... that should get you pretty close -- will likely
>>   >> have to handle edge cases, for instance when there are no preceding
>>   >> ranges, I think you get an NA (if I recall) so think about what you
>>   >> want to do with those.
>>   >>
>>   >> -steve
>>   >>
>>   >> --
>>   >> Steve Lianoglou
>>   >> Computational Biologist
>>   >> Bioinformatics and Computational Biology
>>   >> Genentech
>>   >>
>>   >
>>   >      [[alternative HTML version deleted]]
>>   >
>>   >_____________________________**__________________
>>   >Bioconductor mailing list
>>   >Bioconductor@r-project.org<**mailto:Bioconductor@r-project.**org<Bioconductor@r-project.org>
>> >
>>
>>   >https://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
>>   >Search the archives: http://news.gmane.org/gmane.**
>> science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________**_________________
>> Bioconductor mailing list
>> Bioconductor@r-project.org
>> https://stat.ethz.ch/mailman/**listinfo/bioconductor<https://stat.ethz.ch/mailman/listinfo/bioconductor>
>> Search the archives: http://news.gmane.org/gmane.**
>> science.biology.informatics.**conductor<http://news.gmane.org/gmane.science.biology.informatics.conductor>
>>
>>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>

	[[alternative HTML version deleted]]