[Bioc-devel] distanceToNearest Self Query

Valerie Obenchain vobencha at fhcrc.org
Wed Sep 5 18:14:57 CEST 2012


Hi Dario,

Thanks for catching this. Now fixed in GenomicRanges 1.9.61.

It was already the case for nearest,GenomicRanges,GenomicRanges that 
self-hits were found when both 'x' and 'subject' were present but not 
when only 'x' was provided. The problem in distanceToNearest was that 
when 'subject' was missing it was set equal to 'x' before calling 
nearest. This resulted in self-hits being found regardless of how the 
method was called.  distanctToNearst now behaves appropriately,.


gr <- GRanges("1", IRanges(c(1, 10, 1000), width=2), "+")
 > gr
GRanges with 3 ranges and 0 metadata columns:
       seqnames       ranges strand
<Rle> <IRanges> <Rle>
   [1]        1 [   1,    2]      +
   [2]        1 [  10,   11]      +
   [3]        1 [1000, 1001]      +
   ---
   seqlengths:
     1
    NA

When only 'x' is provided no self-hits are found,

 > distanceToNearest(gr)
DataFrame with 3 rows and 3 columns
   queryHits subjectHits  distance
<integer> <integer> <integer>
1         1           2         8
2         2           1         8
3         3           2       989

When both 'x' and 'subject' are provided self-hits are found,

 > distanceToNearest(gr, gr)
DataFrame with 3 rows and 3 columns
   queryHits subjectHits  distance
<integer> <integer> <integer>
1         1           1         0
2         2           2         0
3         3           3         0


Valerie

On 09/04/2012 11:00 PM, Dario Strbenac wrote:
> Hello,
>
> In findOverlaps, there is an argument ignoreSelf, that means overlaps aren't found for the same range when only the query object is provided. I was hoping I could use something similar for the distanceToNearest function in GenomicRanges, with only the query provided. Would this be a worthwhile addition ?
>
> Example :
>
>> g1
> GRanges with 3 ranges and 0 metadata columns:
>        seqnames       ranges strand
>           <Rle>     <IRanges>   <Rle>
>    [1]        1 [   1,    2]      +
>    [2]        1 [  10,   11]      +
>    [3]        1 [1000, 1010]      +
>    ---
>    seqlengths:
>      1
>     NA
>> distanceToNearest(g1, ignoreSelf = TRUE) # Not implemented
> DataFrame with 3 rows and 3 columns
>    queryHits subjectHits  distance
>    <integer>    <integer>  <integer>
> 1         1           1         0
> 2         2           2         0
> 3         3           3         0
>> findOverlaps(g1, ignoreSelf = TRUE) # Implemented
> Hits of length 0
> queryLength: 3
> subjectLength: 3
>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



More information about the Bioc-devel mailing list