[BioC] distances for IRanges

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Wed Jun 9 02:36:43 CEST 2010


Hi Michael,

Thanks for pointing out nearest() and friends; I agree that
nearest() should address my question.

Having read the man page for nearest(), might I suggest an
additional argument like
  multihits = c("arbitrary", "all")
so that a user can get full information when one range overlaps
(or ties in distance with) multiple other ranges.  The return value
could be a sparse matrix, findOverlaps-like.  I find it important to
know about multiple hits, especially when a range has multiple
overlaps.
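
To make the request concrete, here is a rough base-R sketch of the "all"
behaviour on plain start/end vectors. It does not use the IRanges API:
the toy data and the helper name nearest_all() are made up for
illustration, and the dense distance matrix is only practical for small
inputs.

```r
# Toy data as start/end vectors (assumed for illustration only).
# Query 1 overlaps subjects 1 and 2; query 2 is equally far from 2 and 3.
qs <- c(5L, 20L);      qe <- c(12L, 21L)
ss <- c(1L, 10L, 26L); se <- c(6L, 15L, 27L)

# Gap width between every query/subject pair; 0 means overlapping or adjacent.
d <- pmax(outer(qe, ss, function(e, s) s - e - 1L),  # subject to the right
          outer(qs, se, function(s, e) s - e - 1L),  # subject to the left
          0L)

# nearest_all() is a hypothetical helper, not part of IRanges.
# It returns one (query, subject) row per subject at the minimal distance.
nearest_all <- function(d) {
  is_min <- d == apply(d, 1, min)          # row minima recycle down columns
  hits <- which(is_min, arr.ind = TRUE)
  hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]
  colnames(hits) <- c("query", "subject")
  hits
}

hits <- nearest_all(d)
```

Each query appears once per tied subject, so both the double overlap and
the distance tie are reported rather than an arbitrary single hit.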

Thanks,
Kasper

On Tue, Jun 8, 2010 at 1:25 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> For all pairwise distances, something simple based on outer() should
> suffice. It might not be very space efficient, but speed should be somewhat
> close to optimal.
>
> What is the end goal of this? For example, the nearest() function finds
> nearest neighbors efficiently.
>
> You might be able to leverage findOverlaps(). For example, one can set the
> maximum gap between ranges to be considered overlapping. That could be set
> to a non-zero value representing some maximum allowable distance. The sparse
> doublet matrix from as.matrix() would be pretty efficient for distance
> calculation, via the pgap() function.
>
> Michael
>
> On Tue, Jun 8, 2010 at 8:51 AM, Kasper Daniel Hansen
> <kasperdanielhansen at gmail.com> wrote:
>>
>> Assuming I have two IRanges, each with multiple ranges, like
>>  ir1 = IRanges(start = 3:6, width = 2)
>>  ir2 = IRanges(start = 10:17, width = 2)
>>
>> Is there a fast way to compute a pairwise distance matrix between the
>> two sets, by which I mean
>>  ii = 1
>>  jj = 2
>>  width(gaps(c(ir1[ii], ir2[jj])))
>> where ii, jj would index into a result matrix.  Essentially this would
>> be an expanded version of findOverlaps, since any two ranges with
>> distance = 0, have an overlap.
>>
>> Is such functionality available in IRanges, in an efficient
>> implementation (think of the case where the two IRanges have - say -
>> 10,000 ranges or more)?
>>
>> Kasper
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
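
For reference, the outer() suggestion quoted above can be sketched in
base R roughly as follows. Plain integer start/end vectors stand in for
the two IRanges objects from the original question, and pairwise_gap()
is a hypothetical helper name, not an IRanges function. The distance is
taken to be the width of the gap between two ranges, with 0 for ranges
that overlap or touch.

```r
# Stand-ins for ir1 and ir2 from the thread (start plus width 2).
s1 <- 3:6;   e1 <- s1 + 2L - 1L   # ir1: start = 3:6,   width = 2
s2 <- 10:17; e2 <- s2 + 2L - 1L   # ir2: start = 10:17, width = 2

# pairwise_gap() is a hypothetical helper, not part of IRanges.
# Entry [i, j] is the gap width between range i of set 1 and range j of set 2.
pairwise_gap <- function(s1, e1, s2, e2) {
  right <- outer(e1, s2, function(e, s) s - e - 1L)  # set 2 range lies to the right
  left  <- outer(s1, e2, function(s, e) s - e - 1L)  # set 2 range lies to the left
  pmax(right, left, 0L)                              # negative values mean overlap
}

d <- pairwise_gap(s1, e1, s2, e2)
d[1, 2]   # 6, matching width(gaps(c(ir1[1], ir2[2]))) from the question
```

This builds a dense length(s1) by length(s2) matrix, so two sets of
10,000 ranges give 1e8 entries; that is the space cost mentioned in the
quoted reply.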


