[BioC] distances for IRanges

Martin Morgan mtmorgan at fhcrc.org
Wed Jun 9 03:53:26 CEST 2010


On 06/09/2010 02:36 AM, Kasper Daniel Hansen wrote:
> Hi Michael,
> 
> Thanks for pointing out nearest and friends; I agree that this
> function should address my question.
> 
> Reading the man page for the nearest() function, might I suggest an
> additional argument like
>   multihits = c("arbitrary", "all")

Yes, thanks for the timely nearest() hint. For my own use I was hoping
that ties could be decided by maximum overlap. Ties might still occur
even then, but if 'all' were returned I could probably break them myself
without too much difficulty.
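
A rough sketch of what breaking ties by maximum overlap could look like,
assuming the tied hits come back as a vector of subject indices. This is
base R on plain start/end vectors, not IRanges code, and the helper name
break_ties_by_overlap is made up for illustration:

```r
# Hypothetical helper, not part of IRanges: among candidate subject ranges
# (e.g. the tied hits an 'all' option might return), keep the one(s) with
# the largest overlap with the query. Ranges are closed integer intervals.
break_ties_by_overlap <- function(q_start, q_end, s_start, s_end, candidates) {
    ov <- pmin(q_end, s_end[candidates]) - pmax(q_start, s_start[candidates]) + 1L
    ov <- pmax(ov, 0L)                 # non-overlapping candidates get width 0
    candidates[ov == max(ov)]          # may still return more than one index
}

# One query [5, 10] against three subjects; all three overlap the query.
s_start <- c(1L, 8L, 4L)
s_end   <- c(6L, 12L, 7L)
best <- break_ties_by_overlap(5L, 10L, s_start, s_end, candidates = 1:3)
```

Here subjects 2 and 3 both overlap the query by 3 positions, so both are
returned: maximum overlap narrows the ties but does not always resolve
them.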

Martin

> with the intention that a user can get full information in case one
> range overlaps (or ties in distance) with multiple other ranges.  The
> return value could be a sparse matrix, findOverlaps-like.  I find it
> important to know about multiple hits, especially in the case when a
> range has multiple overlaps.
> 
> Thanks,
> Kasper
> 
> On Tue, Jun 8, 2010 at 1:25 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> For all pairwise distances, something simple based on outer() should
>> suffice. It might not be very space efficient, but speed should be somewhat
>> close to optimal.
>>
>> What is the end goal of this? For example, the nearest() function finds
>> nearest neighbors efficiently.
>>
>> You might be able to leverage findOverlaps(). For example, one can set the
>> maximum gap between ranges to be considered overlapping. That could be set
>> to a non-zero value representing some maximum allowable distance. The sparse
>> doublet matrix from as.matrix() would be pretty efficient for distance
>> calculation, via the pgap() function.
>>
>> Michael
>>
>> On Tue, Jun 8, 2010 at 8:51 AM, Kasper Daniel Hansen
>> <kasperdanielhansen at gmail.com> wrote:
>>>
>>> Assuming I have two IRanges, each with multiple ranges, like
>>>  ir1 = IRanges(start = 3:6, width = 2)
>>>  ir2 = IRanges(start = 10:17, width = 2)
>>>
>>> Is there a fast way to compute a pairwise distance matrix between the
>>> two sets, by which I mean
>>>  ii = 1
>>>  jj = 2
>>>  width(gaps(c(ir1[ii], ir2[jj])))
>>> where ii, jj would index into a result matrix.  Essentially this would
>>> be an expanded version of findOverlaps, since any two ranges with
>>> distance = 0, have an overlap.
>>>
>>> Is such functionality available in IRanges, in an efficient
>>> implementation (think of the case where the two IRanges have - say -
>>> 10,000 ranges or more)?
>>>
>>> Kasper
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
> 

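For the pairwise distance matrix in the original question, the
outer()-based idea Michael mentions might look roughly like this. It is
only a sketch: it uses plain integer vectors in place of start() and
end() of the two IRanges objects, and gap_width is an illustrative name,
not an IRanges function.

```r
# Plain integer vectors standing in for start(ir) and end(ir) of the two
# IRanges objects from the original question (width 2 each).
s1 <- 3:6;   e1 <- s1 + 1L
s2 <- 10:17; e2 <- s2 + 1L

# Gap width between closed integer intervals: the number of positions
# strictly between them, clamped at 0 for overlapping or adjacent ranges.
gap_width <- function(sa, ea, sb, eb) pmax(pmax(sa, sb) - pmin(ea, eb) - 1L, 0L)

# outer() expands over all index pairs (ii, jj); the result is a dense
# length(s1) x length(s2) matrix, so memory grows as n * m.
d <- outer(seq_along(s1), seq_along(s2),
           function(ii, jj) gap_width(s1[ii], e1[ii], s2[jj], e2[jj]))
```

With ii = 1 and jj = 2 this gives d[1, 2] = 6, the width of the gap
between [3, 4] and [11, 12]. For two sets of 10,000 ranges the dense
matrix would hold 10^8 integers, which is why the space cost matters.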

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


