[BioC] distances for IRanges
mtmorgan at fhcrc.org
Wed Jun 9 03:53:26 CEST 2010
On 06/09/2010 02:36 AM, Kasper Daniel Hansen wrote:
> Hi Michael
> Thanks for pointing out nearest and friends; I agree that this
> function should address my question.
> Reading the man page for the nearest() function, might I suggest an
> additional argument like
> multihits = c("arbitrary", "all")
Yes, thanks for the timely hint about nearest(). In my case I was hoping
that ties could be decided by maximum overlap. Ties might still occur
even then, but if 'all' were returned I could probably break them myself
without too much difficulty.
> with the intention that a user can get full information in case one
> range overlaps (or ties in distance) with multiple other ranges. The
> return value could be a sparse matrix, findOverlaps-like. I find it
> important to know about multiple hits, especially in the case when a
> range has multiple overlaps.
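To make the two proposed modes concrete, here is a minimal base R sketch. The multihits argument is only the suggestion quoted above, and nearest_hits is a hypothetical helper, not part of IRanges. It assumes a query-by-subject distance matrix is already available and returns query/subject index pairs, loosely in the spirit of a findOverlaps-like result.

```r
# Hypothetical helper (not an IRanges function) illustrating the proposed
# multihits = "arbitrary" vs "all" behaviour.
# d is a distance matrix with one row per query and one column per subject.
nearest_hits <- function(d, multihits = c("arbitrary", "all")) {
    multihits <- match.arg(multihits)
    # smallest distance seen by each query (row)
    row_min <- apply(d, 1, min)
    # every (query, subject) pair that attains its row minimum, ties included
    hits <- which(d == row_min, arr.ind = TRUE)
    colnames(hits) <- c("query", "subject")
    hits <- hits[order(hits[, "query"], hits[, "subject"]), , drop = FALSE]
    if (multihits == "arbitrary") {
        # keep a single hit per query (here simply the lowest subject index)
        hits <- hits[!duplicated(hits[, "query"]), , drop = FALSE]
    }
    rownames(hits) <- NULL
    hits
}

# Query 1 is at distance 0 from subjects 2 and 3 (a tie);
# query 2 is closest to subject 2 only.
d <- rbind(c(2, 0, 0),
           c(5, 1, 3))
all_hits <- nearest_hits(d, "all")        # three rows: the tie is reported
one_hit  <- nearest_hits(d, "arbitrary")  # two rows: one hit per query
```

With "all", the caller sees both tied subjects for query 1 and can apply a tie-breaking rule of their own, such as maximum overlap.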
> On Tue, Jun 8, 2010 at 1:25 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> For all pairwise distances, something simple based on outer() should
>> suffice. It might not be very space efficient, but speed should be somewhat
>> close to optimal.
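A minimal sketch of what an outer()-based approach could look like, written in base R on plain start/end vectors so that it runs without IRanges loaded. The name pairwise_gap is hypothetical; for real IRanges objects the vectors would come from start() and end(). It assumes that the distance of interest is the number of positions strictly between two ranges, and that overlapping or adjacent ranges get distance 0.

```r
# Hypothetical helper (not part of IRanges): pairwise gap widths between two
# sets of ranges, each given as integer start and end vectors.
pairwise_gap <- function(start1, end1, start2, end2) {
    # positive when range j of set 2 lies to the right of range i of set 1
    d12 <- outer(end1, start2, function(e, s) s - e - 1L)
    # positive when range j of set 2 lies to the left of range i of set 1
    d21 <- outer(start1, end2, function(s, e) s - e - 1L)
    # at most one of the two is positive; overlap or adjacency gives 0
    pmax(d12, d21, 0L)
}

# Same coordinates as ir1 = IRanges(start = 3:6, width = 2) and
# ir2 = IRanges(start = 10:17, width = 2) from the original question.
s1 <- 3:6
e1 <- s1 + 1L
s2 <- 10:17
e2 <- s2 + 1L
d <- pairwise_gap(s1, e1, s2, e2)
d[1, 2]  # 6, the width of the gap between [3, 4] and [11, 12]
```

The result is a dense length(start1) by length(start2) matrix, so for two sets of 10,000 ranges each integer matrix needs roughly 400 MB. That is the space cost mentioned above.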
>> What is the end goal of this? For example, the nearest() function finds
>> nearest neighbors efficiently.
>> You might be able to leverage findOverlaps(). For example, one can set the
>> maximum gap between ranges to be considered overlapping. That could be set
>> to a non-zero value representing some maximum allowable distance. The sparse
>> doublet matrix from as.matrix() would be pretty efficient for distance
>> calculation, via the pgap() function.
>> On Tue, Jun 8, 2010 at 8:51 AM, Kasper Daniel Hansen
>> <kasperdanielhansen at gmail.com> wrote:
>>> Assuming I have two IRanges, each with multiple ranges, like
>>> ir1 = IRanges(start = 3:6, width = 2)
>>> ir2 = IRanges(start = 10:17, width = 2)
>>> Is there a fast way to compute a pairwise distance matrix between the
>>> two sets, by which I mean
>>> ii = 1
>>> jj = 2
>>> width(gaps(c(ir1[ii], ir2[jj])))
>>> where ii, jj would index into a result matrix. Essentially this would
>>> be an expanded version of findOverlaps, since any two ranges with
>>> distance = 0 have an overlap.
>>> Is such functionality available in IRanges, in an efficient
>>> implementation (think of the case where the two IRanges have - say -
>>> 10,000 ranges or more)?
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793