Sorry it took so long but I just checked this in for version 1.7.8

On Tue, Jun 8, 2010 at 8:27 PM, Michael Lawrence <michafla@gene.com> wrote:

>
>
> On Tue, Jun 8, 2010 at 8:00 PM, Kasper Daniel Hansen <
> kasperdanielhansen@gmail.com> wrote:
>
>> On Tue, Jun 8, 2010 at 9:53 PM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
>> > On 06/09/2010 02:36 AM, Kasper Daniel Hansen wrote:
>> >> Hi Michael
>> >>
>> >> Michael,
>> >>
>> >> Thanks for pointing out nearest and friends; I agree that this
>> >> function should address my question.
>> >>
>> >> Reading the man page for nearest function, might I suggest an
>> >> additional argument like
>> >>   multihits = c("arbitrary", "all")
>> >
>> > yes thanks for the timely nearest hint; for me I was hoping that ties
>> > could be decided based on maximum overlap (though ties might still
>> > occur, and perhaps I could break ties myself if 'all' were returned
>> > without too much difficulty).
>>
>> I would strongly advocate just returning the multiple hits and let the
>> user decide how to break ties.  While I certainly see use cases where
>> Martin's suggestion is reasonable, what happens if a range overlap two
>> different ranges, which have very different width.  Should "maximum
>> overlap" be counted in number of integers (in which case you might be
>> biased towards the longer of the two ranges) or should it be in
>> %-length or ?  I think allowing for all these possibilities would make
>> for a very complicated interface with little gain.
>>
>> One might conceive of a filter function on top of the return object
>> (which could work for findOverlaps as well), but as the filtering in
>> some cases would depend on additional metadata (say the ranges are of
>> different type and there is an ordering on the type so an overlap with
>> a certain type trumps another type), it might be a lot of
>> infrastructure with very little gain.  And really, the filtering would
>> just be an apply over the sparse matrix.
>>
>>
> I agree that returning "all" is a good option to have, and it is consistent
> with the functionality of findOverlaps. If compelling common use cases
> emerge for tie resolution, the number of multihits options can be expanded.
>
> Thanks for the suggestion,
> Michael
>
>
>> Kasper
>>
>>
>> >
>> > Martin
>> >
>> >> with the intention that a user can get full information in case one
>> >> range overlaps (or ties in distance) with multiple other ranges.  The
>> >> return value could be a sparse matrix, findOverlaps-like.  I find it
>> >> important to know about multiple hits, especially in the case when a
>> >> range has multiple overlaps.
>> >>
>> >> Thanks,
>> >> Kasper
>> >>
>> >> On Tue, Jun 8, 2010 at 1:25 PM, Michael Lawrence
>> >> <lawrence.michael@gene.com> wrote:
>> >>> For all pairwise distances, something simple based on outer() should
>> >>> suffice. It might not be very space efficient, but speed should be
>> somewhat
>> >>> close to optimal.
>> >>>
>> >>> What is the end goal of this? For example, the nearest() function
>> finds
>> >>> nearest neighbors efficiently.
>> >>>
>> >>> You might be able to leverage findOverlaps(). For example, one can set
>> the
>> >>> maximum gap between ranges to be considered overlapping. That could be
>> set
>> >>> to a non-zero value representing some maximum allowable distance. The
>> sparse
>> >>> doublet matrix from as.matrix() would be pretty efficient for distance
>> >>> calculation, via the pgap() function.
>> >>>
>> >>> Michael
>> >>>
>> >>> On Tue, Jun 8, 2010 at 8:51 AM, Kasper Daniel Hansen
>> >>> <kasperdanielhansen@gmail.com> wrote:
>> >>>>
>> >>>> Assuming I have two IRanges, each with multiple ranges, like
>> >>>>  ir1 = IRanges(start = 3:6, width = 2)
>> >>>>  ir2 = IRanges(start = 10:17, width = 2)
>> >>>>
>> >>>> Is there a fast way to compute a pairwise distance matrix between the
>> >>>> two sets, by which I mean
>> >>>>  ii = 1
>> >>>>  jj = 2
>> >>>>  width(gaps(c(ir1[ii], ir2[jj])))
>> >>>> where ii, jj would index into a result matrix.  Essentially this
>> would
>> >>>> be an expanded version of findOverlaps, since any two ranges with
>> >>>> distance = 0, have an overlap.
>> >>>>
>> >>>> Is such functionality available in IRanges, in an efficient
>> >>>> implementation (think of the case where the two IRanges have - say -
>> >>>> 10,000 ranges or more)?
>> >>>>
>> >>>> Kasper
>> >>>>
>> >>>> _______________________________________________
>> >>>> Bioconductor mailing list
>> >>>> Bioconductor@stat.math.ethz.ch
>> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>>> Search the archives:
>> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>>
>> >>>
>> >>
>> >> _______________________________________________
>> >> Bioconductor mailing list
>> >> Bioconductor@stat.math.ethz.ch
>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>> > --
>> > Martin Morgan
>> > Computational Biology / Fred Hutchinson Cancer Research Center
>> > 1100 Fairview Ave. N.
>> > PO Box 19024 Seattle, WA 98109
>> >
>> > Location: Arnold Building M1 B861
>> > Phone: (206) 667-2793
>> >
>>
>
>

	[[alternative HTML version deleted]]