Hmm, that's odd. And it seems inconsistent with the usual distance() and
distanceToNearest() methods, which would return 0 in this case, yes?
e.g. if the IRanges were GRanges and sitting on top of each other on the
same strand, or if the first was distance()'ed against itself (the same
thing)
That doesn't make sense to me either. This doesn't seem like an issue of
theology to me, but rather inconsistency with what the docs say will happen.
On Wed, Oct 17, 2012 at 7:31 AM, Harris A. Jaffee wrote:
> As I said, any rule is fine, especially if they can serve a purpose
> via their "location". But you also have this, which seems weird:
>
> > distance(IRanges(2,1), IRanges(2,1))
> [1] 1
>
> To say they are not treated any differently is a big stretch to me,
> since their defining formula (end = start-1) is already a violation
> of my senses. [I wonder if SEW = (start, NA, 0) wouldn't have been
> more useful.]
>
> Just so we don't continue off into useless theology, I should say that
> I started looking into the distance code with the idea of adding a
> "signed" option (say, distance(x,y)<0 iff x is downstream from y), and
> I wanted to understand the landscape first.
>
> On Oct 17, 2012, at 12:56 AM, Hervé Pagès wrote:
>
> > Hi Harris, Val, Michael,
> >
> > Right now, all those distances are equal to 8, which I think is what
> > we want:
> >
> > distance(IRanges(1,2), IRanges(10,16))
> > distance(IRanges(10,16), IRanges(1,2))
> > distance(IRanges(1,2), IRanges(10,9))
> > distance(IRanges(10,9),IRanges(1,2))
> > distance(IRanges(3,2), IRanges(10,16))
> > distance(IRanges(10,16), IRanges(3,2))
> > distance(IRanges(3,2), IRanges(10,9))
> > distance(IRanges(10,9), IRanges(3,2))
> >
> > Yes, from a mathematical point of view, the distance between a
> > non-empty set and an empty set if not defined, but I'm not sure there
> > would be much to gain in doing this. Some use cases would be needed.
> > What's nice about the current behavior is that zero-width ranges are
> > not treated any differently than other ranges.
> >
> > H.
> >
> >
> > On 10/16/2012 12:21 PM, Harris A. Jaffee wrote:
> >> Any rule is fine with me. They would not occur for us except by
> mistake.
> >> I was basically raising an alert that the current situation is not quite
> >> complete.
> >>
> >> On Oct 16, 2012, at 2:53 PM, Michael Lawrence wrote:
> >>
> >>> Since these zero-width ranges do have a position, I'm not sure why we
> cannot calculate their distance. As long as we have an established rule
> about how we handle them. For example, is it just from the start? That's
> probably most intuitive.
> >>>
> >>> Michael
> >>>
> >>> On Tue, Oct 16, 2012 at 11:25 AM, Valerie Obenchain <
> vobencha@fhcrc.org> wrote:
> >>> On 10/16/2012 09:25 AM, Harris A. Jaffee wrote:
> >>> Fantastic, thank you. I'm not sure I knew about that page!
> >>>
> >>> For example, ?precede gives me the IRanges nearest-methods page,
> >>> and only if I explicitly request ?'nearest-methods' can I choose
> >>> the GenomicRanges page. That brings up another issue: ?distance
> >>> again sends me to the IRanges nearest-methods help, and that only
> >>> documents distanceToNearest, which is not enough. The spec for
> >>> distance actually lives only on
> >>> ?'precede,GenomicRanges,GenomicRanges-method', as far as I see.
> >>>
> >>> I can update the IRanges man page to include distance().
> >>>
> >>> Finally, and admittedly a total edge case, but zero-width ranges
> >>> appear to be mishandled, or not handled, by the IRanges distance
> >>> code. I think they should be infinitely far from anything, or at
> >>> least NA, and there should not be any nearest or distanceToNearest
> >>> value for a zero-width query.
> >>> cc'ing Herve and Michael for history.
> >>>
> >>> In general I agree that zero-width ranges should not have a distance
> from any other range. I favor returning NA instead of inf since the range
> (albeit zero-width) does have a location.
> >>>
> >>> One concern I have is that we use zero-width ranges to represent
> insertions. In that context we may want to compute the distance to the
> nearest insertion. I do not have a concrete use case - just thinking of
> what might come. I'd be interested in opinions on this.
> >>>
> >>> Valerie
> >>>
> >>>
> >>>
> >>> x
> >>> IRanges of length 1
> >>> start end width
> >>> [1] 2 1 0
> >>>
> >>> distance(x, x)
> >>> [1] 1
> >>>
> >>> y
> >>> IRanges of length 1
> >>> start end width
> >>> [1] 3 2 0
> >>>
> >>> distance(x, y)
> >>> [1] 2
> >>>
> >>> z
> >>> IRanges of length 1
> >>> start end width
> >>> [1] 3 3 1
> >>>
> >>> distance(x, z)
> >>> [1] 2
> >>>
> >>> On Oct 15, 2012, at 6:27 PM, Valerie Obenchain wrote:
> >>>
> >>> A few follow up items from an off-list conversation with Harris.
> >>>
> >>> 1) I reversed the use of 5' and3' in my response below. Sorry about
> that.
> >>>
> >>> 2) The primary landing page for ?precede is in IRanges. I've expanded
> the \seealso section of this page to point the user to the man page in
> GenomicRanges.
> >>>
> >>> 3) The precede/follow man page in GenomicRanges now specifically
> states that 5' to 3' is the relevant orientation for precede/follow. For
> those interested in taking a look, the man page has several aliases,
> >>>
> >>> ?'nearest-methods'
> >>> ?'precede,GenomicRanges,GenomicRanges-method'
> >>> ?'follow,GenomicRanges,GenomicRanges-method'
> >>>
> >>> Changes can be found in IRanges 1.17.3 and GenomicRanges 1.11.4.
> >>>
> >>> Thanks Harris.
> >>>
> >>> Valerie
> >>>
> >>> On 10/12/2012 04:17 PM, Valerie Obenchain wrote:
> >>> Hi Harris,
> >>>
> >>> On 10/11/2012 02:27 PM, Harris A. Jaffee wrote:
> >>> Apologies in advance if I just don't understand the ignore.strand
> switch, or
> >>> perhaps GRanges objects. Also, I have not even tried to understand
> >>>
> >>> GenomicRanges:::.GenomicRanges_findPrecedeFollow
> >>>
> >>> I have the impression that a query with strand entirely "*"
> essentially implies
> >>> ignore.strand, but this case here is handled correctly only if
> ignore.strand is
> >>> TRUE (consistent with distanceToNearest but not distance, which seems
> correct):
> >>>
> >>> x = GRanges(ranges=IRanges(start=5, end=5), seqnames="chr1",
> strand="*")
> >>> Y = GRanges(ranges=IRanges(start=c(6,7), end=c(6,7)), seqnames="chr1",
> strand=c("+","-"))
> >>> distance(x, Y[1])
> >>> [1] 1
> >>> distance(x, Y[2])
> >>> [1] 2
> >>>
> >>> nearest(x, Y, ignore.strand=TRUE) # correct
> >>> [1] 1
> >>> nearest(ranges(x), ranges(Y)) # also correct
> >>> [1] 1
> >>>
> >>> However,
> >>>
> >>> nearest(x, Y)
> >>> [1] 2
> >>> Thanks for reporting this bug. Now fixed in GenomicRanges 1.11.3 and
> 1.10.2.
> >>>
> >>> The problem was in .nearest() where I computed the preceding and
> following distance of the ranges. I was missing abs() so the distance for
> the second range was -2 instead of 2. When checking to see which range was
> closest
> >>>
> >>> 1< -2
> >>> [1] FALSE
> >>>
> >>> so the second range won.
> >>> distanceToNearest(x, Y)
> >>> DataFrame with 1 row and 3 columns
> >>> queryHits subjectHits distance
> >>>
> >>> 1 1 2 2
> >>>
> >>> distanceToNearest(x, Y, ignore.strand=TRUE)
> >>> DataFrame with 1 row and 3 columns
> >>> queryHits subjectHits distance
> >>>
> >>> 1 1 1 1
> >>>
> >>>
> >>> Finally,
> >>>
> >>> follow(ranges(x), ranges(Y))
> >>> [1] NA
> >>>
> >>> The issue (along with GenomicRanges:::.nearest) must come down to this:
> >>>
> >>> follow(x, Y) # how can this be?
> >>> [1] 2
> >>> This behavior is due to the fact that a '*' range can compare itself
> to either '+' or '-'. The 'x' is '*" located at position 5. When
> comparing it to the '+' Y[1]
> >>> we think of both ranges as '+'. precede() and follow() locations are
> determined by moving from 3' to 5'. On '+' strand this is from left to
> right so 5 precedes 6.
> >>>
> >>> precede(x, Y[1])
> >>> [1] 1
> >>>
> >>> but does not follow it
> >>>
> >>> follow(x, Y[1])
> >>> [1] NA
> >>>
> >>> When comparing the '*' to Y[2] we think of both ranges as '-'. On the
> '-' moving from 3' to 5' is right to left so the 5 follows 7
> >>>
> >>> follow(x, Y[2])
> >>> [1] 1
> >>>
> >>> but does not precede it
> >>>
> >>> precede(x, Y[2])
> >>> [1] NA
> >>>
> >>> When we set ignore.strand=TRUE, all ranges are considered '+'.
> >>>
> >>> Valerie
> >>> whereas
> >>>
> >>> follow(x, Y, ignore.strand=TRUE) # correct, I think
> >>> [1] NA
> >>>
> >>> sessionInfo()
> >>> R Under development (unstable) (2012-10-10 r60908)
> >>> Platform: x86_64-unknown-linux-gnu (64-bit)
> >>>
> >>> locale:
> >>> [1] LC_CTYPE=en_US.iso885915 LC_NUMERIC=C
> >>> [3] LC_TIME=en_US.iso885915 LC_COLLATE=en_US.iso885915
> >>> [5] LC_MONETARY=en_US.iso885915 LC_MESSAGES=en_US.iso885915
> >>> [7] LC_PAPER=C LC_NAME=C
> >>> [9] LC_ADDRESS=C LC_TELEPHONE=C
> >>> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
> >>>
> >>> attached base packages:
> >>> [1] stats graphics grDevices datasets utils methods base
> >>>
> >>> other attached packages:
> >>> [1] GenomicRanges_1.11.0 IRanges_1.17.0 BiocGenerics_0.5.0
> >>>
> >>> loaded via a namespace (and not attached):
> >>> [1] parallel_2.16.0 stats4_2.16.0 tools_2.16.0
> >>>
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor@r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor@r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>>
> >>>
> >>
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages@fhcrc.org
> > Phone: (206) 667-5791
> > Fax: (206) 667-1319
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
*A model is a lie that helps you see the truth.*
*
*
Howard Skipper
[[alternative HTML version deleted]]