[BioC] findOverlaps method in GenomicRanges not supporting type="equal" for GRangesList, GRangesList?
Hervé Pagès
hpages at fhcrc.org
Fri Nov 22 23:13:04 CET 2013
Hi Michael,
On 11/21/2013 12:59 PM, Michael Lawrence wrote:
> Ok. I think my code is broken anyway in cases where ranges are repeated
> in one of the GRanges. Feel free to use some of it or delete it. As for
> the zero-width ranges, I'm guessing people are usually looking for
> match()-like behavior, rather than findOverlaps() behavior, when
> type="equal", so we might need another interface?
My preference would be to keep the findOverlaps interface with a note
in the man page for findOverlaps,GRangesList,GRangesList about special
treatment of zero-width ranges.
> Also, I'm guessing
> that the hash-based match() is a lot faster than the interval-tree
> approach, so we might want to use that, except perhaps in the circular
> sequence case.
Yes, we should probably reuse the hash-based match() internally to
implement findOverlaps(type="equal"). Ranges on circular sequences
just need to be shifted by a multiple of the sequence length before
match() is called, so that their start lies between 1 and the sequence length.
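Here is a minimal base-R sketch of that approach. It assumes the ranges are held in plain data.frames with columns `seqname`, `start`, `width` and `strand`, not in GRanges objects. `normalize_circular_start()` and `equal_range_match()` are hypothetical helpers, not GenomicRanges functions. The key uses the width instead of the end, so shifting a start by a multiple of the sequence length leaves the rest of the key unchanged. base::match() on the character keys is hash-based. With this key, two zero-width ranges with the same start compare equal, which is the match()-like behavior discussed in this thread. Unlike findOverlaps(), match() reports only the first equal subject range for each query.

```r
## Hypothetical helper (not part of GenomicRanges): shift a start on a circular
## sequence of length seqlen by a multiple of seqlen so that it lies in [1, seqlen].
normalize_circular_start <- function(start, seqlen) {
    ((start - 1L) %% seqlen) + 1L
}

## Hypothetical helper: for each query range, the index of the first subject
## range that is exactly equal to it (NA if none), using hash-based base::match().
## 'q' and 's' are data.frames with columns seqname, start, width, strand;
## 'seqlengths' (integer) and 'is_circular' (logical) are named by seqname.
equal_range_match <- function(q, s, seqlengths, is_circular) {
    range_key <- function(r) {
        start <- as.integer(r$start)
        circ <- unname(is_circular[r$seqname])
        start[circ] <- normalize_circular_start(start[circ],
                                                seqlengths[r$seqname][circ])
        paste(r$seqname, start, as.integer(r$width), r$strand, sep = ":")
    }
    match(range_key(q), range_key(s))
}

seqlengths  <- c(chr1 = 249250621L, chrM = 16569L)
is_circular <- c(chr1 = FALSE, chrM = TRUE)

q <- data.frame(seqname = c("chrM", "chr1"), start = c(16570L, 100L),
                width = c(10L, 5L), strand = c("+", "+"),
                stringsAsFactors = FALSE)
s <- data.frame(seqname = c("chr1", "chrM"), start = c(100L, 1L),
                width = c(5L, 10L), strand = c("+", "+"),
                stringsAsFactors = FALSE)

equal_range_match(q, s, seqlengths, is_circular)
## [1] 2 1
```

The chrM query starting at 16570 is shifted back by one sequence length to start 1. It therefore matches the second subject range.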
Cheers,
H.
>
> Michael
>
> On Thu, Nov 21, 2013 at 12:02 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael, Nico,
>
> Right now match/== methods for List objects behave inconsistently.
> For example, even for conceptually close objects like IntegerList
> and XIntegerViews, we have:
>
> x <- IntegerList(a=1:5, b=2:-3, c=1:3)
> v <- successiveViews(unlist(x), elementLengths(x))
>
> > x == rev(x)
> LogicalList of length 3
> [["a"]] TRUE TRUE TRUE FALSE FALSE
> [["b"]] TRUE TRUE TRUE TRUE TRUE TRUE
> [["c"]] TRUE TRUE TRUE FALSE FALSE
>
> > v == rev(v)
> [1] FALSE TRUE FALSE
>
> > match(x, rev(x))
> IntegerList of length 3
> [["a"]] 1 2 3 <NA> <NA>
> [["b"]] 1 2 3 4 5 6
> [["c"]] 1 2 3
>
> > match(v, rev(v))
> Error in base::match(x, table, nomatch = nomatch, incomparables = incomparables, :
>   'match' requires vector arguments
>
> This is not a good situation, and there is still some work that needs to
> be done at some point in the future to clean up the match/== methods in
> IRanges/GenomicRanges. In the meantime, I think we should hold off on
> adding new methods for List objects until there is a clear consensus on
> how they should behave.
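The sketch below is a base-R analogy of the two behaviors shown above. It uses ordinary lists instead of IntegerList or Views objects and does not call the IRanges methods. `Map(match, x, y)` reproduces the element-wise result printed for `match(x, rev(x))`. `mapply(identical, x, y)` and the hypothetical helper `match_atomic()` treat each list element as a single value, like the Views comparison. This is the same choice Michael raises about returning one index per GRanges versus an IntegerList.

```r
x <- list(a = 1:5, b = 2:-3, c = 1:3)
y <- rev(x)

## Element-wise semantics, as with match() on the IntegerList above:
## x[[i]] is matched against y[[i]], one result per integer, grouped by element.
elementwise <- Map(match, x, y)

## Atomic semantics, as with the Views comparison above: each list element is
## compared as a whole. match_atomic() is a hypothetical helper, not IRanges API.
match_atomic <- function(x, table) {
    vapply(x, function(e) {
        hit <- which(unname(vapply(table, identical, logical(1), e)))
        if (length(hit) > 0L) hit[1L] else NA_integer_
    }, integer(1))
}
atomic_match <- match_atomic(x, y)      # one index per list element
atomic_equal <- mapply(identical, x, y) # one TRUE/FALSE per list element

str(elementwise)
atomic_match
atomic_equal
```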
>
> As for Nico's request, I agree that the best way to go would be to just
> make findOverlaps(type="equal") work. There are some subtle semantic
> differences between a *match* (as reported by match or ==) and equality
> from a range-overlap point of view. The former can report equality
> for ranges on a circular sequence that the latter does not consider
> equal. Another difference is how zero-width ranges are handled.
>
> Thanks,
> H.
>
> On 11/21/2013 10:43 AM, Michael Lawrence wrote:
>
> So I've checked a match,GRangesList,GRangesList method into devel. This
> allows findMatches() to return what you want. There is a question,
> though, before this is approved: does it make sense for match() to act
> like findOverlaps and consider each GRanges atomically (one returned
> index per GRanges), or should match behave as it does for other Lists
> and return an IntegerList, with a value per range, grouped by the
> top-level elements? If we decide on the latter, then the method I wrote
> needs to be removed and the implementation moved to the "equal" mode in
> findOverlaps. Either way, findOverlaps(type="equal") should be made to
> work.
>
> Michael
>
> On Thu, Nov 21, 2013 at 8:13 AM, Nicolas Delhomme
> <nicolas.delhomme at umu.se> wrote:
>
> Thanks!
> ------------------------------------------------------------------
> Nicolas Delhomme
>
> Nathaniel Street Lab
> Department of Plant Physiology
> Umeå Plant Science Center
>
> Tel: +46 90 786 7989
> Email: nicolas.delhomme at plantphys.umu.se
> SLU - Umeå universitet
> Umeå S-901 87 Sweden
> ------------------------------------------------------------------
>
> On 21 Nov 2013, at 17:06, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> I will work on this today.
>
> Michael
>
> On Thu, Nov 21, 2013 at 4:43 AM, Nicolas Delhomme
> <nicolas.delhomme at umu.se> wrote:
>
> Hi Bioc!
>
> When I try to find “equal” ranges from two GRangesList
> objects, I get the following error:
>
> findOverlaps(query=grng.def, subject=grng.mod, type="equal")
>
> Error in match.arg(type) :
>   'arg' should be one of “any”, “start”, “end”, “within”
>
> Isn’t type="equal" supported for the GRangesList,
> GRangesList signature?
>
> Cheers,
>
> Nico
>
> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin13.0.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
>  [1] easyRNASeq_1.8.2 ShortRead_1.20.0 Rsamtools_1.14.1 GenomicRanges_1.14.3 DESeq_1.14.0 lattice_0.20-24 locfit_1.5-9.1
>  [8] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.5 edgeR_3.4.0 limma_3.18.3 biomaRt_2.18.0 Biobase_2.22.0
> [15] genomeIntervals_1.18.0 BiocGenerics_0.8.0 intervals_0.14.0
>
> loaded via a namespace (and not attached):
>  [1] annotate_1.40.0 AnnotationDbi_1.24.0 bitops_1.0-6 DBI_0.2-7 genefilter_1.44.0 geneplotter_1.40.0 grid_3.0.2 hwriter_1.3
>  [9] latticeExtra_0.6-26 LSD_2.5 RColorBrewer_1.0-5 RCurl_1.95-4.1 RSQLite_0.11.4 splines_3.0.2 stats4_3.0.2 survival_2.37-4
> [17] tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0
>
>
> _________________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
>