[BioC] countMatches() (was: table for GenomicRanges)
Cook, Malcolm
MEC at stowers.org
Fri Jan 4 22:56:27 CET 2013
Hiya,
For what it is worth...
I think the change to %in% is warranted.
If I understand correctly, this change restores the relationship between the semantics of `%in%` and the semantics of `match`.
From the docs:
'"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
Herve's change restores this relationship.
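As a quick illustration of the relationship quoted above, the following base-R snippet uses small, arbitrary vectors to check that `%in%` agrees with its documented `match()`-based definition. `in_via_match()` is a hypothetical name used only for this comparison.

```r
# Base R documents %in% as a thin wrapper around match():
#   "%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
# in_via_match() is a hypothetical name, used here only for comparison.
in_via_match <- function(x, table) match(x, table, nomatch = 0L) > 0L

x   <- c(3L, 7L, 7L, 10L, NA)
tbl <- c(1L, 3L, 5L, 7L)

x %in% tbl                                    # TRUE TRUE TRUE FALSE FALSE
identical(x %in% tbl, in_via_match(x, tbl))   # TRUE
```

Because `%in%` is defined in terms of `match`, any change to what "match" means for a class carries over to `%in%` automatically.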
Herve, I suspect that, as a result, you were able to drop all of the `%in%,BiocClass1,BiocClass2` method definitions and rely on base::`%in%`.
Am I right?
If so, may I suggest that Herve stay the course, with the addition of
'"%ol%" <- function(a, b) countOverlaps(a, b, maxgap=0L, minoverlap=1L, type="any") > 0'
This would provide a perspicuous idiom, thereby optimizing the API for Michael's observed common use case.
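To make the proposed overlap semantics concrete without depending on IRanges, here is a minimal base-R sketch of a `%ol%` operator. For illustration only, it assumes ranges are closed integer intervals stored in data frames with hypothetical `start` and `end` columns; the operator as proposed would act on IRanges/GRanges objects and rely on IRanges' own overlap code instead.

```r
# Hypothetical overlap operator on closed integer intervals [start, end].
# a, b: data frames with integer columns 'start' and 'end' (an assumption
# made for this sketch; not IRanges/GRanges objects).
# Returns one logical per row of 'a': TRUE if that range shares at least
# one position with any range in 'b' (roughly minoverlap = 1, maxgap = 0).
"%ol%" <- function(a, b) {
  vapply(seq_len(nrow(a)), function(i) {
    any(a$start[i] <= b$end & b$start <= a$end[i])
  }, logical(1))
}

peaks <- data.frame(start = c(1L, 20L, 50L), end = c(5L, 25L, 55L))
genes <- data.frame(start = c(4L, 30L),      end = c(10L, 60L))

peaks %ol% genes   # TRUE FALSE TRUE
```

Under the equality semantics Herve describes, none of these peaks would be `%in%` genes, since no peak has exactly the same start and end as a gene; an overlap operator answers the "peaks %in% genes" question instead.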
Just sayin'
~Malcolm
.-----Original Message-----
.From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
.Sent: Friday, January 04, 2013 3:37 PM
.To: Michael Lawrence
.Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
.Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
.
.On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
.<lawrence.michael at gene.com> wrote:
.> The change to the behavior of %in% is a pretty big one. Are you thinking
.> that all set-based operations should behave this way? For example, setdiff
.> and intersect? I really liked the syntax of "peaks %in% genes". In my
.> experience, it's way more common to ask questions about overlap than about
.> equality, so I'd rather optimize the API for that use case. But again,
.> that's just my personal bias.
.
.For what it is worth, I share Michael's personal bias here.
.
.Sean
.
.
.> Michael
.>
.>
.> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
.>
.>> Hi,
.>>
.>> I added findMatches() and countMatches() to the latest IRanges /
.>> GenomicRanges packages (in BioC devel only).
.>>
.>> findMatches(x, table): An enhanced version of ‘match’ that
.>> returns all the matches in a Hits object.
.>>
.>> countMatches(x, table): Returns an integer vector of the length
.>> of ‘x’, containing the number of matches in ‘table’ for
.>> each element in ‘x’.
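The difference between the two functions can be sketched in base R for ordinary atomic vectors, where "match" simply means equality as with `==`. `find_matches()` and `count_matches()` are hypothetical helpers written for this illustration, not the IRanges implementations; in particular, they return a plain data frame of index pairs rather than a Hits object.

```r
# Hypothetical base-R analogues of findMatches() / countMatches()
# for ordinary atomic vectors, where "match" means equality (==).

# All (query, subject) index pairs with x[query] == table[subject],
# sorted by query and then by subject.
find_matches <- function(x, table) {
  hits <- which(outer(x, table, "=="), arr.ind = TRUE)
  hits <- hits[order(hits[, 1], hits[, 2]), , drop = FALSE]
  data.frame(query = unname(hits[, 1]), subject = unname(hits[, 2]))
}

# For each element of x, the number of elements of 'table' equal to it.
# incomparables = NA keeps NA from matching NA, mirroring ==.
count_matches <- function(x, table) {
  ux <- unique(x)
  counts <- tabulate(match(table, ux, incomparables = NA), nbins = length(ux))
  counts[match(x, ux)]
}

x   <- c("b", "a", "z")
tbl <- c("a", "b", "b", "c", "a", "b")
find_matches(x, tbl)    # query 1 1 1 2 2, subject 2 3 6 1 5
count_matches(x, tbl)   # 3 2 0
```

Note that `count_matches(x, tbl)` equals `tabulate(find_matches(x, tbl)$query, length(x))`: counting is just collapsing the full list of hits per query element.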
.>>
.>> countMatches() is what you can use to tally/count/tabulate (choose your
.>> preferred term) the unique elements in a GRanges object:
.>>
.>> library(GenomicRanges)
.>> set.seed(33)
.>> gr <- GRanges("chr1", IRanges(sample(15, 20, replace=TRUE), width=5))
.>>
.>> Then:
.>>
.>> > gr_levels <- sort(unique(gr))
.>> > countMatches(gr_levels, gr)
.>> [1] 1 1 1 2 4 2 2 1 2 2 2
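For range-based data, the same equality-based tally can be sketched in base R. For illustration only, this assumes ranges on a single chromosome and strand, stored as integer `start`/`end` vectors; `tally_ranges()` is a hypothetical helper. Because R's default sampling algorithm changed in R 3.6.0, the seeded example need not reproduce the counts shown above.

```r
# Hypothetical base-R analogue of sort(unique(gr)) + countMatches(gr_levels, gr)
# for ranges on one chromosome/strand, stored as integer start/end vectors.
# Two ranges are "equal" only when both start and end coincide.
tally_ranges <- function(start, end) {
  key  <- paste(start, end)               # one string identifying each range
  ord  <- order(start, end)               # sort by start, then end
  keep <- ord[!duplicated(key[ord])]      # first occurrence of each range
  counts <- tabulate(match(key, key[keep]), nbins = length(keep))
  data.frame(start = start[keep], end = end[keep], count = counts)
}

set.seed(33)
start <- sample(15, 20, replace = TRUE)
tally_ranges(start, start + 4L)   # width-5 ranges, as in the example above
```

The counts always sum to the number of input ranges, since every range is equal to exactly one of the unique levels.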
.>>
.>> Note that findMatches() and countMatches() also work on IRanges and
.>> DNAStringSet objects, as well as on ordinary atomic vectors:
.>>
.>> library(hgu95av2probe)
.>> library(Biostrings)
.>> probes <- DNAStringSet(hgu95av2probe)
.>> unique_probes <- unique(probes)
.>> count <- countMatches(unique_probes, probes)
.>> max(count) # 7
.>>
.>> I made other changes in IRanges/GenomicRanges so that the notion
.>> of "match" between elements of a vector-like object now consistently
.>> means "equality" instead of "overlap", even for range-based objects
.>> like IRanges or GRanges objects. This notion of "equality" is the
.>> same that is used by ==. The most visible consequence of those
.>> changes is that %in% between two IRanges or GRanges objects
.>> 'query' and 'subject' no longer tests for overlap: use
.>> overlapsAny(query, subject) for that instead.
.>>
.>> overlapsAny(query, subject): Finds the ranges in ‘query’ that
.>> overlap any of the ranges in ‘subject’.
.>>
.>> There are warnings and deprecation messages in place to help smooth
.>> the transition.
.>>
.>> Cheers,
.>> H.
.>>
.>> --
.>> Hervé Pagès
.>>
.>> Program in Computational Biology
.>> Division of Public Health Sciences
.>> Fred Hutchinson Cancer Research Center
.>> 1100 Fairview Ave. N, M1-B514
.>> P.O. Box 19024
.>> Seattle, WA 98109-1024
.>>
.>> E-mail: hpages at fhcrc.org
.>> Phone: (206) 667-5791
.>> Fax: (206) 667-1319
.>>
.>
.>
.> _______________________________________________
.> Bioconductor mailing list
.> Bioconductor at r-project.org
.> https://stat.ethz.ch/mailman/listinfo/bioconductor
.> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor