[BioC] countMatches() (was: table for GenomicRanges)
Sean Davis
sdavis2 at mail.nih.gov
Fri Jan 4 22:37:05 CET 2013
On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> The change to the behavior of %in% is a pretty big one. Are you thinking
> that all set-based operations should behave this way? For example, setdiff
> and intersect? I really liked the syntax of "peaks %in% genes". In my
> experience, it's way more common to ask questions about overlap than about
> equality, so I'd rather optimize the API for that use case. But again,
> that's just my personal bias.
For what it is worth, I share Michael's personal bias here.
Sean
> Michael
>
>
> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>> Hi,
>>
>> I added findMatches() and countMatches() to the latest IRanges /
>> GenomicRanges packages (in BioC devel only).
>>
>> findMatches(x, table): An enhanced version of ‘match’ that
>> returns all the matches in a Hits object.
>>
>> countMatches(x, table): Returns an integer vector of the length
>> of ‘x’, containing the number of matches in ‘table’ for
>> each element in ‘x’.
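As a rough illustration of the semantics described above (not the package implementation), the per-element count can be emulated on ordinary atomic vectors in base R. The helper name count_matches_sketch is hypothetical and exists only for this sketch; it also assumes neither vector contains NA.

```r
# Hypothetical helper: emulates the described behaviour of countMatches()
# on atomic vectors with base R only. Not the IRanges/GenomicRanges code.
count_matches_sketch <- function(x, table) {
  # For each element of x, count how many elements of table equal it.
  vapply(x, function(e) sum(table == e), integer(1), USE.NAMES = FALSE)
}

x <- c("a", "b", "c")
table <- c("b", "a", "b", "d")
counts <- count_matches_sketch(x, table)
print(counts)  # 1 2 0
```

The result has the length of x, and an element of x with no match in table gets a count of 0.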
>>
>> countMatches() is what you can use to tally/count/tabulate (choose your
>> preferred term) the unique elements in a GRanges object:
>>
>> library(GenomicRanges)
>> set.seed(33)
>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
>>
>> Then:
>>
>> > gr_levels <- sort(unique(gr))
>> > countMatches(gr_levels, gr)
>> [1] 1 1 1 2 4 2 2 1 2 2 2
>>
>> Note that findMatches() and countMatches() also work on IRanges and
>> DNAStringSet objects, as well as on ordinary atomic vectors:
>>
>> library(hgu95av2probe)
>> library(Biostrings)
>> probes <- DNAStringSet(hgu95av2probe)
>> unique_probes <- unique(probes)
>> count <- countMatches(unique_probes, probes)
>> max(count) # 7
>>
>> I made other changes in IRanges/GenomicRanges so that the notion
>> of "match" between elements of a vector-like object now consistently
>> means "equality" instead of "overlap", even for range-based objects
>> like IRanges or GRanges objects. This notion of "equality" is the
>> same one used by ==. The most visible consequence of those
>> changes is that using %in% between 2 IRanges or GRanges objects
>> 'query' and 'subject' in order to do overlaps was replaced by
>> overlapsAny(query, subject).
>>
>> overlapsAny(query, subject): Finds the ranges in ‘query’ that
>> overlap any of the ranges in ‘subject’.
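To make the equality-versus-overlap distinction concrete, here is a minimal base R sketch on plain start/end vectors. It assumes closed integer intervals on a single sequence and ignores strand; the variable names are made up for the illustration, and this is not how the packages implement either operation.

```r
# Three query ranges and two subject ranges, each of width 5.
q_start <- c(1L, 10L, 20L); q_end <- q_start + 4L   # 1-5, 10-14, 20-24
s_start <- c(3L, 20L);      s_end <- s_start + 4L   # 3-7, 20-24

# "Overlap" notion: a query range shares at least one position with a subject.
overlaps_any <- vapply(seq_along(q_start), function(i) {
  any(q_start[i] <= s_end & s_start <= q_end[i])
}, logical(1))

# "Equality" notion: a query range has the same start and end as a subject.
equal_any <- vapply(seq_along(q_start), function(i) {
  any(q_start[i] == s_start & q_end[i] == s_end)
}, logical(1))

print(overlaps_any)  # TRUE FALSE TRUE
print(equal_any)     # FALSE FALSE TRUE
```

The first query range overlaps a subject range without being equal to any, which is exactly the case where the two notions give different answers.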
>>
>> There are warnings and deprecation messages in place to help smooth
>> the transition.
>>
>> Cheers,
>> H.
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>