[BioC] countMatches() (was: table for GenomicRanges)

Hervé Pagès hpages at fhcrc.org
Wed Jan 9 01:14:37 CET 2013


On 01/08/2013 02:59 PM, Cook, Malcolm wrote:
> If we’re voting/brainstorming, I’d go for one operator for value that
> the ‘type’ arg of overlap can take on
>
> Thus:
>
> %olStart%
>
> %olEnd%
>
> %olWithin%
>
> %olAny% (perhaps with alias of just ‘%ol%’)
>
> %olEqual% (which should be same as %in%, right)

Except for zero-width ranges: they never overlap with anything, but
2 zero-width ranges with the same start are considered equal:

   > ir <- IRanges(start=5:7, width=0:2)
   > ir
   IRanges of length 3
       start end width
   [1]     5   4     0
   [2]     6   6     1
   [3]     7   8     2
   > overlapsAny(ir, ir, type="equal")
   [1] FALSE  TRUE  TRUE
   > suppressWarnings(ir %in% ir)
   [1] TRUE TRUE TRUE

Also I believe the new %in% should generally be faster than
overlapsAny( , type="equal"), and also perhaps more memory
efficient, but I didn't do enough testing to quantify this.

H.

>
> Doh, I can’t stay away from this issue for some reason..... Anyway, my 2
> cents
>
> ~Malcolm
>
> *From:*Tim Triche, Jr. [mailto:tim.triche at gmail.com]
> *Sent:* Tuesday, January 08, 2013 4:12 PM
> *To:* Michael Lawrence
> *Cc:* Hervé Pagès; Cook, Malcolm; Sean Davis; Vedran Franke;
> bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
> Michael: your suggestion is both clearer and more concise than mine was.
>   +1
>
> (I prefer x %i% y %i% z rather than intersect(x, intersect(y, z)) for
> the same reason)
>
> On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
> <lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>> wrote:
>
> I would vote for %over% instead of %ov%. Just 2 more characters but way
> clearer, at least to me. The hardest thing to type are the %'s.
>
> Michael
>
> On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages at fhcrc.org
> <mailto:hpages at fhcrc.org>> wrote:
>
>     Thanks Tim, Malcolm for the feedback.
>
>     @Tim, I won't comment on the variants of %ov% you are proposing for
>     doing "within" or "equal" instead of "any" (but if people want them,
>     I'll add them too). For now I just want to focus on restoring the
>     convenience of the old %in%, whose removal is understandably causing
>     some frustration. And so we can move on.
>
>     Cheers,
>     H.
>
>
>
>
>     On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>
>         hell, I'll add the operators if there's support for them.  obviously
>         they're not a big deal and a patch would take 5 minutes flat.
>
>         my hope was to be very explicit about what each type of
>         operation meant,
>         so that when a newcomer to the Ranges API sees
>
>             peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>
>         it cannot be confused with
>
>             peaks %within% rangesThatCorrespondToSomeChromatinState
>
>         or
>
>             peaks %equal% aBunchOfDNAseFootprints
>
>         or
>
>             DMRs %in% genes  ## what the hell does this really mean,
>         anyways?
>            it's so bad on so many levels
>
>         because whenever someone says "what is the advantage of Ranges-based
>         analyses?", these are the archetypal sorts of queries that come
>         to mind.
>            Except that usually in my examples they are based on posterior
>         probabilities, but perhaps that could stand to change.
>
>         Anyways, that's just my bias, and you're doing the heavy
>         lifting.  But
>         if people agree with the motivations I will write the patch today.
>
>         Cheers,
>
>         --t
>
>
>
>
>         On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>
>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>> wrote:
>
>              Hi Tim,
>
>              I could add the %ov% operator as a replacement for the old
>         %in%. So you
>              would write 'peaks %ov% genes' instead of 'peaks %in%
>         genes'. Would just
>              be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>
>              Cheers,
>              H.
>
>
>              On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>
>                  So why not leave %in% as it was and transition
>         everything forward to
>                  explicitly using {  `%within%`,
>         `%overlaps%`|`%overlapping%`,
>                  `%equals%`
>                  } such that
>
>                      identical( x %within% table, countOverlaps(x, table,
>                  type='within') >
>                  0 ) == TRUE
>                      identical( x %overlaps% table, countOverlaps(x, table,
>                  type='any') >
>                  0 ) == TRUE
>                      identical( x %equals% table, countOverlaps(x, table,
>                  type='equal') >
>                  0 ) == TRUE
>
>                  and for the time being,
>
>                      identical( x %overlaps% table, countOverlaps(x, table,
>                  type='any') >
>                  0 ) == TRUE ## but with a noisy nastygram that will halt if
>                  options("warn"=2)
>                  No breakage for %in% methods until such time as a full
>                  deprecation cycle
>                  has passed, and if the maintainers can't be arsed to do
>         anything
>                  at all
>                  about the warnings by the second full release, then
>         perhaps they
>                  don't
>                  really care that much after all.  Just a thought?
>
>                    From someone (me) who has their own issues with keeping
>                  everything up
>                  to date and should know better.  If you want to use
>         %in% for
>
>                      peaks %in% genes (why on earth would you do this
>         rather than
>                  peaks
>                  %in% promoters(genes), anyways?)
>
>                  then a nastygram could be emitted "WARNING: YOUR SHORTHAND
>                  NOTATION IS
>                  DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and
>         everyone is
>                  (more
>                  or less) happy.
>
>
>
>                  On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
>                  <lawrence.michael at gene.com
>         <mailto:lawrence.michael at gene.com>
>         <mailto:lawrence.michael at gene.com
>         <mailto:lawrence.michael at gene.com>>
>
>                  <mailto:lawrence.michael at gene.
>         <mailto:lawrence.michael at gene.>__com
>                  <mailto:lawrence.michael at gene.com
>         <mailto:lawrence.michael at gene.com>>>> wrote:
>
>
>
>                       On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès
>                  <hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>
>                       <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>> wrote:
>
>                           Hi Michael,
>
>                           I don't think "match" (the word) always has to
>         mean
>                  "equality"
>                           either.
>                           However having match() (the function) do
>         "whole exact
>                  matching" (aka
>                           "equality") for any kind of vector-like object
>         has the
>                  advantage of:
>
>                              (a) making it consistent with base::match()
>                  (?base::match is
>                           pretty
>                                  explicit about what the contract of
>         match() is)
>
>
>                       (a) alone is obviously not enough.  We have many
>         methods,
>                  like the
>                       set operations, that treat ranges specially.  Are
>         we going
>                  to start
>                       moving everything toward the base behavior? And have
>                  rangeIntersect,
>                       rangeSetdiff, etc?
>
>                              (b) preserving its relationship with ==,
>                  duplicated(), unique(),
>                                  etc...
>
>
>                       So it becomes consistent with duplicated/unique,
>         but we lose
>                       consistency with the set operations.
>
>                              (c) not frustrating the user who needs
>         something to
>                  do exact
>                                  matching on ranges (as I mentioned
>         previously,
>                  if you take
>                                  match() away from him/her, s/he'll be
>         left with
>                  nothing).
>
>
>                       No one has ever asked for match() to behave this
>         way. There
>                  was a
>                       request for a way to tabulate identical ranges. It
>         was a
>                  nice idea
>                       to extract the general "outer equal" findMatches
>         function.
>                  But the
>                       changes seem to be snow-balling.  These types of
>         changes
>                  mean a lot
>                       of maintenance work for the users. A deprecation
>         cycle does not
>                       circumvent that.
>
>
>                           IMO those advantages counterbalance *by far*
>         the very
>                  little
>                           convenience you get from having 'match(query,
>         subject)' do
>                           'findOverlaps(query, subject, select="first")' on
>                           IRanges/GRanges objects. If you need to do
>         that, just
>                  use the
>                           latter, or, if you think that's still too much
>         typing,
>                  define
>                           a wrapper e.g. 'ovmatch(query, subject)'.
>
>                           There are plenty of specialized tools around
>         for doing
>                           inexact/fuzzy/partial/overlap matching for many
>                  particular types
>                           of vector-like objects: grep() and family,
>         pmatch(),
>                  charmatch(),
>                           agrep(), grepRaw(), matchPattern() and family,
>                  findOverlaps() and
>                           family, findIntervals(), etc... For the
>         reasons I mentioned
>                           above, none of them should hijack match() to
>         make it do
>                  some
>                           particular type of inexact matching on some
>         particular
>                  type of
>                           objects. Even if, for that particular type of
>         objects,
>                  doing that
>                           particular type of inexact matching is more
>         common than
>                  doing
>                           exact matching.
>
>                           H.
>
>
>
>                           On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
>                               I think having overlapsAny is a nice
>         addition and
>                  helps make
>                               the API
>                               more complete and explicit. Are you sure
>         we need to
>                  change
>                               the behavior
>                               of the match method for this relatively
>         uncommon
>                  use case?
>
>
>                           Yes because otherwise users with a use case of
>         doing
>                  match()
>
>                           even if it's uncommon,
>
>
>                               I don't think
>                               "match" always has to mean "equality". It
>         is a more
>                  general
>                               concept in
>                               my mind. The most common use case for matching
>                  ranges is
>                               overlap.
>
>
>                           Of course "match" doesn't always have to mean
>         equality.
>                  But of base
>
>
>                               Michael
>
>
>                               On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès
>                               <hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>
>
>                               <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>>> wrote:
>
>                                    Yes 'peaks %in% genes' is cute and was
>                  probably doing
>                               the right thing
>                                    for most users (although not all).
>         But 'exons %in%
>                               genes' is cute too
>                                     and was probably doing the wrong
>         thing for
>                  all users.
>                               Advanced users
>                                    like you guys would have no problem
>         switching to
>
>                                       !is.na <http://is.na>
>         <http://is.na> <http://is.na>
>
>                               <http://is.na>(findOverlaps(____peaks, genes,
>
>
>                  type="within",
>
>                                    select="any"))
>
>                                    or
>
>                                       !is.na <http://is.na>
>         <http://is.na> <http://is.na>
>
>                               <http://is.na>(findOverlaps(____peaks, genes,
>
>
>                  type="equal",
>
>
>                                    select="any"))
>
>                                    in case 'peaks %in% genes' was not doing
>                  exactly what
>                               you wanted,
>                                    but most users would not find this
>         particularly
>                               friendly. Even
>                                    worse, some users probably didn't
>         realize that
>                  'peaks
>                               %in% genes'
>                                    was not doing exactly what they
>         thought it did
>                  because
>                               "peaks in
>                                    genes" in English suggests that the
>         peaks are
>                  within
>                               the genes,
>                                    but it's not what 'peaks %in% genes'
>         does.
>
>                                    Having overlapsAny(), with exactly
>         the same extra
>                               arguments as
>                                    countOverlaps() and
>         subsetByOverlaps() (i.e.
>                  'maxgap',
>                               'minoverlap',
>                                    'type', 'ignore.strand'), all of them
>                  documented (and
>                               with most
>                                    users more or less familiar with them
>         already)
>                  has the
>                               virtue to
>                                    expose the user to all the options
>         from the
>                  very start,
>                               and to
>                                    help him/her make the right choice.
>         Of course
>                  there
>                               will be users
>                                    that don't want or don't have the time to
>                  read/think
>                               about all the
>                                    options. Not a big deal: they'll just do
>                               'overlapsAny(query, subject)',
>                                    which is not a lot more typing than
>         'query %in%
>                               subject', especially
>                                    if they use tab completion.
>
>                                    It's true that it's more common to ask
>                  questions about
>                               overlap than
>                                    about equality but there are some use
>         cases
>                  for the
>                               latter (as the
>                                    original thread shows). Until now,
>         when you
>                  had such a
>                               use case, you
>                                    could not use match() or %in%, which
>         would
>                  have been
>                               the natural things
>                                    to use, because they got hijacked to do
>                  something else,
>                               and you were
>                                    left with nothing. Not a satisfying
>         situation.
>                  So at a
>                               minimum, we
>                                    needed to restore the true/real/original
>                  semantic of
>                               match() to do
>                                    "equality" instead of "overlap". But
>         it's hard
>                  to do
>                               this for match()
>                                    and not do it for %in% too. For more
>         than 99% of R
>                               users, %in% is
>                                    just a simple wrapper for 'match(x,
>         table,
>                  nomatch = 0)
>                                > 0' (this
>                                    is how it has been documented and
>         implemented
>                  in base R
>                               for many
>                                    years). Not maintaining this relationship
>                  between %in%
>                               and match()
>                                    would only cause grief and frustration to
>                  newcomers to
>                               Bioconductor.
>
>                                    H.
>
>
>
>                                    On 01/04/2013 03:32 PM, Cook, Malcolm
>         wrote:
>
>                                        Hiya again,
>
>                                        I am definitely a late comer to
>         BioC, so I
>                               definitely easily
>                                        defer to
>                                        the tide of history.
>
>                                        But I do think you miss my point
>         Michael
>                  about the
>                               proposed change
>                                        making the relationship between
>         %in% and
>                  match for
>                               {G,I}Ranges{List}
>                                        mimic that between other vectors,
>         and I do
>                  think
>                               that changing
>                                        the API
>                                        would make other late-comers take
>         to BioC
>                               easier/faster.
>
>                                        That said, I NEVER use %in% so I
>         really
>                  have no
>                               stake in the
>                                        matter, and
>                                        I DEFINITELY appreciate the
>         argument to not
>                               changing the API
>                                        just for
>                                        sematic sweetness.
>
>                                        That that said, Herve is _/so
>         good/_ about
>                               deprecations and warnings
>
>                                        that make such changes fairly easily
>                  digestible.
>
>                                        That that that.... enough.... I
>         bow out of
>                  this
>                               one....!!!!
>
>                                        Always learning and Happy New
>         Year to all
>                  lurkers,
>
>                                        ~Malcolm
>
>                                        *From:*Michael Lawrence
>
>                               [mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>
>                  <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>>.
>                               <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>
>                  <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>>.__>____com
>
>                                        <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>.
>                  <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>.>____com
>                               <mailto:lawrence.michael at gene.
>         <mailto:lawrence.michael at gene.>__com
>                  <mailto:lawrence.michael at gene.com
>         <mailto:lawrence.michael at gene.com>>>>]
>                                        *Sent:* Friday, January 04, 2013
>         5:11 PM
>                                        *To:* Cook, Malcolm
>                                        *Cc:* Sean Davis; Michael
>         Lawrence; Hervé
>                  Pagès
>                                        (hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>
>
>                               <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>
>
>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>>); Tim
>
>
>
>                                        Triche, Jr.; Vedran Franke;
>         bioconductor at r-project.org <mailto:bioconductor at r-project.org>
>         <mailto:bioconductor at r-project.org
>         <mailto:bioconductor at r-project.org>>
>                  <mailto:bioconductor at r-__project.org
>         <mailto:bioconductor at r-__project.org>
>                  <mailto:bioconductor at r-project.org
>         <mailto:bioconductor at r-project.org>>>
>
>
>         <mailto:bioconductor at r-____project.org
>         <mailto:bioconductor at r-____project.org>
>                  <mailto:bioconductor at r-__project.org
>         <mailto:bioconductor at r-__project.org>>
>
>                               <mailto:bioconductor at r-__project.org
>         <mailto:bioconductor at r-__project.org>
>                  <mailto:bioconductor at r-project.org
>         <mailto:bioconductor at r-project.org>>>>
>
>                                        *Subject:* Re: [BioC]
>         countMatches() (was:
>                  table
>                               for GenomicRanges)
>
>
>                                        On Fri, Jan 4, 2013 at 1:56 PM,
>         Cook, Malcolm
>                               <MEC at stowers.org <mailto:MEC at stowers.org>
>         <mailto:MEC at stowers.org <mailto:MEC at stowers.org>>
>                  <mailto:MEC at stowers.org <mailto:MEC at stowers.org>
>         <mailto:MEC at stowers.org <mailto:MEC at stowers.org>>>
>                                        <mailto:MEC at stowers.org
>         <mailto:MEC at stowers.org>
>                  <mailto:MEC at stowers.org <mailto:MEC at stowers.org>>
>         <mailto:MEC at stowers.org <mailto:MEC at stowers.org>
>                  <mailto:MEC at stowers.org <mailto:MEC at stowers.org>>>>
>                                        <mailto:MEC at stowers.org
>         <mailto:MEC at stowers.org>
>                  <mailto:MEC at stowers.org <mailto:MEC at stowers.org>>
>         <mailto:MEC at stowers.org <mailto:MEC at stowers.org>
>                  <mailto:MEC at stowers.org <mailto:MEC at stowers.org>>>
>                               <mailto:MEC at stowers.org
>         <mailto:MEC at stowers.org> <mailto:MEC at stowers.org
>         <mailto:MEC at stowers.org>>
>                  <mailto:MEC at stowers.org <mailto:MEC at stowers.org>
>         <mailto:MEC at stowers.org <mailto:MEC at stowers.org>>>>>> wrote:
>
>                                        Hiya,
>
>                                        For what it is worth...
>
>                                        I think the change to %in% is
>         warranted.
>
>                                        If I understand correctly, this
>         change
>                  restores the
>                               relationship
>                                        between
>                                        the semantics of `%in` and the
>         semantics
>                  of `match`.
>
>                                          From the docs:
>
>                                            '"%in%" <- function(x, table)
>         match(x,
>                  table,
>                               nomatch = 0) > 0'
>
>                                        Herve's change restores this
>         relationship.
>
>
>                                        match and %in% were initially
>         consistent (both
>                               considering any
>                                        overlap);
>                                        Herve has changed both of them
>         together.
>                  The whole
>                               idea behind
>                                        IRanges
>                                        is that ranges are special data
>         types with
>                  special
>                               semantics. We
>                                        have
>                                        reimplemented much of the existing R
>                  vector API
>                               using those
>                                        semantics;
>                                        this extends beyond match/%in%. I am
>                  hesitant about
>                               making such
>                                        sweeping
>                                        changes to the API so late in the
>                  life-cycle of the
>                               package.
>                                        There was a
>                                        feature request for a way to count
>                  identical ranges
>                               in a set of
>                                        ranges.
>                                        Let's please not get carried away
>         and start
>                               redesigning the API
>                                        for this
>                                        one, albeit useful, request.
>         There are all
>                  sorts of
>                                        inconsistencies in
>                                        the API, and many of them were
>         conscious
>                  decisions
>                               that considered
>                                        practical use cases.
>
>                                        Michael
>
>
>                                             Herve, I suspect you were
>         you as a
>                  result able to
>                                        completely drop
>                                             all the
>         `%in%,BiocClass1,BiocClass2`
>                               definitions and depend
>                                        upon
>                                             base::%in%
>
>                                             Am I right?
>
>                                             If so, may I suggest that
>         Herve stay the
>                               course, with the
>                                        addition of
>                                                '"%ol%" <- function(a, b)
>                  findOverlaps(a,
>                               b, maxgap=0L,
>                                             minoverlap=1L, type='any',
>                  select='all') > 0'
>
>                                             This would provide a
>         perspicacious
>                  idiom, thereby
>                                        optimizing the API
>                                             for Michaels observed common
>         use case.
>
>                                             Just sayin'
>
>                                             ~Malcolm
>
>
>                                               .-----Original Message-----
>                                               .From:
>
>         bioconductor-bounces at r-______project.org
>         <mailto:bioconductor-bounces at r-______project.org>
>                  <mailto:bioconductor-bounces at r-____project.org
>         <mailto:bioconductor-bounces at r-____project.org>>
>                               <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>__r-__project.org
>         <http://r-__project.org>
>                  <mailto:bioconductor-bounces at r-__project.org
>         <mailto:bioconductor-bounces at r-__project.org>>>
>
>                                        <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>____r-project.org
>         <http://r-project.org>
>                  <http://r-project.org>
>                               <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>__r-project.org <http://r-project.org>
>                  <mailto:bioconductor-bounces at r-project.org
>         <mailto:bioconductor-bounces at r-project.org>>>>
>
>
>           <mailto:bioconductor-bounces@ <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>
>
>                               <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>>______r-project.org
>         <http://r-project.org>
>                  <http://r-project.org>
>                               <http://r-project.org>
>
>                                        <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>____r-project.org
>         <http://r-project.org>
>                  <http://r-project.org>
>                               <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>__r-project.org <http://r-project.org>
>                  <mailto:bioconductor-bounces at r-project.org
>         <mailto:bioconductor-bounces at r-project.org>>>>>
>
>           [mailto:bioconductor-bounces@ <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>
>
>                               <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>>______r-project.org
>         <http://r-project.org>
>                  <http://r-project.org>
>                               <http://r-project.org>
>
>                                        <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>____r-project.org
>         <http://r-project.org>
>                  <http://r-project.org>
>                               <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>__r-project.org <http://r-project.org>
>                  <mailto:bioconductor-bounces at r-project.org
>         <mailto:bioconductor-bounces at r-project.org>>>>
>
>
>           <mailto:bioconductor-bounces@ <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>
>
>                               <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>>______r-project.org
>         <http://r-project.org>
>                  <http://r-project.org>
>                               <http://r-project.org>
>
>                                        <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>
>                  <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>>____r-project.org
>         <http://r-project.org>
>                  <http://r-project.org>
>                               <mailto:bioconductor-bounces@
>         <mailto:bioconductor-bounces@>__r-project.org <http://r-project.org>
>                  <mailto:bioconductor-bounces at r-project.org
>         <mailto:bioconductor-bounces at r-project.org>>>>>] On Behalf Of Sean
>                                        Davis
>                                               .Sent: Friday, January 04,
>         2013 3:37 PM
>                                               .To: Michael Lawrence
>                                               .Cc: Tim Triche, Jr.;
>         Vedran Franke;
>         bioconductor at r-project.org <mailto:bioconductor at r-project.org>
>         <mailto:bioconductor at r-project.org
>         <mailto:bioconductor at r-project.org>>
>                               <mailto:bioconductor at r-__project.org
>         <mailto:bioconductor at r-__project.org>
>                  <mailto:bioconductor at r-project.org
>         <mailto:bioconductor at r-project.org>>>
>                               <mailto:bioconductor at r-____project.org
>         <mailto:bioconductor at r-____project.org>
>                  <mailto:bioconductor at r-__project.org
>         <mailto:bioconductor at r-__project.org>>
>                               <mailto:bioconductor at r-__project.org
>         <mailto:bioconductor at r-__project.org>
>                  <mailto:bioconductor at r-project.org
>         <mailto:bioconductor at r-project.org>>>>
>
>                  <mailto:bioconductor at r-______project.org
>         <mailto:bioconductor at r-______project.org>
>                  <mailto:bioconductor at r-____project.org
>         <mailto:bioconductor at r-____project.org>>
>
>
>                               <mailto:bioconductor at r-____project.org
>         <mailto:bioconductor at r-____project.org>
>                  <mailto:bioconductor at r-__project.org
>         <mailto:bioconductor at r-__project.org>>>
>
>
>
>         <mailto:bioconductor at r-____project.org
>         <mailto:bioconductor at r-____project.org>
>                  <mailto:bioconductor at r-__project.org
>         <mailto:bioconductor at r-__project.org>>
>                               <mailto:bioconductor at r-__project.org
>         <mailto:bioconductor at r-__project.org>
>                  <mailto:bioconductor at r-project.org
>         <mailto:bioconductor at r-project.org>>>>>
>
>                                               .Subject: Re: [BioC]
>         countMatches()
>                  (was:
>                               table for
>                                        GenomicRanges)
>                                               .
>                                               .On Fri, Jan 4, 2013 at
>         4:32 PM,
>                  Michael
>                               Lawrence
>
>           .<lawrence.michael at gene.com <mailto:lawrence.michael at gene.com>
>                  <mailto:lawrence.michael at gene.com
>         <mailto:lawrence.michael at gene.com>>
>                               <mailto:lawrence.michael at gene.
>         <mailto:lawrence.michael at gene.>__com
>                  <mailto:lawrence.michael at gene.com
>         <mailto:lawrence.michael at gene.com>>>
>                                        <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>.
>                  <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>.>____com
>                               <mailto:lawrence.michael at gene.
>         <mailto:lawrence.michael at gene.>__com
>                  <mailto:lawrence.michael at gene.com
>         <mailto:lawrence.michael at gene.com>>>>
>
>                                        <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>
>                  <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>>.
>                               <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>
>                  <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>>.__>____com
>
>                                        <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>.
>                  <mailto:lawrence.michael at gene
>         <mailto:lawrence.michael at gene>.>____com
>                               <mailto:lawrence.michael at gene.
>         <mailto:lawrence.michael at gene.>__com
>                  <mailto:lawrence.michael at gene.com
>         <mailto:lawrence.michael at gene.com>>>>>> wrote:
>                                               .> The change to the
>         behavior of
>                  %in% is a
>                               pretty big
>                                        one. Are you
>                                             thinking
>                                               .> that all set-based
>         operations should
>                               behave this way? For
>                                             example, setdiff
>                                               .> and intersect? I really
>         liked
>                  the syntax
>                               of "peaks
>                                        %in% genes".
>                                             In my
>                                               .> experience, it's way
>         more common
>                  to ask
>                               questions
>                                        about overlap
>                                             than about
>                                               .> equality, so I'd rather
>         optimize
>                  the API
>                               for that use
>                                        case. But
>                                             again,
>                                               .> that's just my personal
>         bias.
>                                               .
>                                               .For what it is worth, I share
>                  Michael's
>                               personal bias here.
>                                               .
>                                               .Sean
>                                               .
>                                               .
>                                               .> Michael
>                                               .>
>                                               .>
>                                               .> On Fri, Jan 4, 2013 at
>         1:11 PM,
>                  Hervé Pagès
>                                        <hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>
>                               <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>>
>                                             <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>                               <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>                               <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>>>>
>         wrote:
>                                               .>
>                                               .>> Hi,
>                                               .>>
>                                               .>> I added findMatches() and
>                  countMatches()
>                               to the
>                                        latest IRanges /
>                                               .>> GenomicRanges packages
>         (in BioC
>                  devel only).
>                                               .>>
>                                               .>>   findMatches(x,
>         table): An
>                  enhanced
>                               version of
>                                        ‘match’ that
>                                               .>>           returns all the
>                  matches in a
>                               Hits object.
>                                               .>>
>                                               .>>   countMatches(x, table):
>                  Returns an
>                               integer vector
>                                        of the length
>                                               .>>           of ‘x’,
>         containing
>                  the number
>                               of matches in
>                                        ‘table’ for
>                                               .>>           each element
>         in ‘x’.
>                                               .>>
>
>                                               .>> countMatches() is what
>         you can
>                  use to
>                                        tally/count/tabulate
>                                             (choose your
>
>                                               .>> preferred term) the unique
>                  elements in a
>                               GRanges object:
>                                               .>>
>                                               .>>   library(GenomicRanges)
>                                               .>>   set.seed(33)
>                                               .>>   gr <- GRanges("chr1",
>
>
>         IRanges(sample(15,20,replace=*______*TRUE),
>
>
>
>
>                                             width=5))
>                                               .>>
>                                               .>> Then:
>                                               .>>
>                                               .>>   > gr_levels <-
>         sort(unique(gr))
>                                               .>>   >
>         countMatches(gr_levels, gr)
>                                               .>>    [1] 1 1 1 2 4 2 2 1
>         2 2 2
>                                               .>>
>                                               .>> Note that
>         findMatches() and
>                               countMatches() also work on
>                                             IRanges and
>                                               .>> DNAStringSet objects,
>         as well as on
>                               ordinary atomic
>                                        vectors:
>                                               .>>
>                                               .>>   library(hgu95av2probe)
>                                               .>>   library(Biostrings)
>                                               .>>   probes <-
>                  DNAStringSet(hgu95av2probe)
>                                               .>>   unique_probes <-
>         unique(probes)
>                                               .>>   count <-
>                  countMatches(unique_probes,
>                               probes)
>                                               .>>   max(count)  # 7
>                                               .>>
>                                               .>> I made other changes in
>                               IRanges/GenomicRanges so that
>                                        the notion
>                                               .>> of "match" between
>         elements of a
>                               vector-like object now
>                                             consistently
>                                               .>> means "equality"
>         instead of
>                  "overlap",
>                               even for
>                                        range-based
>                                             objects
>                                               .>> like IRanges or GRanges
>                  objects. This
>                               notion of
>                                        "equality" is the
>                                               .>> same that is used by
>         ==. The most
>                               visible consequence
>                                        of those
>                                               .>> changes is that using %in%
>                  between 2
>                               IRanges or
>                                        GRanges objects
>                                               .>> 'query' and 'subject'
>         in order
>                  to do
>                               overlaps was
>                                        replaced by
>                                               .>> overlapsAny(query,
>         subject).
>                                               .>>
>                                               .>>   overlapsAny(query,
>         subject):
>                  Finds the
>                               ranges in
>                                        ‘query’ that
>                                               .>>      overlap any of
>         the ranges
>                  in ‘subject’.
>                                               .>>
>
>                                               .>> There are warnings and
>         deprecation
>                               messages in place
>                                        to help
>                                             smooth
>
>                                               .>> the transition.
>                                               .>>
>                                               .>> Cheers,
>                                               .>> H.
>                                               .>>
>                                               .>> --
>                                               .>> Hervé Pagès
>                                               .>>
>                                               .>> Program in
>         Computational Biology
>                                               .>> Division of Public
>         Health Sciences
>                                               .>> Fred Hutchinson Cancer
>         Research
>                  Center
>                                               .>> 1100 Fairview Ave. N,
>         M1-B514
>                                               .>> P.O. Box 19024
>                                               .>> Seattle, WA 98109-1024
>                                               .>>
>                                               .>> E-mail:
>         hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>                               <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>                               <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>>>
>
>                                        <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>
>
>                               <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>
>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>>>
>
>                                               .>> Phone: (206) 667-5791
>         <tel:%28206%29%20667-5791>
>                  <tel:%28206%29%20667-5791>
>                               <tel:%28206%29%20667-5791>
>         <tel:%28206%29%20667-5791>
>                                        <tel:%28206%29%20667-5791>
>                                               .>> Fax: (206) 667-1319
>         <tel:%28206%29%20667-1319>
>                  <tel:%28206%29%20667-1319>
>                               <tel:%28206%29%20667-1319>
>         <tel:%28206%29%20667-1319>
>                                        <tel:%28206%29%20667-1319>
>
>                                               .>>
>                                               .>
>                                               .>         [[alternative HTML
>                  version deleted]]
>                                               .>
>                                               .>
>                                               .>
>
>
>           _____________________________________________________
>
>
>
>
>                                               .> Bioconductor mailing list
>                                               .>
>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>                  <mailto:Bioconductor at r-project.org
>         <mailto:Bioconductor at r-project.org>>
>                               <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>
>                  <mailto:Bioconductor at r-project.org
>         <mailto:Bioconductor at r-project.org>>>
>
>         <mailto:Bioconductor at r-____project.org
>         <mailto:Bioconductor at r-____project.org>
>                  <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>>
>                               <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>
>                  <mailto:Bioconductor at r-project.org
>         <mailto:Bioconductor at r-project.org>>>>
>
>
>         <mailto:Bioconductor at r-______project.org
>         <mailto:Bioconductor at r-______project.org>
>                  <mailto:Bioconductor at r-____project.org
>         <mailto:Bioconductor at r-____project.org>>
>
>
>                               <mailto:Bioconductor at r-____project.org
>         <mailto:Bioconductor at r-____project.org>
>                  <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>>>
>
>
>         <mailto:Bioconductor at r-____project.org
>         <mailto:Bioconductor at r-____project.org>
>                  <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>>
>                               <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>
>                  <mailto:Bioconductor at r-project.org
>         <mailto:Bioconductor at r-project.org>>>>>
>
>                                               .>
>
>         https://stat.ethz.ch/mailman/______listinfo/bioconductor
>                  <https://stat.ethz.ch/mailman/____listinfo/bioconductor>
>
>
>
>                  <https://stat.ethz.ch/mailman/____listinfo/bioconductor
>                  <https://stat.ethz.ch/mailman/__listinfo/bioconductor>>
>
>
>
>
>                  <https://stat.ethz.ch/mailman/____listinfo/bioconductor
>                  <https://stat.ethz.ch/mailman/__listinfo/bioconductor>
>
>                  <https://stat.ethz.ch/mailman/__listinfo/bioconductor
>                  <https://stat.ethz.ch/mailman/listinfo/bioconductor>>>
>                                               .> Search the archives:
>
>         http://news.gmane.org/gmane.______science.biology.informatics.______conductor
>
>         <http://news.gmane.org/gmane.____science.biology.informatics.____conductor>
>
>
>
>
>         <http://news.gmane.org/gmane.____science.biology.informatics.____conductor
>
>         <http://news.gmane.org/gmane.__science.biology.informatics.__conductor>>
>
>
>
>
>         <http://news.gmane.org/gmane.____science.biology.informatics.____conductor
>
>         <http://news.gmane.org/gmane.__science.biology.informatics.__conductor>
>
>
>         <http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>>>
>                                               .
>
>                  ._____________________________________________________
>
>
>
>
>                                               .Bioconductor mailing list
>
>           .Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>                  <mailto:Bioconductor at r-project.org
>         <mailto:Bioconductor at r-project.org>>
>                               <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>
>                  <mailto:Bioconductor at r-project.org
>         <mailto:Bioconductor at r-project.org>>>
>
>         <mailto:Bioconductor at r-____project.org
>         <mailto:Bioconductor at r-____project.org>
>                  <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>>
>                               <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>
>                  <mailto:Bioconductor at r-project.org
>         <mailto:Bioconductor at r-project.org>>>>
>
>
>         <mailto:Bioconductor at r-______project.org
>         <mailto:Bioconductor at r-______project.org>
>                  <mailto:Bioconductor at r-____project.org
>         <mailto:Bioconductor at r-____project.org>>
>                               <mailto:Bioconductor at r-____project.org
>         <mailto:Bioconductor at r-____project.org>
>                  <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>>>
>
>
>
>         <mailto:Bioconductor at r-____project.org
>         <mailto:Bioconductor at r-____project.org>
>                  <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>>
>                               <mailto:Bioconductor at r-__project.org
>         <mailto:Bioconductor at r-__project.org>
>                  <mailto:Bioconductor at r-project.org
>         <mailto:Bioconductor at r-project.org>>>>>
>
>
>                  .https://stat.ethz.ch/mailman/______listinfo/bioconductor
>                  <https://stat.ethz.ch/mailman/____listinfo/bioconductor>
>
>                  <https://stat.ethz.ch/mailman/____listinfo/bioconductor
>                  <https://stat.ethz.ch/mailman/__listinfo/bioconductor>>
>
>
>
>                  <https://stat.ethz.ch/mailman/____listinfo/bioconductor
>                  <https://stat.ethz.ch/mailman/__listinfo/bioconductor>
>
>                  <https://stat.ethz.ch/mailman/__listinfo/bioconductor
>                  <https://stat.ethz.ch/mailman/listinfo/bioconductor>>>
>                                               .Search the archives:
>
>         http://news.gmane.org/gmane.______science.biology.informatics.______conductor
>
>         <http://news.gmane.org/gmane.____science.biology.informatics.____conductor>
>
>
>         <http://news.gmane.org/gmane.____science.biology.informatics.____conductor
>
>         <http://news.gmane.org/gmane.__science.biology.informatics.__conductor>>
>
>
>
>
>         <http://news.gmane.org/gmane.____science.biology.informatics.____conductor
>
>         <http://news.gmane.org/gmane.__science.biology.informatics.__conductor>
>
>
>         <http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>>>
>
>
>                                    --
>                                    Hervé Pagès
>
>                                    Program in Computational Biology
>                                    Division of Public Health Sciences
>                                    Fred Hutchinson Cancer Research Center
>                                    1100 Fairview Ave. N, M1-B514
>                                    P.O. Box 19024
>                                    Seattle, WA 98109-1024
>
>                                    E-mail: hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>
>
>                               <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>>
>
>                                    Phone: (206) 667-5791
>         <tel:%28206%29%20667-5791>
>                  <tel:%28206%29%20667-5791> <tel:%28206%29%20667-5791>
>                               <tel:%28206%29%20667-5791>
>
>                                    Fax: (206) 667-1319
>         <tel:%28206%29%20667-1319> <tel:%28206%29%20667-1319>
>                  <tel:%28206%29%20667-1319>
>                               <tel:%28206%29%20667-1319>
>
>
>                           --
>                           Hervé Pagès
>
>                           Program in Computational Biology
>                           Division of Public Health Sciences
>                           Fred Hutchinson Cancer Research Center
>                           1100 Fairview Ave. N, M1-B514
>                           P.O. Box 19024
>                           Seattle, WA 98109-1024
>
>                           E-mail: hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org> <mailto:hpages at fhcrc.org
>         <mailto:hpages at fhcrc.org>>
>                  <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>
>         <mailto:hpages at fhcrc.org <mailto:hpages at fhcrc.org>>>
>                           Phone: (206) 667-5791
>         <tel:%28206%29%20667-5791> <tel:%28206%29%20667-5791>
>                  <tel:%28206%29%20667-5791>
>                           Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>         <tel:%28206%29%20667-1319>
>                  <tel:%28206%29%20667-1319>
>
>
>
>
>
>                  --
>
>                  /A model is a lie that helps you see the truth./
>                  /
>                  /
>                  Howard Skipper
>
>
>         <http://cancerres.__aacrjournals.org/content/31/9/__1173.full.pdf <http://aacrjournals.org/content/31/9/__1173.full.pdf>
>         <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>>
>
>
>
>
>              --
>              Hervé Pagès
>
>              Program in Computational Biology
>              Division of Public Health Sciences
>              Fred Hutchinson Cancer Research Center
>              1100 Fairview Ave. N, M1-B514
>
>     ...
>
>     [Message clipped]
>
>
>
> --
> /A model is a lie that helps you see the truth./
>
> Howard Skipper
> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list