[BioC] countMatches() (was: table for GenomicRanges)
Hervé Pagès
hpages at fhcrc.org
Wed Jan 9 01:14:37 CET 2013
On 01/08/2013 02:59 PM, Cook, Malcolm wrote:
> If we’re voting/brainstorming, I’d go for one operator for each value that
> the ‘type’ arg of overlap can take on
>
> Thus:
>
> %olStart%
>
> %olEnd%
>
> %olWithin%
>
> %olAny% (perhaps with alias of just ‘%ol%’)
>
> %olEqual% (which should be same as %in%, right)
Except for zero-width ranges: they never overlap with anything, but
2 zero-width ranges with the same start are considered equal:
> ir <- IRanges(start=5:7, width=0:2)
> ir
IRanges of length 3
    start end width
[1]     5   4     0
[2]     6   6     1
[3]     7   8     2
> overlapsAny(ir, ir, type="equal")
[1] FALSE TRUE TRUE
> suppressWarnings(ir %in% ir)
[1] TRUE TRUE TRUE
Also I believe the new %in% should generally be faster than
overlapsAny( , type="equal"), and also perhaps more memory
efficient, but I didn't do enough testing to quantify this.
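
For anyone who wants to check this on their own data, a rough sketch (not a
rigorous benchmark) could look like the following. It assumes the IRanges
package is installed; the object names (x, tbl, by_in, by_overlap) and the
sizes are arbitrary choices for illustration, and only positive-width ranges
are generated so that the zero-width exception shown above does not come
into play.

```r
# Sketch only: compare `%in%` (equality matching) with
# overlapsAny(type = "equal") on randomly generated ranges.
# Assumes the Bioconductor package IRanges is installed.
library(IRanges)

set.seed(1)
n <- 5000L

# Positive-width ranges only (widths 1..5), so no zero-width ranges occur.
x   <- IRanges(start = sample(1000L, n, replace = TRUE),
               width = sample(1:5, n, replace = TRUE))
tbl <- IRanges(start = sample(1000L, n, replace = TRUE),
               width = sample(1:5, n, replace = TRUE))

by_in      <- x %in% tbl
by_overlap <- overlapsAny(x, tbl, type = "equal")

# Both approaches are expected to flag the same elements of 'x'.
agree <- all(by_in == by_overlap)

# Rough timings over repeated calls; results depend on machine,
# data, and package version.
t_in      <- system.time(for (i in 1:20) x %in% tbl)
t_overlap <- system.time(for (i in 1:20) overlapsAny(x, tbl, type = "equal"))
```

No timings are quoted here because they vary with the machine, the data, and
the installed package versions; print t_in and t_overlap to see them locally.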
H.
>
> Doh, I can’t stay away from this issue for some reason..... Anyway, my 2
> cents
>
> ~Malcolm
>
> *From:*Tim Triche, Jr. [mailto:tim.triche at gmail.com]
> *Sent:* Tuesday, January 08, 2013 4:12 PM
> *To:* Michael Lawrence
> *Cc:* Hervé Pagès; Cook, Malcolm; Sean Davis; Vedran Franke;
> bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
> Michael: your suggestion is both clearer and more concise than mine was.
> +1
>
> (I prefer x %i% y %i% z rather than intersect(x, intersect(y, z)) for
> the same reason)
>
> On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> I would vote for %over% instead of %ov%. Just 2 more characters but way
> clearer, at least to me. The hardest thing to type are the %'s.
>
> Michael
>
> On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Thanks Tim, Malcolm for the feedback.
>
> @Tim, I won't comment on the variants of %ov% you are proposing for
> doing "within" or "equal" instead of "any" (but if people want them,
> I'll add them too). For now I just want to focus on restoring the
> convenience of the old %in%, whose removal is understandably causing
> some frustration. And so we can move on.
>
> Cheers,
> H.
>
>
>
>
> On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>
> hell, I'll add the operators if there's support for them. obviously
> they're not a big deal and a patch would take 5 minutes flat.
>
> my hope was to be very explicit about what each type of operation meant,
> so that when a newcomer to the Ranges API sees
>
> peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>
> it cannot be confused with
>
> peaks %within% rangesThatCorrespondToSomeChromatinState
>
> or
>
> peaks %equal% aBunchOfDNAseFootprints
>
> or
>
> DMRs %in% genes ## what the hell does this really mean, anyways?
> ## it's so bad on so many levels
>
> because whenever someone says "what is the advantage of Ranges-based
> analyses?", these are the archetypal sorts of queries that come to mind.
> Except that usually in my examples they are based on posterior
> probabilities, but perhaps that could stand to change.
>
> Anyways, that's just my bias, and you're doing the heavy lifting. But
> if people agree with the motivations I will write the patch today.
>
> Cheers,
>
> --t
>
>
>
>
> On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Tim,
>
> I could add the %ov% operator as a replacement for the old %in%. So you
> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>
> Cheers,
> H.
>
>
> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>
> So why not leave %in% as it was and transition everything forward to
> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
> such that
>
> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>
> and for the time being,
>
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> ## but with a noisy nastygram that will halt if options("warn"=2)
>
> No breakage for %in% methods until such time as a full deprecation cycle
> has passed, and if the maintainers can't be arsed to do anything at all
> about the warnings by the second full release, then perhaps they don't
> really care that much after all. Just a thought?
>
> From someone (me) who has their own issues with keeping everything up
> to date and should know better. If you want to use %in% for
>
> peaks %in% genes (why on earth would you do this rather than peaks
> %in% promoters(genes), anyways?)
>
> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
> or less) happy.
>
>
>
> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
>
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
> I don't think "match" (the word) always has to mean "equality" either.
> However having match() (the function) do "whole exact matching" (aka
> "equality") for any kind of vector-like object has the advantage of:
>
> (a) making it consistent with base::match() (?base::match is pretty
> explicit about what the contract of match() is)
>
>
> (a) alone is obviously not enough. We have many methods, like the
> set operations, that treat ranges specially. Are we going to start
> moving everything toward the base behavior? And have rangeIntersect,
> rangeSetdiff, etc?
>
> (b) preserving its relationship with ==, duplicated(), unique(), etc...
>
>
> So it becomes consistent with duplicated/unique, but we lose
> consistency with the set operations.
>
> (c) not frustrating the user who needs something to do exact
> matching on ranges (as I mentioned previously, if you take
> match() away from him/her, s/he'll be left with nothing).
>
>
> No one has ever asked for match() to behave this way. There was a
> request for a way to tabulate identical ranges. It was a nice idea
> to extract the general "outer equal" findMatches function. But the
> changes seem to be snow-balling. These types of changes mean a lot
> of maintenance work for the users. A deprecation cycle does not
> circumvent that.
>
>
> IMO those advantages counterbalance *by far* the very little
> convenience you get from having 'match(query, subject)' do
> 'findOverlaps(query, subject, select="first")' on
> IRanges/GRanges objects. If you need to do that, just use the
> latter, or, if you think that's still too much typing, define
> a wrapper e.g. 'ovmatch(query, subject)'.
>
> There are plenty of specialized tools around for doing
> inexact/fuzzy/partial/overlap matching for many particular types
> of vector-like objects: grep() and family, pmatch(), charmatch(),
> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
> family, findInterval(), etc... For the reasons I mentioned
> above, none of them should hijack match() to make it do some
> particular type of inexact matching on some particular type of
> objects. Even if, for that particular type of objects, doing that
> particular type of inexact matching is more common than doing
> exact matching.
>
> H.
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
> I think having overlapsAny is a nice addition and helps make the API
> more complete and explicit. Are you sure we need to change the behavior
> of the match method for this relatively uncommon use case?
>
>
> Yes because otherwise users with a use case of
> doing
> match()
>
> even if it's uncommon,
>
>
> I don't think "match" always has to mean "equality". It is a more general
> concept in my mind. The most common use case for matching ranges is
> overlap.
>
>
> Of course "match" doesn't always have to mean
> equality.
> But of base
>
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right thing
> for most users (although not all). But 'exons %in% genes' is cute too
> and was probably doing the wrong thing for all users. Advanced users
> like you guys would have no problem switching to
>
> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>
> or
>
> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in% genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue to
> expose the user to all the options from the very start, and to
> help him/her make the right choice. Of course there will be users
> that don't want or don't have the time to read/think about all the
> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
> which is not a lot more typing than 'query %in% subject', especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap than
> about equality but there are some use cases for the latter (as the
> original thread shows). Until now, when you had such a use case, you
> could not use match() or %in%, which would have been the natural things
> to use, because they got hijacked to do something else, and you were
> left with nothing. Not a satisfying situation. So at a minimum, we
> needed to restore the true/real/original semantic of match() to do
> "equality" instead of "overlap". But it's hard to do this for match()
> and not do it for %in% too. For more than 99% of R users, %in% is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and match()
> would only cause grief and frustration to newcomers to Bioconductor.
>
> H.
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a late comer to BioC, so I definitely easily defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed change
> making the relationship between %in% and match for {G,I}Ranges{List}
> mimic that between other vectors, and I do think that changing the API
> would make other late-comers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the matter, and
> I DEFINITELY appreciate the argument to not changing the API just for
> semantic sweetness.
>
> That that said, Herve is _/so good/_ about deprecations and warnings
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:*Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages at fhcrc.org); Tim
> Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
>
> match and %in% were initially consistent (both considering any overlap);
> Herve has changed both of them together. The whole idea behind IRanges
> is that ranges are special data types with special semantics. We have
> reimplemented much of the existing R vector API using those semantics;
> this extends beyond match/%in%. I am hesitant about making such sweeping
> changes to the API so late in the life-cycle of the package. There was a
> feature request for a way to count identical ranges in a set of ranges.
> Let's please not get carried away and start redesigning the API for this
> one, albeit useful, request. There are all sorts of inconsistencies in
> the API, and many of them were conscious decisions that considered
> practical use cases.
>
> Michael
>
>
> Herve, I suspect you were as a result able to completely drop all the
> `%in%,BiocClass1,BiocClass2` definitions and depend upon base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
>
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
> .Sent: Friday, January 04, 2013 3:37 PM
> .To: Michael Lawrence
> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
> .
> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
> .<lawrence.michael at gene.com> wrote:
> .> The change to the behavior of %in% is a pretty big one. Are you thinking
> .> that all set-based operations should behave this way? For example, setdiff
> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
> .> experience, it's way more common to ask questions about overlap than about
> .> equality, so I'd rather optimize the API for that use case. But again,
> .> that's just my personal bias.
> .
> .For what it is worth, I share Michael's personal bias here.
> .
> .Sean
> .
> .
> .> Michael
> .>
> .>
> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> .>
> .>> Hi,
> .>>
> .>> I added findMatches() and countMatches() to the latest IRanges /
> .>> GenomicRanges packages (in BioC devel only).
> .>>
> .>> findMatches(x, table): An enhanced version of ‘match’ that
> .>> returns all the matches in a Hits object.
> .>>
> .>> countMatches(x, table): Returns an integer vector of the length
> .>> of ‘x’, containing the number of matches in ‘table’ for
> .>> each element in ‘x’.
> .>>
> .>> countMatches() is what you can use to tally/count/tabulate (choose your
> .>> preferred term) the unique elements in a GRanges object:
> .>>
> .>> library(GenomicRanges)
> .>> set.seed(33)
> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
> .>>
> .>> Then:
> .>>
> .>> > gr_levels <- sort(unique(gr))
> .>> > countMatches(gr_levels, gr)
> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
> .>>
> .>> Note that findMatches() and countMatches() also work on IRanges and
> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
> .>>
> .>> library(hgu95av2probe)
> .>> library(Biostrings)
> .>> probes <- DNAStringSet(hgu95av2probe)
> .>> unique_probes <- unique(probes)
> .>> count <- countMatches(unique_probes, probes)
> .>> max(count) # 7
> .>>
> .>> I made other changes in IRanges/GenomicRanges so that the notion
> .>> of "match" between elements of a vector-like object now consistently
> .>> means "equality" instead of "overlap", even for range-based objects
> .>> like IRanges or GRanges objects. This notion of "equality" is the
> .>> same that is used by ==. The most visible consequence of those
> .>> changes is that using %in% between 2 IRanges or GRanges objects
> .>> 'query' and 'subject' in order to do overlaps was replaced by
> .>> overlapsAny(query, subject).
> .>>
> .>> overlapsAny(query, subject): Finds the ranges in ‘query’ that
> .>> overlap any of the ranges in ‘subject’.
> .>>
> .>> There are warnings and deprecation messages in place to help smooth
> .>> the transition.
> .>>
> .>> Cheers,
> .>> H.
> .>>
> .>> --
> .>> Hervé Pagès
> .>>
> .>> Program in Computational Biology
> .>> Division of Public Health Sciences
> .>> Fred Hutchinson Cancer Research Center
> .>> 1100 Fairview Ave. N, M1-B514
> .>> P.O. Box 19024
> .>> Seattle, WA 98109-1024
> .>>
> .>> E-mail: hpages at fhcrc.org
> .>> Phone: (206) 667-5791
> .>> Fax: (206) 667-1319
> .>>
> .>
> .> [[alternative HTML version deleted]]
> .>
> .> _____________________________________________________
> .> Bioconductor mailing list
> .> Bioconductor at r-project.org
> .> https://stat.ethz.ch/mailman/listinfo/bioconductor
> .> Search the archives:
> .> http://news.gmane.org/gmane.science.biology.informatics.conductor
> .
>
> <http://news.gmane.org/gmane.__science.biology.informatics.__conductor>>
>
>
>
>
> <http://news.gmane.org/gmane.____science.biology.informatics.____conductor
>
> <http://news.gmane.org/gmane.__science.biology.informatics.__conductor>
>
>
> <http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>
> <http://news.gmane.org/gmane.science.biology.informatics.conductor>>>
>
>
> --
> /A model is a lie that helps you see the truth./
>
> Howard Skipper
> <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319