[BioC] countMatches() (was: table for GenomicRanges)
Hervé Pagès
hpages at fhcrc.org
Wed Jan 9 00:46:27 CET 2013
Thanks all for the feedback. Will do %over% and %within%. Hopefully we
can consider this the end of the thread :-b I'll just post a quick
note on Bioc-devel when this is ready.
Cheers,
H.
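
For readers following the thread, the semantics agreed on above can be sketched in plain base R. This is only an illustration under stated assumptions, not the IRanges implementation: ranges are modeled as data frames with integer 'start' and 'end' columns (closed intervals), and overlaps_any() is a hypothetical stand-in for overlapsAny() that supports only type "any" and "within".

```r
# Hypothetical helper (not the IRanges function): for each query range,
# report whether it relates to at least one subject range.
# Ranges are closed integer intervals stored as data frames.
overlaps_any <- function(query, subject, type = c("any", "within")) {
  type <- match.arg(type)
  vapply(seq_len(nrow(query)), function(i) {
    if (type == "any") {
      # any shared position between the query range and a subject range
      any(query$start[i] <= subject$end & query$end[i] >= subject$start)
    } else {
      # query range entirely contained in a subject range
      any(query$start[i] >= subject$start & query$end[i] <= subject$end)
    }
  }, logical(1))
}

# Thin convenience operators, in the spirit of the proposal in this thread.
`%over%`   <- function(query, subject) overlaps_any(query, subject, type = "any")
`%within%` <- function(query, subject) overlaps_any(query, subject, type = "within")

# Toy data (made up for illustration).
peaks <- data.frame(start = c(1L, 8L, 20L), end = c(5L, 12L, 25L))
genes <- data.frame(start = c(4L, 18L), end = c(10L, 30L))

over_hits   <- peaks %over% genes    # TRUE TRUE TRUE
within_hits <- peaks %within% genes  # FALSE FALSE TRUE
```

The point of the two operators is that "overlaps at all" and "lies entirely inside" give different answers for the same data, which is the ambiguity the old range-based %in% was hiding.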
On 01/08/2013 03:07 PM, Michael Lawrence wrote:
> I think %over% and maybe %within% are all that's needed. Could go to
> %start% and %end%.
>
> Michael
>
> On Tue, Jan 8, 2013 at 2:59 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>
> If we’re voting/brainstorming, I’d go for one operator for each value
> that the ‘type’ arg of overlap can take on.
>
> Thus:
>
> %olStart%
> %olEnd%
> %olWithin%
> %olAny% (perhaps with an alias of just ‘%ol%’)
> %olEqual% (which should be the same as %in%, right?)
>
> Doh, I can’t stay away from this issue for some reason... Anyway,
> my 2 cents.
>
> ~Malcolm
>
> *From:* Tim Triche, Jr. [mailto:tim.triche at gmail.com]
> *Sent:* Tuesday, January 08, 2013 4:12 PM
> *To:* Michael Lawrence
> *Cc:* Hervé Pagès; Cook, Malcolm; Sean Davis; Vedran Franke;
> bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
> Michael: your suggestion is both clearer and more concise than mine
> was. +1
>
> (I prefer x %i% y %i% z rather than intersect(x, intersect(y, z))
> for the same reason)
>
> On Tue, Jan 8, 2013 at 2:03 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> I would vote for %over% instead of %ov%. Just 2 more characters but
> way clearer, at least to me. The hardest things to type are the %'s.
>
> Michael
>
> On Tue, Jan 8, 2013 at 11:09 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Thanks Tim, Malcolm for the feedback.
>
> @Tim, I won't comment on the variants of %ov% you are proposing for
> doing "within" or "equal" instead of "any" (but if people want them,
> I'll add them too). For now I just want to focus on restoring the
> convenience of the old %in%, whose removal is understandably causing
> some frustration. And so we can move on.
>
> Cheers,
> H.
>
> On 01/08/2013 09:50 AM, Tim Triche, Jr. wrote:
>
> hell, I'll add the operators if there's support for them. obviously
> they're not a big deal and a patch would take 5 minutes flat.
>
> my hope was to be very explicit about what each type of operation meant,
> so that when a newcomer to the Ranges API sees
>
> peaks %overlapping% promoters(someGroupOfGenesWeCareAbout)
>
> it cannot be confused with
>
> peaks %within% rangesThatCorrespondToSomeChromatinState
>
> or
>
> peaks %equal% aBunchOfDNAseFootprints
>
> or
>
> DMRs %in% genes ## what the hell does this really mean, anyways?
> ## it's so bad on so many levels
>
> because whenever someone says "what is the advantage of Ranges-based
> analyses?", these are the archetypal sorts of queries that come to mind.
> Except that usually in my examples they are based on posterior
> probabilities, but perhaps that could stand to change.
>
> Anyways, that's just my bias, and you're doing the heavy lifting. But
> if people agree with the motivations I will write the patch today.
>
> Cheers,
>
> --t
>
>
>
>
> On Tue, Jan 8, 2013 at 9:20 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Tim,
>
> I could add the %ov% operator as a replacement for the old %in%. So you
> would write 'peaks %ov% genes' instead of 'peaks %in% genes'. Would just
> be a convenience wrapper for 'overlapsAny(peaks, genes)'.
>
> Cheers,
> H.
>
>
> On 01/07/2013 11:45 AM, Tim Triche, Jr. wrote:
>
> So why not leave %in% as it was and transition everything forward to
> explicitly using { `%within%`, `%overlaps%`|`%overlapping%`, `%equals%` }
> such that
>
> identical( x %within% table, countOverlaps(x, table, type='within') > 0 ) == TRUE
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> identical( x %equals% table, countOverlaps(x, table, type='equal') > 0 ) == TRUE
>
> and for the time being,
>
> identical( x %overlaps% table, countOverlaps(x, table, type='any') > 0 ) == TRUE
> ## but with a noisy nastygram that will halt if options("warn"=2)
>
> No breakage for %in% methods until such time as a full deprecation cycle
> has passed, and if the maintainers can't be arsed to do anything at all
> about the warnings by the second full release, then perhaps they don't
> really care that much after all. Just a thought?
>
> From someone (me) who has their own issues with keeping everything up
> to date and should know better. If you want to use %in% for
>
> peaks %in% genes (why on earth would you do this rather than peaks
> %in% promoters(genes), anyways?)
>
> then a nastygram could be emitted "WARNING: YOUR SHORTHAND NOTATION IS
> DOOMED AFTER BIOC 2.13, YOU WILL BE ASSIMILATED" and everyone is (more
> or less) happy.
>
>
>
> On Mon, Jan 7, 2013 at 11:33 AM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>
> On Mon, Jan 7, 2013 at 11:00 AM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Michael,
>
> I don't think "match" (the word) always has to mean "equality" either.
> However having match() (the function) do "whole exact matching" (aka
> "equality") for any kind of vector-like object has the advantage of:
>
> (a) making it consistent with base::match() (?base::match is pretty
> explicit about what the contract of match() is)
>
>
> (a) alone is obviously not enough. We have many methods, like the
> set operations, that treat ranges specially. Are we going to start
> moving everything toward the base behavior? And have rangeIntersect,
> rangeSetdiff, etc?
>
> (b) preserving its relationship with ==, duplicated(), unique(),
> etc...
>
>
> So it becomes consistent with duplicated/unique, but we lose
> consistency with the set operations.
>
> (c) not frustrating the user who needs something to do exact
> matching on ranges (as I mentioned previously, if you take
> match() away from him/her, s/he'll be left with nothing).
>
>
> No one has ever asked for match() to behave this way. There was a
> request for a way to tabulate identical ranges. It was a nice idea
> to extract the general "outer equal" findMatches function. But the
> changes seem to be snowballing. These types of changes mean a lot
> of maintenance work for the users. A deprecation cycle does not
> circumvent that.
>
>
> IMO those advantages counterbalance *by far* the very little
> convenience you get from having 'match(query, subject)' do
> 'findOverlaps(query, subject, select="first")' on
> IRanges/GRanges objects. If you need to do that, just use the
> latter, or, if you think that's still too much typing, define
> a wrapper e.g. 'ovmatch(query, subject)'.
>
> There are plenty of specialized tools around for doing
> inexact/fuzzy/partial/overlap matching for many particular types
> of vector-like objects: grep() and family, pmatch(), charmatch(),
> agrep(), grepRaw(), matchPattern() and family, findOverlaps() and
> family, findInterval(), etc... For the reasons I mentioned
> above, none of them should hijack match() to make it do some
> particular type of inexact matching on some particular type of
> objects. Even if, for that particular type of objects, doing that
> particular type of inexact matching is more common than doing
> exact matching.
>
> H.
>
>
>
> On 01/06/2013 05:39 PM, Michael Lawrence wrote:
>
> I think having overlapsAny is a nice addition and helps make the API
> more complete and explicit. Are you sure we need to change the behavior
> of the match method for this relatively uncommon use case?
>
>
> Yes because otherwise users with a use case of doing match()
>
> even if it's uncommon,
>
>
> I don't think "match" always has to mean "equality". It is a more
> general concept in my mind. The most common use case for matching
> ranges is overlap.
>
>
> Of course "match" doesn't always have to mean equality.
> But of base
>
>
> Michael
>
>
> On Fri, Jan 4, 2013 at 8:34 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Yes 'peaks %in% genes' is cute and was probably doing the right thing
> for most users (although not all). But 'exons %in% genes' is cute too
> and was probably doing the wrong thing for all users. Advanced users
> like you guys would have no problem switching to
>
> !is.na(findOverlaps(peaks, genes, type="within", select="any"))
>
> or
>
> !is.na(findOverlaps(peaks, genes, type="equal", select="any"))
>
> in case 'peaks %in% genes' was not doing exactly what you wanted,
> but most users would not find this particularly friendly. Even
> worse, some users probably didn't realize that 'peaks %in% genes'
> was not doing exactly what they thought it did because "peaks in
> genes" in English suggests that the peaks are within the genes,
> but it's not what 'peaks %in% genes' does.
>
> Having overlapsAny(), with exactly the same extra arguments as
> countOverlaps() and subsetByOverlaps() (i.e. 'maxgap', 'minoverlap',
> 'type', 'ignore.strand'), all of them documented (and with most
> users more or less familiar with them already) has the virtue of
> exposing the user to all the options from the very start, and of
> helping him/her make the right choice. Of course there will be users
> that don't want or don't have the time to read/think about all the
> options. Not a big deal: they'll just do 'overlapsAny(query, subject)',
> which is not a lot more typing than 'query %in% subject', especially
> if they use tab completion.
>
> It's true that it's more common to ask questions about overlap than
> about equality but there are some use cases for the latter (as the
> original thread shows). Until now, when you had such a use case, you
> could not use match() or %in%, which would have been the natural things
> to use, because they got hijacked to do something else, and you were
> left with nothing. Not a satisfying situation. So at a minimum, we
> needed to restore the true/real/original semantic of match() to do
> "equality" instead of "overlap". But it's hard to do this for match()
> and not do it for %in% too. For more than 99% of R users, %in% is
> just a simple wrapper for 'match(x, table, nomatch = 0) > 0' (this
> is how it has been documented and implemented in base R for many
> years). Not maintaining this relationship between %in% and match()
> would only cause grief and frustration to newcomers to Bioconductor.
>
> H.
>
>
>
> On 01/04/2013 03:32 PM, Cook, Malcolm wrote:
>
> Hiya again,
>
> I am definitely a latecomer to BioC, so I definitely easily defer to
> the tide of history.
>
> But I do think you miss my point Michael about the proposed change
> making the relationship between %in% and match for {G,I}Ranges{List}
> mimic that between other vectors, and I do think that changing the API
> would make other latecomers take to BioC easier/faster.
>
> That said, I NEVER use %in% so I really have no stake in the matter, and
> I DEFINITELY appreciate the argument for not changing the API just for
> semantic sweetness.
>
> That that said, Herve is _so good_ about deprecations and warnings
> that make such changes fairly easily digestible.
>
> That that that.... enough.... I bow out of this one....!!!!
>
> Always learning and Happy New Year to all lurkers,
>
> ~Malcolm
>
> *From:* Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Friday, January 04, 2013 5:11 PM
> *To:* Cook, Malcolm
> *Cc:* Sean Davis; Michael Lawrence; Hervé Pagès (hpages at fhcrc.org); Tim
> Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> *Subject:* Re: [BioC] countMatches() (was: table for GenomicRanges)
>
>
> On Fri, Jan 4, 2013 at 1:56 PM, Cook, Malcolm <MEC at stowers.org> wrote:
>
> Hiya,
>
> For what it is worth...
>
> I think the change to %in% is warranted.
>
> If I understand correctly, this change restores the relationship between
> the semantics of `%in%` and the semantics of `match`.
>
> From the docs:
>
> '"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0'
>
> Herve's change restores this relationship.
>
>
> match and %in% were initially consistent (both considering any overlap);
> Herve has changed both of them together. The whole idea behind IRanges
> is that ranges are special data types with special semantics. We have
> reimplemented much of the existing R vector API using those semantics;
> this extends beyond match/%in%. I am hesitant about making such sweeping
> changes to the API so late in the life-cycle of the package. There was a
> feature request for a way to count identical ranges in a set of ranges.
> Let's please not get carried away and start redesigning the API for this
> one, albeit useful, request. There are all sorts of inconsistencies in
> the API, and many of them were conscious decisions that considered
> practical use cases.
>
> Michael
>
>
> Herve, I suspect you were, as a result, able to completely drop all the
> `%in%,BiocClass1,BiocClass2` definitions and depend upon base::%in%
>
> Am I right?
>
> If so, may I suggest that Herve stay the course, with the addition of
>
> '"%ol%" <- function(a, b) findOverlaps(a, b, maxgap=0L,
> minoverlap=1L, type='any', select='all') > 0'
>
> This would provide a perspicacious idiom, thereby optimizing the API
> for Michael's observed common use case.
>
> Just sayin'
>
> ~Malcolm
>
>
> .-----Original Message-----
> .From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Sean Davis
> .Sent: Friday, January 04, 2013 3:37 PM
> .To: Michael Lawrence
> .Cc: Tim Triche, Jr.; Vedran Franke; bioconductor at r-project.org
> .Subject: Re: [BioC] countMatches() (was: table for GenomicRanges)
> .
> .On Fri, Jan 4, 2013 at 4:32 PM, Michael Lawrence
> .<lawrence.michael at gene.com> wrote:
> .> The change to the behavior of %in% is a pretty big one. Are you thinking
> .> that all set-based operations should behave this way? For example, setdiff
> .> and intersect? I really liked the syntax of "peaks %in% genes". In my
> .> experience, it's way more common to ask questions about overlap than about
> .> equality, so I'd rather optimize the API for that use case. But again,
> .> that's just my personal bias.
> .
> .For what it is worth, I share Michael's personal bias here.
> .
> .Sean
> .
> .
> .> Michael
> .>
> .>
> .> On Fri, Jan 4, 2013 at 1:11 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> .>
> .>> Hi,
> .>>
> .>> I added findMatches() and countMatches() to the latest IRanges /
> .>> GenomicRanges packages (in BioC devel only).
> .>>
> .>> findMatches(x, table): An enhanced version of ‘match’ that
> .>> returns all the matches in a Hits object.
> .>>
> .>> countMatches(x, table): Returns an integer vector of the length
> .>> of ‘x’, containing the number of matches in ‘table’ for
> .>> each element in ‘x’.
> .>>
> .>> countMatches() is what you can use to tally/count/tabulate (choose your
> .>> preferred term) the unique elements in a GRanges object:
> .>>
> .>> library(GenomicRanges)
> .>> set.seed(33)
> .>> gr <- GRanges("chr1", IRanges(sample(15,20,replace=TRUE), width=5))
> .>>
> .>> Then:
> .>>
> .>> > gr_levels <- sort(unique(gr))
> .>> > countMatches(gr_levels, gr)
> .>> [1] 1 1 1 2 4 2 2 1 2 2 2
> .>>
> .>> Note that findMatches() and countMatches() also work on IRanges and
> .>> DNAStringSet objects, as well as on ordinary atomic vectors:
> .>>
> .>> library(hgu95av2probe)
> .>> library(Biostrings)
> .>> probes <- DNAStringSet(hgu95av2probe)
> .>> unique_probes <- unique(probes)
> .>> count <- countMatches(unique_probes, probes)
> .>> max(count) # 7
> .>>
> .>> I made other changes in IRanges/GenomicRanges so that the notion
> .>> of "match" between elements of a vector-like object now consistently
> .>> means "equality" instead of "overlap", even for range-based objects
> .>> like IRanges or GRanges objects. This notion of "equality" is the
> .>> same that is used by ==. The most visible consequence of those
> .>> changes is that using %in% between 2 IRanges or GRanges objects
> .>> 'query' and 'subject' in order to do overlaps was replaced by
> .>> overlapsAny(query, subject).
> .>>
> .>> overlapsAny(query, subject): Finds the ranges in ‘query’ that
> .>> overlap any of the ranges in ‘subject’.
> .>>
> .>> There are warnings and deprecation messages in place to help smooth
> .>> the transition.
> .>>
> .>> Cheers,
> .>> H.
> .>>
> .>> --
> .>> Hervé Pagès
> .>>
> .>> Program in Computational Biology
> .>> Division of Public Health Sciences
> .>> Fred Hutchinson Cancer Research Center
> .>> 1100 Fairview Ave. N, M1-B514
> .>> P.O. Box 19024
> .>> Seattle, WA 98109-1024
> .>>
> .>> E-mail: hpages at fhcrc.org
> .>> Phone: (206) 667-5791
> .>> Fax: (206) 667-1319
>
> .>>
> .>
> .> [[alternative HTML version deleted]]
> .>
> .> _______________________________________________
> .> Bioconductor mailing list
> .> Bioconductor at r-project.org
> .> https://stat.ethz.ch/mailman/listinfo/bioconductor
> .> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> ...
>
> [Message clipped]
>
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319