[BioC] GRanges setdiff With Features on Both Strands
Cook, Malcolm
MEC at stowers.org
Mon Jan 23 23:13:11 CET 2012
I'm sold. Excellent compromise. Thanks for considering my opinion.....
Cheers!
~Malcolm
> -----Original Message-----
> From: Hervé Pagès [mailto:hpages at fhcrc.org]
> Sent: Monday, January 23, 2012 2:33 PM
> To: Michael Lawrence
> Cc: Cook, Malcolm; D.Strbenac at garvan.org.au; bioconductor at r-project.org
> Subject: Re: [BioC] GRanges setdiff With Features on Both Strands
>
> Hi Michael, Malcolm,
>
> On 01/23/2012 11:14 AM, Michael Lawrence wrote:
> >
> >
> > On Mon, Jan 23, 2012 at 7:12 AM, Cook, Malcolm <MEC at stowers.org
> > <mailto:MEC at stowers.org>> wrote:
> >
> > > > As for the weird behavior you get with ignore.strand=TRUE, I agree
> > > > it doesn't make much sense and it's not totally clear to me
> > what the
> > > > correct behavior should be. Sadly the man page doesn't say anything
> > > > about what the intended behavior is (there is only a brief note
> > about
> > > > what ignore.strand=TRUE does for parallel set operations), gives no
> > > > example, and this case is not covered by the unit tests either.
> > > >
> > > > However, by looking at what union() and gaps() do with
> > > > ignore.strand=TRUE, and also at the internals of the GenomicRanges
> > > > package, it seems to me that the intention was that:
> > > >
> > > > union(x, y, ignore.strand=TRUE)
> > > > intersect(x, y, ignore.strand=TRUE)
> > > > setdiff(x, y, ignore.strand=TRUE)
> > > >
> > > > would be equivalent to:
> > > >
> > > > union(`strand<-`(x, "+"), `strand<-`(y, "+"))
> > > > intersect(`strand<-`(x, "+"), `strand<-`(y, "+"))
> > > > setdiff(`strand<-`(x, "+"), `strand<-`(y, "+"))
> > > >
> > > > So I could fix setdiff() and intersect() to do this
> > (intersect() has
> > > > a weird behavior too), and update the documentation. Does that
> > sound
> > > > reasonable?
> > > >
> > > >
> > > This would be reasonable to me.
> >
> > I think that retaining `ignore.strand` with the proposed semantics
> > of (potentially) altering the strand is inviting further confusion.
> >
> > For example, it would introduce the possibility that the union of x
> > with itself is not all.equal to itself only in the case when the
> > strand of x is not '+'.
> >
> > I think this is a markedly poor outcome since `ignore.strand` is
> > clearly NOT what the operation would be doing.
> >
> >
> > I would expect that a set operation ignoring strand would generate
> > ranges with "*" as the strand. There's no way to reliably preserve the
> > strand. So I don't see how the original strand would matter.
>
> I think that Malcom's point is that it's an unpleasant surprise that
> some very fundamental and intuitive mathematical properties are not
> honored, e.g. the union (or intersection) of x with itself is not
> itself. I can live with that one though: this happens only if you
> use ignore.strand=TRUE, which is not the default. And we still have
> the property that the union (or intersection) of x with itself is
> itself *modulo* the strand. Probably good enough.
>
> Now whether
>
> union(x, y, ignore.strand=TRUE)
> intersect(x, y, ignore.strand=TRUE)
> setdiff(x, y, ignore.strand=TRUE)
>
> should be equivalent to
>
> union(`strand<-`(x, "+"), `strand<-`(y, "+"))
> intersect(`strand<-`(x, "+"), `strand<-`(y, "+"))
> setdiff(`strand<-`(x, "+"), `strand<-`(y, "+"))
>
> or to
>
> union(`strand<-`(x, "*"), `strand<-`(y, "*"))
> intersect(`strand<-`(x, "*"), `strand<-`(y, "*"))
> setdiff(`strand<-`(x, "*"), `strand<-`(y, "*"))
>
> is just a cosmetic question but I agree that the output of the latter
> will look prettier (and might be slightly less confusing). So I can do
> this. Does anybody object?
>
> >
> > > > Finally I'd like to mention that I see very little value in
> > supporting
> > > > the 'ignore.strand' arg in those 3 methods so another option
> > would be
> > > > to just get rid of it.
> > > >
> > > >
> > > I think there is value here. Often, one wants to ignore the
> > strand in these
> > > operations (say between reads and transcripts) without destroying the
> > > strand information. Creating temporary objects is a nuisance.
> >
> > Perhaps a compromise then.....
> >
> > Since the implementation appears to have been undocumented and the
> > intention unclear, how about replace `ignore.strand` option with a
> > `with.strand=c('+','-','*')` option which sets the GRanges strand
> > prior to performing the operation.
> >
> >
> >
> > I would prefer keeping consistent with the rest of the API and having
> > ignore.strand simply be the equivalent of with.strand="*". I
> > acknowledge that the meaning of "*" is ambiguous, but it's easily
> > preferred over "+" and "-".
> >
>
> Yes a lot of methods already have the 'ignore.strand' arg so if we're
> going to keep that feature for union(), intersect() and setdiff(), I'd
> rather stick to that API too.
>
> Cheers,
> H.
More information about the Bioconductor
mailing list