[BioC] function precede() not working with GRanges

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jan 12 04:20:17 CET 2011


Hi,

I'm not Michael, but:

> How general is the "peaks do not have a direction" statement?
> In my experience (RNA-seq experiments), peaks are "stranded features"
> i.e. they belong to a particular strand.

I'm actually a bit surprised by that. It was my impression that most
of the rna-seq data to date have been done with a protocol that
doesn't honor the strand of the reads.

Recently, I've come across several protocols where strandedness is
maintained and I'm sure that these will be increasingly the norm as
time goes on.

I've been working on/with different variants of some (rna)
tag-sequencing protocols, and these do typically keep strand info, but
I think the "old" Illumina-driven transcriptome-wide rna-seq data that
"most" people think of as RNA-seq is unstranded.

Still. I like that GenomicRanges can honor strandedness in
overlap/etc. functions when the strand is set.

>> One thing that often confuses me with GenomicRanges is the dual meaning of
>> strand. It seems to indicate both a direction and a coordinate. It makes
>> sense to me to have resize() and precede() consider strand as a direction.
>> But findOverlaps() treats strands as coordinates such that items on
>> different strands do not overlap. This makes less sense.
>
> Why? IMO a positive read (i.e. a read that was aligned to the plus
> strand) shouldn't hit features (genes/transcripts/exons) that are
> located on the minus strand. If for whatever reason this is not what
> the user wants, then s/he can always set the strand of the query and
> subject to '*' but I wouldn't say this is the typical usecase.
>
>> Another good
>> example is gaps() which will yield sequence-long ranges on the strands not
>> represented in the original object. This is very often undesirable. But of
>> course my perspective is limited.
>
> I agree that the behaviour of gaps() on GRanges is awkward. You not only
> get those sequence-long ranges on the strands not represented in the
> original object but you also get them for sequences not represented at
> all (as long as they are listed in levels(seqnames(x))). But if gaps()
> wasn't doing this, gaps() wouldn't be reversible anymore (i.e.
> 'gaps(gaps(x))' wouldn't be indentical to 'x') which is kind of an
> important property for gaps(). In particular, gaps(x) would give
> the same result whether 'x' as a sequence-long range on one strand
> or not (very unlikely to happen with real data though).
>
> Maybe an extra arg to gaps() would help control this? Suggestions are
> welcome. I'll make the change. Thanks!

I'd be in favor of adding a param to GRanges::gaps() that would allow
me to explicitly turn off this "feature" ;-)

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list