[Bioc-devel] Strand-Awareness for Restrict Function

Hervé Pagès hpages at fredhutch.org
Thu Feb 18 20:51:40 CET 2016


On 02/18/2016 01:00 AM, Maintainer wrote:
> Hello,

Hi,

I'm putting this back to bioc-devel and setting the Subject to
"Strand-Awareness for Restrict Function" since this seems to be
a continuation of this thread (that you started here). I hope you
don't mind.

>
> Setdiff automatically reduces the resulting ranges. Is it intentional ?

Absolutely. The "vector-wise set operations" (i.e. union, intersect,
setdiff) treat 'x' and 'y' as 2 sets of genomic positions (or sets of
integers when 'x' and 'y' are Ranges objects) so duplicated positions
are removed.

> I can't see any formal definition of what setdiff is supposed to do, so I'm not sure. For example,
>
> setdiff(IRanges(1:10, width =1), IRanges(2, 4))
>
> reduces all of the adjacent ranges in the first variable. This is not documented.

The man page for set operations on Ranges and RangesList objects says:

   The 'union', 'intersect' and 'setdiff' methods for Ranges objects
   return a "normal" Ranges object representing the union, intersection
   and (asymmetric!) difference of the sets of integers represented by
   'x' and 'y'.

Admittedly this man page was too hard to find (you had to do
?`setdiff,Ranges,Ranges-method`, and ?`setdiff,GRanges,GRanges-method`
to get the man page for the methods for GRanges objects) and the above
explanation could probably be clearer. I've tried to remedy this (in
IRanges 2.5.32 and GenomicRanges 1.23.20) by adding aliases to these
man pages so they can be found with just ?setdiff (or ?union, or
?intersect).

>
> Also, it would be good if it had a strict.strand option

All the setops methods for genomic ranges have an 'ignore.strand'
option. Is that what you are asking for or do you have something
else in mind with 'strict.strand'?

> and an explanation of which variables the ... option accepts.
> Currently, it does not have any effect and the documentation of ... is ambiguous.
>
>> humanGenome
> GRanges object with 23 ranges and 0 metadata columns:
>         seqnames         ranges strand
>            <Rle>      <IRanges>  <Rle>
>     [1]     chr1 [1, 249250621]      *
>     [2]     chr2 [1, 243199373]      *
>     [3]     chr3 [1, 198022430]      *
>     [4]     chr4 [1, 191154276]      *
>     [5]     chr5 [1, 180915260]      *
>> humanGenes
> GRanges object with 22980 ranges and 1 metadata column:
>          seqnames                 ranges strand |     gene_id
>             <Rle>              <IRanges>  <Rle> | <character>
>        1    chr19 [ 58858172,  58874214]      - |           1
>       10     chr8 [ 18248755,  18258723]      + |          10
>      100    chr20 [ 43248163,  43280376]      - |         100
>     1000    chr18 [ 25530930,  25757445]      - |        1000
>    10000     chr1 [243651535, 244006886]      - |       10000
>> setdiff(humanGenome, humanGenes, strict.strand = FALSE)  # Has no effect because of transcripts' strands.
> GRanges object with 23 ranges and 0 metadata columns:
>         seqnames         ranges strand
>            <Rle>      <IRanges>  <Rle>
>     [1]     chr1 [1, 249250621]      *
>     [2]     chr2 [1, 243199373]      *
>     [3]     chr3 [1, 198022430]      *
>     [4]     chr4 [1, 191154276]      *
>     [5]     chr5 [1, 180915260]      *
>
> --------------------------------------

The ellipsis was actually not needed (and was not used) in the
setops methods for genomic ranges so I removed it in GenomicRanges
1.23.20.

Back to your original question: setdiff() is not the right tool for
the strand-aware trimming you're trying to achieve but it occurs to
me that psetdiff() (element-wise setdiff) might be a suitable
alternative (although you have some work to do to prepare an 'y'
that is parallel to 'x' and contains the regions to trim). I don't
think it's going to be easier/simpler than the restrict-based solution
I gave you earlier though.

H.

> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia
>
>
> ________________________________________________________________________
> devteam-bioc mailing list
> To unsubscribe from this mailing list send a blank email to
> devteam-bioc-leave at lists.fhcrc.org
> You can also unsubscribe or change your personal options at
> https://lists.fhcrc.org/mailman/listinfo/devteam-bioc
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioc-devel mailing list