[BioC] Order within a GRanges object

Hervé Pagès hpages at fhcrc.org
Tue Aug 20 23:21:27 CEST 2013


Hi Malcolm, Hermann,

On 08/20/2013 06:05 AM, Cook, Malcolm wrote:
>> Hello,
>   >
>   >I have some questions about the internal order of GRanges objects.
>   >
>   >1) There is automatically an order that depends on a) the seqnames (=
>   >chromosomes) and b) the ranges.
>
> No! There is no guarantee of any order.
>
>> library(GenomicRanges)
>> example(GRanges)
> ...
>> longGR
> GRanges with 30 ranges and 1 metadata column:
>        seqnames     ranges strand   |     score
>           <Rle>  <IRanges>  <Rle>   | <integer>
>      a     chr1    [1, 10]      -   |         1
>      b     chr2    [2, 10]      +   |         2
>      c     chr2    [3, 10]      +   |         3
>      d     chr2    [4, 10]      *   |         4
>      e     chr1    [5, 10]      *   |         5
>    ...      ...        ...    ... ...       ...
>            chr2 [106, 115]      -   |        26
>            chr2 [107, 116]      -   |        27
>            chr3 [108, 117]      -   |        28
>            chr3 [109, 118]      -   |        29
>            chr3 [110, 119]      -   |        30
>    ---
>    seqlengths:
>     chr1 chr2 chr3
>     1000 2000 1500
>>   rev(longGR)
> GRanges with 30 ranges and 1 metadata column:
>        seqnames     ranges strand   |     score
>           <Rle>  <IRanges>  <Rle>   | <integer>
>            chr3 [110, 119]      -   |        30
>            chr3 [109, 118]      -   |        29
>            chr3 [108, 117]      -   |        28
>            chr2 [107, 116]      -   |        27
>            chr2 [106, 115]      -   |        26
>    ...      ...        ...    ... ...       ...
>      e     chr1    [5, 10]      *   |         5
>      d     chr2    [4, 10]      *   |         4
>      c     chr2    [3, 10]      +   |         3
>      b     chr2    [2, 10]      +   |         2
>      a     chr1    [1, 10]      -   |         1
>    ---
>    seqlengths:
>     chr1 chr2 chr3
>     1000 2000 1500
>>
>
>   >
>   >2) The seqnames are always sorted in ASCII order.
>
> No! But they _can_ be:
>
>> sort(longGR)
> GRanges with 30 ranges and 1 metadata column:
>        seqnames     ranges strand   |     score
>           <Rle>  <IRanges>  <Rle>   | <integer>
>      f     chr1    [6, 10]      +   |         6
>            chr1    [1,  5]      -   |       101
>      a     chr1    [1, 10]      -   |         1
>            chr1    [2,  6]      -   |       102
>            chr1    [3,  7]      -   |       103
>    ...      ...        ...    ... ...       ...
>      j     chr3 [ 10,  10]      -   |        10
>            chr3 [ 10,  14]      -   |       110
>            chr3 [108, 117]      -   |        28
>            chr3 [109, 118]      -   |        29
>            chr3 [110, 119]      -   |        30
>    ---
>    seqlengths:
>     chr1 chr2 chr3
>     1000 2000 1500

Just a small point of clarification: the lexicographical ordering of the
seqnames here is only a consequence of the seqlevels already being in
lexicographical order. sort() orders the ranges by the position of their
seqname in seqlevels(), not by the alphabetical order of the names, so if
you change the order of the seqlevels first, sort() will produce a
different result:

   seqlevels(longGR) <- seqlevels(longGR)[c(2,3,1)]

Then:

   > seqlevels(longGR)
   [1] "chr2" "chr3" "chr1"

   > sort(longGR)
   GRanges with 30 ranges and 1 metadata column:
         seqnames     ranges strand   |     score
            <Rle>  <IRanges>  <Rle>   | <integer>
       b     chr2    [2, 10]      +   |         2
       c     chr2    [3, 10]      +   |         3
             chr2    [4,  8]      -   |       104
             chr2    [5,  9]      -   |       105
             chr2    [6, 10]      -   |       106
     ...      ...        ...    ... ...       ...
             chr1 [  3,   7]      -   |       103
             chr1 [101, 110]      -   |        21
             chr1 [102, 111]      -   |        22
             chr1 [103, 112]      -   |        23
       e     chr1 [  5,  10]      *   |         5
     ---
     seqlengths:
      chr2 chr3 chr1
      2000 1500 1000
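For readers without GenomicRanges at hand, the same effect can be sketched with a plain R factor. This is an analogy under an assumption, not the GRanges implementation: it treats the seqnames as a factor whose levels play the role of seqlevels(), and the chromosome names and start positions are made-up toy data.

```r
# Toy data (invented): one seqname and one start position per range
chrom <- factor(c("chr1", "chr2", "chr3", "chr1", "chr2"),
                levels = c("chr1", "chr2", "chr3"))
start <- c(5L, 2L, 108L, 1L, 4L)

# Ordering by the factor follows the level order, then start breaks ties
o1 <- order(chrom, start)
as.character(chrom[o1])   # "chr1" "chr1" "chr2" "chr2" "chr3"

# Reorder the levels the same way seqlevels(longGR)[c(2,3,1)] does
chrom2 <- factor(chrom, levels = levels(chrom)[c(2, 3, 1)])
o2 <- order(chrom2, start)
as.character(chrom2[o2])  # "chr2" "chr2" "chr3" "chr1" "chr1"
```

The data themselves never change; only the level order does, and the result follows the same chr2, chr3, chr1 pattern seen in the sort(longGR) output.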

Cheers,
H.

>
>
> ~ Malcolm Cook
>
>
>   >
>   >3) After
>   >    df <- as.data.frame(gr)   # gr: the GRanges object (name assumed)
>   >    m <- regexpr ("\\d+", df$seqnames, perl=TRUE)
>   >    df$Chromosome <- regmatches (df$seqnames, m)
>   >    df$Chromosome <- as.integer (as.character (df$Chromosome))
>   >    df <- df [order(df$Chromosome),]
>   >    only the order of the chromosomes is changed. The order of the ranges
>   >(now df$start and df$end) is still the same.
>   >
>   >Are my assumptions true?
>   >
>   >Thanks Hermann
>   >
>   >_______________________________________________
>   >Bioconductor mailing list
>   >Bioconductor at r-project.org
>   >https://stat.ethz.ch/mailman/listinfo/bioconductor
>   >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
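Regarding point 3 in the quoted message, a hedged base R sketch may help. The data frame below is invented and only stands in for as.data.frame() output, reusing its column names (seqnames, start, end); by_chrom and by_pos are illustrative names. R's order() is a stable sort, so rows tied on Chromosome keep their current relative order, which is why only the chromosome order appears to change. Since that incoming order is not guaranteed, adding start and end as tie-breakers makes the within-chromosome order explicit.

```r
# Invented stand-in for as.data.frame(gr), with the same column names
df <- data.frame(seqnames = factor(c("chr10", "chr2", "chr2", "chr1")),
                 start    = c(50L, 30L, 10L, 20L),
                 end      = c(60L, 40L, 15L, 25L))

# Extract the numeric part of each chromosome name, as in the question
m <- regexpr("\\d+", as.character(df$seqnames), perl = TRUE)
df$Chromosome <- as.integer(regmatches(as.character(df$seqnames), m))

# Stable sort: the two chr2 rows keep their input order (start 30, then 10)
by_chrom <- df[order(df$Chromosome), ]

# Explicit tie-breakers: chr2 rows now come out by start (10, then 30)
by_pos <- df[order(df$Chromosome, df$start, df$end), ]
```

Relying on stability only preserves whatever order the rows already had; the second form does not depend on it.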

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


