[BioC] Order within a GRanges object
Hervé Pagès
hpages at fhcrc.org
Tue Aug 20 23:21:27 CEST 2013
Hi Malcolm, Hermann,
On 08/20/2013 06:05 AM, Cook, Malcolm wrote:
> > Hello,
> >
> >I have some points regarding the internal order of GRanges objects.
> >
> >1) There is automatically an order determined by a) the seqnames (=
> >chromosomes) and b) the ranges.
>
> No! There is no guarantee on the order.
>
>> library(GenomicRanges)
>> example(GRanges)
> ...
>> longGR
> GRanges with 30 ranges and 1 metadata column:
> seqnames ranges strand | score
> <Rle> <IRanges> <Rle> | <integer>
> a chr1 [1, 10] - | 1
> b chr2 [2, 10] + | 2
> c chr2 [3, 10] + | 3
> d chr2 [4, 10] * | 4
> e chr1 [5, 10] * | 5
> ... ... ... ... ... ...
> chr2 [106, 115] - | 26
> chr2 [107, 116] - | 27
> chr3 [108, 117] - | 28
> chr3 [109, 118] - | 29
> chr3 [110, 119] - | 30
> ---
> seqlengths:
> chr1 chr2 chr3
> 1000 2000 1500
>> rev(longGR)
> GRanges with 30 ranges and 1 metadata column:
> seqnames ranges strand | score
> <Rle> <IRanges> <Rle> | <integer>
> chr3 [110, 119] - | 30
> chr3 [109, 118] - | 29
> chr3 [108, 117] - | 28
> chr2 [107, 116] - | 27
> chr2 [106, 115] - | 26
> ... ... ... ... ... ...
> e chr1 [5, 10] * | 5
> d chr2 [4, 10] * | 4
> c chr2 [3, 10] + | 3
> b chr2 [2, 10] + | 2
> a chr1 [1, 10] - | 1
> ---
> seqlengths:
> chr1 chr2 chr3
> 1000 2000 1500
>>
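Since a GRanges keeps its elements in whatever order they were given (rev() above simply reverses them), code that needs sorted ranges should check or sort explicitly rather than assume an order. A minimal base-R sketch, assuming the ranges have been flattened to a data frame with seqnames, strand, start and end columns. is_sorted_ranges() is a hypothetical helper, not part of GenomicRanges, and its sort keys mirror the order sort() appears to use in the outputs shown in this thread:

```r
# Hypothetical helper (not a GenomicRanges function): TRUE if the rows of a
# ranges table are already in (seqnames, strand, start, end) order.
# Factor columns are compared by level position, not alphabetically.
is_sorted_ranges <- function(df) {
  o <- order(df$seqnames, df$strand, df$start, df$end)
  # order() keeps unresolved ties in their original positions, so an
  # already sorted table yields the identity permutation
  identical(o, seq_len(nrow(df)))
}

ranges <- data.frame(
  seqnames = factor(c("chr1", "chr1", "chr2")),
  strand   = factor(c("+", "-", "+"), levels = c("+", "-", "*")),
  start    = c(5L, 1L, 2L),
  end      = c(10L, 10L, 10L)
)
is_sorted_ranges(ranges)          # TRUE
is_sorted_ranges(ranges[3:1, ])   # FALSE, like rev(longGR)
```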
>
> >
> >2) The seqnames are always sorted in ASCII order.
>
> No! But they _can_ be:
>
>> sort(longGR)
> GRanges with 30 ranges and 1 metadata column:
> seqnames ranges strand | score
> <Rle> <IRanges> <Rle> | <integer>
> f chr1 [6, 10] + | 6
> chr1 [1, 5] - | 101
> a chr1 [1, 10] - | 1
> chr1 [2, 6] - | 102
> chr1 [3, 7] - | 103
> ... ... ... ... ... ...
> j chr3 [ 10, 10] - | 10
> chr3 [ 10, 14] - | 110
> chr3 [108, 117] - | 28
> chr3 [109, 118] - | 29
> chr3 [110, 119] - | 30
> ---
> seqlengths:
> chr1 chr2 chr3
> 1000 2000 1500
Just a small point of clarification. The seqnames come out in
lexicographical order here only because the seqlevels are already in
lexicographical order: sort() follows the order of the seqlevels. If you
change the order of the seqlevels first, then sort() will produce a
different result:
seqlevels(longGR) <- seqlevels(longGR)[c(2,3,1)]
Then:
> seqlevels(longGR)
[1] "chr2" "chr3" "chr1"
> sort(longGR)
GRanges with 30 ranges and 1 metadata column:
seqnames ranges strand | score
<Rle> <IRanges> <Rle> | <integer>
b chr2 [2, 10] + | 2
c chr2 [3, 10] + | 3
chr2 [4, 8] - | 104
chr2 [5, 9] - | 105
chr2 [6, 10] - | 106
... ... ... ... ... ...
chr1 [ 3, 7] - | 103
chr1 [101, 110] - | 21
chr1 [102, 111] - | 22
chr1 [103, 112] - | 23
e chr1 [ 5, 10] * | 5
---
seqlengths:
chr2 chr3 chr1
2000 1500 1000
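Judging from the printed outputs above, sort() orders by the position of each seqname in seqlevels(), then by strand (+, -, *), then by start and end. Because the seqnames are kept as a factor whose levels are the seqlevels, the effect can be reproduced in base R: order() on a factor compares level positions, not strings. A sketch on a made-up data frame standing in for as.data.frame() output (the values are illustrative, not taken from longGR):

```r
# Toy stand-in for as.data.frame(<GRanges>): seqnames is a factor whose
# levels play the role of seqlevels(), reordered as in the example above
ranges <- data.frame(
  seqnames = factor(c("chr1", "chr2", "chr2", "chr1", "chr3"),
                    levels = c("chr2", "chr3", "chr1")),
  strand   = factor(c("-", "+", "*", "*", "-"), levels = c("+", "-", "*")),
  start    = c(1L, 2L, 4L, 5L, 110L),
  end      = c(10L, 10L, 10L, 10L, 119L),
  score    = 1:5
)

# order() on a factor uses level positions, so chr2 < chr3 < chr1 here
sorted <- ranges[order(ranges$seqnames, ranges$strand,
                       ranges$start, ranges$end), ]
sorted$score         # 2 3 5 1 4

# With default (alphabetical) levels the chromosomes come out chr1..chr3
ranges$seqnames <- factor(as.character(ranges$seqnames))
sorted_alpha <- ranges[order(ranges$seqnames, ranges$strand,
                             ranges$start, ranges$end), ]
sorted_alpha$score   # 1 4 2 3 5
```

Relevelling the factor is the data-frame analogue of reassigning seqlevels(longGR) before calling sort().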
Cheers,
H.
>
>
> ~ Malcolm Cook
>
>
> >
> >3) After
> > df <- as.data.frame(gr)  # gr: hypothetical name for the GRanges object
> > m <- regexpr("\\d+", df$seqnames, perl=TRUE)
> > df$Chromosome <- regmatches(df$seqnames, m)
> > df$Chromosome <- as.integer(df$Chromosome)
> > df <- df[order(df$Chromosome), ]
> >only the order of the chromosomes is changed, while the order of the
> >ranges (now df$start and df$end) within each chromosome stays the same.
> >
> >Are my assumptions true?
> >
> >Thanks Hermann
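On point 3: base R's order() leaves unresolved ties in their original order, so ordering on the chromosome number alone keeps the rows within each chromosome in whatever relative order they already had; they end up sorted by start only if the input already was. Converting to integers also puts chr2 before chr10, which a string sort would not. A base-R sketch with made-up values, assuming every seqname contains a number (names such as chrX or chrM would need separate handling):

```r
# Toy stand-in for as.data.frame(gr); the values are made up
df <- data.frame(
  seqnames = c("chr10", "chr2", "chr10", "chr2"),
  start    = c(50L, 30L, 20L, 10L),
  end      = c(60L, 40L, 30L, 20L),
  stringsAsFactors = FALSE
)

# Numeric part of the chromosome name; assumes every name contains digits
df$Chromosome <- as.integer(regmatches(df$seqnames,
                                       regexpr("\\d+", df$seqnames, perl = TRUE)))

# Ordering by chromosome alone keeps ties in their original order,
# so starts within a chromosome are not necessarily increasing
by_chrom <- df[order(df$Chromosome), ]
by_chrom$start   # 30 10 50 20

# Adding start and end as tie-breakers gives a fully sorted table
by_pos <- df[order(df$Chromosome, df$start, df$end), ]
by_pos$start     # 10 30 20 50
```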
> >
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at r-project.org
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319