[BioC] Odd behaviour with renameSeqlevels

Valerie Obenchain vobencha at fhcrc.org
Wed May 2 18:25:49 CEST 2012


Hi Alex,

The order of the chromosome names displayed by seqlevels() comes from 
the seqinfo object in the txdb. That order is the same in the txdb and 
in txbygene, and it is the same before and after the renaming.

   txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
   seqinfo(txdb)
   seqlevels(txdb)

   txbygene = transcriptsBy(txdb,"gene")
   seqinfo(txbygene)
   seqlevels(txbygene)

This is not necessarily the same order as the seqnames (i.e., order of 
the ranges) in the txbygene object.

   seqnames(txbygene)
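
As a rough base-R analogy (an illustration only, not a description of how GenomicRanges stores things internally), seqlevels behave like the levels of a factor and seqnames like its values. The level order is fixed separately from the order in which the values occur. The labels below are made up for the example.

```r
# Hypothetical chromosome labels for three ranges, listed in range order.
# The level order is set explicitly and differs from the value order.
chrom <- factor(c("chr8", "chr1", "chrX"),
                levels = c("chr1", "chrX", "chr8"))

levels(chrom)        # order of the levels (analogous to seqlevels)
as.character(chrom)  # order of the values (analogous to seqnames)
```

Relabelling the levels of such a factor changes what each value is called, not where it sits in the vector.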

Renaming the seqlevels has not changed the order of your txbygene ranges 
if that was the concern. No, the renaming vector does not need to match 
the ordering of the original names.
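
What "matched by name" means can be sketched in base R. This only illustrates the expected result of a name-based rename; it is not the implementation of renameSeqlevels, and `old`, `map`, `hit` and `new` are made-up names.

```r
# Made-up levels in a non-natural order, similar to the seqlevels() output.
old <- c("chr1", "chr7", "chrX", "chr8", "chrY")

# Renaming vector: names are the old levels, values are the new ones.
# Its order deliberately differs from `old`, and chrY is left out.
map <- c(chr8 = "8", chr1 = "1", chrX = "X", chr7 = "7")

# Look each old level up by name; levels without an entry stay unchanged.
hit <- match(old, names(map))
new <- ifelse(is.na(hit), old, unname(map[hit]))
new
```

Comparing seqlevels() before and after the call against a lookup like this is a quick check. In the quoted session below, the chr8 ranges for gene 5327 come back labelled 9, which looks like what a position-based rather than a name-based rename would produce.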

Here is another way to prepare your new seqlevel names,

   nms <- seqlevels(txbygene)[1:24]
   vlu <- sub("chr", "", seqlevels(txbygene)[1:24], fixed=TRUE)
   names(vlu) <- nms
   renameSeqlevels(txbygene, vlu)
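
Before passing such a vector to renameSeqlevels, it can be worth confirming that every name/value pair is consistent. A minimal sketch with a few made-up levels standing in for seqlevels(txbygene)[1:24]:

```r
# Stand-in for seqlevels(txbygene)[1:24]; the order is deliberately not natural.
nms <- c("chr1", "chr7", "chrX", "chr8")

# Strip the "chr" prefix and keep the old names attached to the new values.
vlu <- sub("chr", "", nms, fixed = TRUE)
names(vlu) <- nms

# Each new value, with "chr" put back, should equal its own name.
stopifnot(all(paste0("chr", vlu) == names(vlu)))
vlu
```

Because the names are derived from the same vector as the values, the pairs stay in step whatever order the levels come in.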

Valerie


On 05/02/2012 04:43 AM, Alex Gutteridge wrote:
> Is this a bug in renameSeqlevels or expected behaviour? Note the weird 
> ordering of chromosome names in txbygene (chrX between chr7 and chr8) 
> which then results in misnaming when I try to use renameSeqlevels 
> (everything after chr7 is off by one). The docs for renameSeqlevels 
> aren't explicit in whether the renaming vector has to match the 
> ordering of the original names, but I thought the point of making it 
> named vector is that it doesn't?
>
>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> Loading required package: GenomicFeatures
> Loading required package: BiocGenerics
>
> Attaching package: ‘BiocGenerics’
>
> The following object(s) are masked from ‘package:stats’:
>
>     xtabs
>
> The following object(s) are masked from ‘package:base’:
>
>     anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
>     get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
>     pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
>     rownames, sapply, setdiff, table, tapply, union, unique
>
> Loading required package: IRanges
> Loading required package: GenomicRanges
> Loading required package: AnnotationDbi
> Loading required package: Biobase
> Welcome to Bioconductor
>
>     Vignettes contain introductory material; view with
>     'browseVignettes()'. To cite Bioconductor, see
>     'citation("Biobase")', and for packages 'citation("pkgname")'.
>
>> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>> txbygene = transcriptsBy(txdb,"gene")
>> tx = renameSeqlevels(txbygene,c("chr1"="1","chr2"="2","chr3"="3","chr4"="4",
> +                                 "chr5"="5","chr6"="6","chr7"="7","chr8"="8",
> +                                 "chr9"="9","chr10"="10","chr11"="11","chr12"="12",
> +                                 "chr13"="13","chr14"="14","chr15"="15","chr16"="16",
> +                                 "chr17"="17","chr18"="18","chr19"="19","chr20"="20",
> +                                 "chr21"="21","chr22"="22","chrX"="X"))
>> seqlevels(txbygene)
>  [1] "chr1"                  "chr2"                  "chr3"
>  [4] "chr4"                  "chr5"                  "chr6"
>  [7] "chr7"                  "chrX"                  "chr8"
> [10] "chr9"                  "chr10"                 "chr11"
> [13] "chr12"                 "chr13"                 "chr14"
> [16] "chr15"                 "chr16"                 "chr17"
> [19] "chr18"                 "chr20"                 "chrY"
> [22] "chr19"                 "chr22"                 "chr21"
> [25] "chr6_ssto_hap7"        "chr6_mcf_hap5"         "chr6_cox_hap2"
> [28] "chr6_mann_hap4"        "chr6_apd_hap1"         "chr6_qbl_hap6"
> [31] "chr6_dbb_hap3"         "chr17_ctg5_hap1"       "chr4_ctg9_hap1"
> [34] "chr1_gl000192_random"  "chrUn_gl000225"        "chr4_gl000194_random"
> [37] "chr4_gl000193_random"  "chr9_gl000200_random"  "chrUn_gl000222"
> [40] "chrUn_gl000212"        "chr7_gl000195_random"  "chrUn_gl000223"
> [43] "chrUn_gl000224"        "chrUn_gl000219"        "chr17_gl000205_random"
> [46] "chrUn_gl000215"        "chrUn_gl000216"        "chrUn_gl000217"
> [49] "chr9_gl000199_random"  "chrUn_gl000211"        "chrUn_gl000213"
> [52] "chrUn_gl000220"        "chrUn_gl000218"        "chr19_gl000209_random"
> [55] "chrUn_gl000221"        "chrUn_gl000214"        "chrUn_gl000228"
> [58] "chrUn_gl000227"        "chr1_gl000191_random"  "chr19_gl000208_random"
> [61] "chr9_gl000198_random"  "chr17_gl000204_random" "chrUn_gl000233"
> [64] "chrUn_gl000237"        "chrUn_gl000230"        "chrUn_gl000242"
> [67] "chrUn_gl000243"        "chrUn_gl000241"        "chrUn_gl000236"
> [70] "chrUn_gl000240"        "chr17_gl000206_random" "chrUn_gl000232"
> [73] "chrUn_gl000234"        "chr11_gl000202_random" "chrUn_gl000238"
> [76] "chrUn_gl000244"        "chrUn_gl000248"        "chr8_gl000196_random"
> [79] "chrUn_gl000249"        "chrUn_gl000246"        "chr17_gl000203_random"
> [82] "chr8_gl000197_random"  "chrUn_gl000245"        "chrUn_gl000247"
> [85] "chr9_gl000201_random"  "chrUn_gl000235"        "chrUn_gl000239"
> [88] "chr21_gl000210_random" "chrUn_gl000231"        "chrUn_gl000229"
> [91] "chrM"                  "chrUn_gl000226"        "chr18_gl000207_random"
>> seqlevels(tx)
>  [1] "1"                     "2"                     "3"
>  [4] "4"                     "5"                     "6"
>  [7] "7"                     "8"                     "9"
> [10] "10"                    "11"                    "12"
> [13] "13"                    "14"                    "15"
> [16] "16"                    "17"                    "18"
> [19] "19"                    "20"                    "chrY"
> [22] "21"                    "22"                    "X"
> [25] "chr6_ssto_hap7"        "chr6_mcf_hap5"         "chr6_cox_hap2"
> [28] "chr6_mann_hap4"        "chr6_apd_hap1"         "chr6_qbl_hap6"
> [31] "chr6_dbb_hap3"         "chr17_ctg5_hap1"       "chr4_ctg9_hap1"
> [34] "chr1_gl000192_random"  "chrUn_gl000225"        "chr4_gl000194_random"
> [37] "chr4_gl000193_random"  "chr9_gl000200_random"  "chrUn_gl000222"
> [40] "chrUn_gl000212"        "chr7_gl000195_random"  "chrUn_gl000223"
> [43] "chrUn_gl000224"        "chrUn_gl000219"        "chr17_gl000205_random"
> [46] "chrUn_gl000215"        "chrUn_gl000216"        "chrUn_gl000217"
> [49] "chr9_gl000199_random"  "chrUn_gl000211"        "chrUn_gl000213"
> [52] "chrUn_gl000220"        "chrUn_gl000218"        "chr19_gl000209_random"
> [55] "chrUn_gl000221"        "chrUn_gl000214"        "chrUn_gl000228"
> [58] "chrUn_gl000227"        "chr1_gl000191_random"  "chr19_gl000208_random"
> [61] "chr9_gl000198_random"  "chr17_gl000204_random" "chrUn_gl000233"
> [64] "chrUn_gl000237"        "chrUn_gl000230"        "chrUn_gl000242"
> [67] "chrUn_gl000243"        "chrUn_gl000241"        "chrUn_gl000236"
> [70] "chrUn_gl000240"        "chr17_gl000206_random" "chrUn_gl000232"
> [73] "chrUn_gl000234"        "chr11_gl000202_random" "chrUn_gl000238"
> [76] "chrUn_gl000244"        "chrUn_gl000248"        "chr8_gl000196_random"
> [79] "chrUn_gl000249"        "chrUn_gl000246"        "chr17_gl000203_random"
> [82] "chr8_gl000197_random"  "chrUn_gl000245"        "chrUn_gl000247"
> [85] "chr9_gl000201_random"  "chrUn_gl000235"        "chrUn_gl000239"
> [88] "chr21_gl000210_random" "chrUn_gl000231"        "chrUn_gl000229"
> [91] "chrM"                  "chrUn_gl000226"        "chr18_gl000207_random"
>> txbygene$'5327'
> GRanges with 6 ranges and 2 elementMetadata cols:
>       seqnames               ranges strand |     tx_id     tx_name
>          <Rle>            <IRanges>  <Rle> | <integer> <character>
>   [1]     chr8 [42032236, 42050729]      - |     31953  uc010lxf.1
>   [2]     chr8 [42032236, 42050729]      - |     31954  uc010lxg.1
>   [3]     chr8 [42032236, 42065194]      - |     31955  uc003xos.2
>   [4]     chr8 [42032236, 42065194]      - |     31956  uc003xot.2
>   [5]     chr8 [42032236, 42065194]      - |     31957  uc011lcm.1
>   [6]     chr8 [42032236, 42065194]      - |     31958  uc011lcn.1
>   ---
>   seqlengths:
>                     chr1                  chr2 ... chr18_gl000207_random
>                249250621             243199373 ...                  4262
>> tx$'5327'
> GRanges with 6 ranges and 2 elementMetadata cols:
>       seqnames               ranges strand |     tx_id     tx_name
>          <Rle>            <IRanges>  <Rle> | <integer> <character>
>   [1]        9 [42032236, 42050729]      - |     31953  uc010lxf.1
>   [2]        9 [42032236, 42050729]      - |     31954  uc010lxg.1
>   [3]        9 [42032236, 42065194]      - |     31955  uc003xos.2
>   [4]        9 [42032236, 42065194]      - |     31956  uc003xot.2
>   [5]        9 [42032236, 42065194]      - |     31957  uc011lcm.1
>   [6]        9 [42032236, 42065194]      - |     31958  uc011lcn.1
>   ---
>   seqlengths:
>                        1                     2 ... chr18_gl000207_random
>                249250621             243199373 ...                  4262
>> txbygene$'1956'
> GRanges with 11 ranges and 2 elementMetadata cols:
>        seqnames               ranges strand |     tx_id     tx_name
>           <Rle>            <IRanges>  <Rle> | <integer> <character>
>    [1]     chr7 [55086725, 55224644]      + |     28336  uc003tqh.3
>    [2]     chr7 [55086725, 55236328]      + |     28337  uc003tqi.3
>    [3]     chr7 [55086725, 55238738]      + |     28338  uc003tqj.3
>    [4]     chr7 [55086725, 55270769]      + |     28339  uc022adm.1
>    [5]     chr7 [55086725, 55270769]      + |     28340  uc010kzg.2
>    [6]     chr7 [55086725, 55275031]      + |     28341  uc003tqk.3
>    [7]     chr7 [55086725, 55275031]      + |     28342  uc022adn.1
>    [8]     chr7 [55177540, 55275031]      + |     28343  uc011kco.2
>    [9]     chr7 [55224226, 55238906]      + |     28345  uc011kcq.1
>   [10]     chr7 [55224226, 55238906]      + |     28346  uc011kcp.1
>   [11]     chr7 [55248979, 55259567]      + |     28349  uc022ado.1
>   ---
>   seqlengths:
>                     chr1                  chr2 ... chr18_gl000207_random
>                249250621             243199373 ...                  4262
>> tx$'1956'
> GRanges with 11 ranges and 2 elementMetadata cols:
>        seqnames               ranges strand |     tx_id     tx_name
>           <Rle>            <IRanges>  <Rle> | <integer> <character>
>    [1]        7 [55086725, 55224644]      + |     28336  uc003tqh.3
>    [2]        7 [55086725, 55236328]      + |     28337  uc003tqi.3
>    [3]        7 [55086725, 55238738]      + |     28338  uc003tqj.3
>    [4]        7 [55086725, 55270769]      + |     28339  uc022adm.1
>    [5]        7 [55086725, 55270769]      + |     28340  uc010kzg.2
>    [6]        7 [55086725, 55275031]      + |     28341  uc003tqk.3
>    [7]        7 [55086725, 55275031]      + |     28342  uc022adn.1
>    [8]        7 [55177540, 55275031]      + |     28343  uc011kco.2
>    [9]        7 [55224226, 55238906]      + |     28345  uc011kcq.1
>   [10]        7 [55224226, 55238906]      + |     28346  uc011kcp.1
>   [11]        7 [55248979, 55259567]      + |     28349  uc022ado.1
>   ---
>   seqlengths:
>                        1                     2 ... chr18_gl000207_random
>                249250621             243199373 ...                  4262
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.7.1
> [2] GenomicFeatures_1.8.1
> [3] AnnotationDbi_1.18.0
> [4] Biobase_2.16.0
> [5] GenomicRanges_1.8.3
> [6] IRanges_1.14.2
> [7] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
>  [1] biomaRt_2.12.0     Biostrings_2.24.1  bitops_1.0-4.1     BSgenome_1.24.0
>  [5] DBI_0.2-5          RCurl_1.91-1       Rsamtools_1.8.3    RSQLite_0.11.1
>  [9] rtracklayer_1.16.1 stats4_2.15.0      tools_2.15.0       XML_3.9-4
> [13] zlibbioc_1.2.0
>


