[BioC] Odd behaviour with renameSeqlevels
Valerie Obenchain
vobencha at fhcrc.org
Wed May 2 18:25:49 CEST 2012
Hi Alex,
The ordering of the chromosome names displayed by seqlevels() comes from
the seqinfo object in the txdb. The seqlevels are in the same order in the
txdb, in txbygene before renaming, and in txbygene after renaming.
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
seqinfo(txdb)
seqlevels(txdb)
txbygene = transcriptsBy(txdb,"gene")
seqinfo(txbygene)
seqlevels(txbygene)
This is not necessarily the same order as the seqnames (i.e., order of
the ranges) in the txbygene object.
seqnames(txbygene)
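The difference between the two orderings can be shown with a rough base-R analogy. This assumes a plain factor is a fair stand-in: its levels play the role of seqlevels() and its values play the role of seqnames(). GenomicRanges actually stores seqnames as a run-length-encoded factor, so this is a simplification. The level order copies the first nine seqlevels(txbygene) printed later in this thread, and the range values are made up.

```r
# Base-R analogy, not GenomicRanges internals: a factor's levels stand in for
# seqlevels() and its values stand in for seqnames().
# Level order: the first nine seqlevels(txbygene) printed in this thread.
lv <- c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chrX", "chr8")

# Made-up ranges; they can appear in any order relative to the levels.
sn <- factor(c("chr8", "chr7", "chr8", "chr1"), levels = lv)

levels(sn)        # the level order fixed when the factor was built
as.character(sn)  # element order: "chr8" "chr7" "chr8" "chr1"

# Sorting the elements follows the level order but does not change it.
sorted <- sort(sn)
as.character(sorted)            # "chr1" "chr7" "chr8" "chr8"
identical(levels(sorted), lv)   # TRUE
```

The level order (chrX between chr7 and chr8) is a property of the levels alone. Neither the order of the elements nor a sort of them changes it.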
Renaming the seqlevels has not changed the order of your txbygene ranges,
if that was the concern. No, the renaming vector does not need to match
the ordering of the original names.
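The idea behind a named renaming vector can be sketched in base R. The two helpers below, rename_by_name() and rename_by_position(), are hypothetical illustrations and are not the renameSeqlevels() source. The level vector and the renaming vector are copied from Alex's session quoted below. A lookup by name gives the same result whatever order the vector is in. A positional assignment that ignores the names after matching reproduces the pattern in Alex's seqlevels(tx) output, where chr8 becomes "9" and chr21 becomes "X". Whether that is what happened inside renameSeqlevels is not established here.

```r
# Hypothetical helpers for illustration; neither is the renameSeqlevels() code.

# First 24 seqlevels(txbygene), copied from the session quoted in this thread.
old <- c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chrX",
         "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14",
         "chr15", "chr16", "chr17", "chr18", "chr20", "chrY", "chr19",
         "chr22", "chr21")

# Alex's renaming vector, written in numeric order; chrY is not renamed.
map <- setNames(c(as.character(1:22), "X"), c(paste0("chr", 1:22), "chrX"))

# Lookup by name: each matched level gets map[[level]], so the order of
# 'map' does not matter. Unmatched levels (chrY) keep their old name.
rename_by_name <- function(old, map) {
  hit <- old %in% names(map)
  new <- old
  new[hit] <- unname(map[old[hit]])
  new
}

# Positional assignment: finds the matched slots, then fills them with the
# values of 'map' in map order, ignoring which name each value belongs to.
rename_by_position <- function(old, map) {
  hit <- old %in% names(map)
  new <- old
  new[hit] <- unname(map)
  new
}

by_name <- rename_by_name(old, map)
by_pos  <- rename_by_position(old, map)

by_name[old == "chr8"]   # "8"
by_pos[old == "chr8"]    # "9", the shift seen in seqlevels(tx)

# Shuffling 'map' changes nothing for the lookup by name.
set.seed(1)
identical(rename_by_name(old, map[sample(length(map))]), by_name)   # TRUE
```

When the renaming vector lists the names in the same order as the seqlevels it replaces, both helpers return the same result.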
Here is another way to prepare your new seqlevel names:
nms <- seqlevels(txbygene)[1:24]
vlu <- sub("chr", "", nms, fixed=TRUE)
names(vlu) <- nms
tx <- renameSeqlevels(txbygene, vlu)
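After renaming, a quick consistency check is worth running. The helper mismatched_levels() below is hypothetical. With the real objects you would pass seqlevels(txbygene), seqlevels(tx) and the renaming vector. Here the before and after levels are the first 24 entries printed in Alex's session, so the check flags the misnamed chromosomes. A vector built from the levels themselves, as in the recipe above, comes back clean.

```r
# Hypothetical helper (not part of GenomicRanges): return the old seqlevels
# whose new label disagrees with what the named renaming vector 'map' says.
# Levels that are not named in 'map' are expected to keep their old label.
mismatched_levels <- function(old, new, map) {
  expected <- ifelse(old %in% names(map), map[old], old)
  old[new != expected]
}

# First 24 seqlevels(txbygene) and seqlevels(tx), copied from the thread.
old <- c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chrX",
         "chr8", "chr9", "chr10", "chr11", "chr12", "chr13", "chr14",
         "chr15", "chr16", "chr17", "chr18", "chr20", "chrY", "chr19",
         "chr22", "chr21")
new_reported <- c(as.character(1:20), "chrY", "21", "22", "X")

# Alex's renaming vector (chrY is not renamed).
alex_map <- setNames(c(as.character(1:22), "X"),
                     c(paste0("chr", 1:22), "chrX"))

bad <- mismatched_levels(old, new_reported, alex_map)
bad   # chrX, chr8 ... chr18, chr19, chr21

# The recipe above: build the vector from the levels themselves.
vlu <- sub("chr", "", old, fixed = TRUE)
names(vlu) <- old
new_ok <- unname(vlu[old])
mismatched_levels(old, new_ok, vlu)   # character(0)
```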
Valerie
On 05/02/2012 04:43 AM, Alex Gutteridge wrote:
> Is this a bug in renameSeqlevels or expected behaviour? Note the weird
> ordering of chromosome names in txbygene (chrX between chr7 and chr8),
> which then results in misnaming when I try to use renameSeqlevels
> (everything after chr7 is off by one). The docs for renameSeqlevels
> aren't explicit about whether the renaming vector has to match the
> ordering of the original names, but I thought the point of making it a
> named vector is that it doesn't?
>
>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> Loading required package: GenomicFeatures
> Loading required package: BiocGenerics
>
> Attaching package: ‘BiocGenerics’
>
> The following object(s) are masked from ‘package:stats’:
>
> xtabs
>
> The following object(s) are masked from ‘package:base’:
>
> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
> get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
> rownames, sapply, setdiff, table, tapply, union, unique
>
> Loading required package: IRanges
> Loading required package: GenomicRanges
> Loading required package: AnnotationDbi
> Loading required package: Biobase
> Welcome to Bioconductor
>
> Vignettes contain introductory material; view with
> 'browseVignettes()'. To cite Bioconductor, see
> 'citation("Biobase")', and for packages 'citation("pkgname")'.
>
>> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>> txbygene = transcriptsBy(txdb,"gene")
>> tx = renameSeqlevels(txbygene,c("chr1"="1","chr2"="2","chr3"="3","chr4"="4",
> + "chr5"="5","chr6"="6","chr7"="7","chr8"="8",
> + "chr9"="9","chr10"="10","chr11"="11","chr12"="12",
> + "chr13"="13","chr14"="14","chr15"="15","chr16"="16",
> + "chr17"="17","chr18"="18","chr19"="19","chr20"="20",
> + "chr21"="21","chr22"="22","chrX"="X"))
>> seqlevels(txbygene)
> [1] "chr1" "chr2" "chr3"
> [4] "chr4" "chr5" "chr6"
> [7] "chr7" "chrX" "chr8"
> [10] "chr9" "chr10" "chr11"
> [13] "chr12" "chr13" "chr14"
> [16] "chr15" "chr16" "chr17"
> [19] "chr18" "chr20" "chrY"
> [22] "chr19" "chr22" "chr21"
> [25] "chr6_ssto_hap7" "chr6_mcf_hap5" "chr6_cox_hap2"
> [28] "chr6_mann_hap4" "chr6_apd_hap1" "chr6_qbl_hap6"
> [31] "chr6_dbb_hap3" "chr17_ctg5_hap1" "chr4_ctg9_hap1"
> [34] "chr1_gl000192_random" "chrUn_gl000225" "chr4_gl000194_random"
> [37] "chr4_gl000193_random" "chr9_gl000200_random" "chrUn_gl000222"
> [40] "chrUn_gl000212" "chr7_gl000195_random" "chrUn_gl000223"
> [43] "chrUn_gl000224" "chrUn_gl000219" "chr17_gl000205_random"
> [46] "chrUn_gl000215" "chrUn_gl000216" "chrUn_gl000217"
> [49] "chr9_gl000199_random" "chrUn_gl000211" "chrUn_gl000213"
> [52] "chrUn_gl000220" "chrUn_gl000218" "chr19_gl000209_random"
> [55] "chrUn_gl000221" "chrUn_gl000214" "chrUn_gl000228"
> [58] "chrUn_gl000227" "chr1_gl000191_random" "chr19_gl000208_random"
> [61] "chr9_gl000198_random" "chr17_gl000204_random" "chrUn_gl000233"
> [64] "chrUn_gl000237" "chrUn_gl000230" "chrUn_gl000242"
> [67] "chrUn_gl000243" "chrUn_gl000241" "chrUn_gl000236"
> [70] "chrUn_gl000240" "chr17_gl000206_random" "chrUn_gl000232"
> [73] "chrUn_gl000234" "chr11_gl000202_random" "chrUn_gl000238"
> [76] "chrUn_gl000244" "chrUn_gl000248" "chr8_gl000196_random"
> [79] "chrUn_gl000249" "chrUn_gl000246" "chr17_gl000203_random"
> [82] "chr8_gl000197_random" "chrUn_gl000245" "chrUn_gl000247"
> [85] "chr9_gl000201_random" "chrUn_gl000235" "chrUn_gl000239"
> [88] "chr21_gl000210_random" "chrUn_gl000231" "chrUn_gl000229"
> [91] "chrM" "chrUn_gl000226" "chr18_gl000207_random"
>> seqlevels(tx)
> [1] "1" "2" "3"
> [4] "4" "5" "6"
> [7] "7" "8" "9"
> [10] "10" "11" "12"
> [13] "13" "14" "15"
> [16] "16" "17" "18"
> [19] "19" "20" "chrY"
> [22] "21" "22" "X"
> [25] "chr6_ssto_hap7" "chr6_mcf_hap5" "chr6_cox_hap2"
> [28] "chr6_mann_hap4" "chr6_apd_hap1" "chr6_qbl_hap6"
> [31] "chr6_dbb_hap3" "chr17_ctg5_hap1" "chr4_ctg9_hap1"
> [34] "chr1_gl000192_random" "chrUn_gl000225" "chr4_gl000194_random"
> [37] "chr4_gl000193_random" "chr9_gl000200_random" "chrUn_gl000222"
> [40] "chrUn_gl000212" "chr7_gl000195_random" "chrUn_gl000223"
> [43] "chrUn_gl000224" "chrUn_gl000219" "chr17_gl000205_random"
> [46] "chrUn_gl000215" "chrUn_gl000216" "chrUn_gl000217"
> [49] "chr9_gl000199_random" "chrUn_gl000211" "chrUn_gl000213"
> [52] "chrUn_gl000220" "chrUn_gl000218" "chr19_gl000209_random"
> [55] "chrUn_gl000221" "chrUn_gl000214" "chrUn_gl000228"
> [58] "chrUn_gl000227" "chr1_gl000191_random" "chr19_gl000208_random"
> [61] "chr9_gl000198_random" "chr17_gl000204_random" "chrUn_gl000233"
> [64] "chrUn_gl000237" "chrUn_gl000230" "chrUn_gl000242"
> [67] "chrUn_gl000243" "chrUn_gl000241" "chrUn_gl000236"
> [70] "chrUn_gl000240" "chr17_gl000206_random" "chrUn_gl000232"
> [73] "chrUn_gl000234" "chr11_gl000202_random" "chrUn_gl000238"
> [76] "chrUn_gl000244" "chrUn_gl000248" "chr8_gl000196_random"
> [79] "chrUn_gl000249" "chrUn_gl000246" "chr17_gl000203_random"
> [82] "chr8_gl000197_random" "chrUn_gl000245" "chrUn_gl000247"
> [85] "chr9_gl000201_random" "chrUn_gl000235" "chrUn_gl000239"
> [88] "chr21_gl000210_random" "chrUn_gl000231" "chrUn_gl000229"
> [91] "chrM" "chrUn_gl000226" "chr18_gl000207_random"
>> txbygene$'5327'
> GRanges with 6 ranges and 2 elementMetadata cols:
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> | <integer> <character>
> [1] chr8 [42032236, 42050729] - | 31953 uc010lxf.1
> [2] chr8 [42032236, 42050729] - | 31954 uc010lxg.1
> [3] chr8 [42032236, 42065194] - | 31955 uc003xos.2
> [4] chr8 [42032236, 42065194] - | 31956 uc003xot.2
> [5] chr8 [42032236, 42065194] - | 31957 uc011lcm.1
> [6] chr8 [42032236, 42065194] - | 31958 uc011lcn.1
> ---
> seqlengths:
> chr1 chr2 ... chr18_gl000207_random
> 249250621 243199373 ... 4262
>> tx$'5327'
> GRanges with 6 ranges and 2 elementMetadata cols:
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> | <integer> <character>
> [1] 9 [42032236, 42050729] - | 31953 uc010lxf.1
> [2] 9 [42032236, 42050729] - | 31954 uc010lxg.1
> [3] 9 [42032236, 42065194] - | 31955 uc003xos.2
> [4] 9 [42032236, 42065194] - | 31956 uc003xot.2
> [5] 9 [42032236, 42065194] - | 31957 uc011lcm.1
> [6] 9 [42032236, 42065194] - | 31958 uc011lcn.1
> ---
> seqlengths:
> 1 2 ... chr18_gl000207_random
> 249250621 243199373 ... 4262
>> txbygene$'1956'
> GRanges with 11 ranges and 2 elementMetadata cols:
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> | <integer> <character>
> [1] chr7 [55086725, 55224644] + | 28336 uc003tqh.3
> [2] chr7 [55086725, 55236328] + | 28337 uc003tqi.3
> [3] chr7 [55086725, 55238738] + | 28338 uc003tqj.3
> [4] chr7 [55086725, 55270769] + | 28339 uc022adm.1
> [5] chr7 [55086725, 55270769] + | 28340 uc010kzg.2
> [6] chr7 [55086725, 55275031] + | 28341 uc003tqk.3
> [7] chr7 [55086725, 55275031] + | 28342 uc022adn.1
> [8] chr7 [55177540, 55275031] + | 28343 uc011kco.2
> [9] chr7 [55224226, 55238906] + | 28345 uc011kcq.1
> [10] chr7 [55224226, 55238906] + | 28346 uc011kcp.1
> [11] chr7 [55248979, 55259567] + | 28349 uc022ado.1
> ---
> seqlengths:
> chr1 chr2 ... chr18_gl000207_random
> 249250621 243199373 ... 4262
>> tx$'1956'
> GRanges with 11 ranges and 2 elementMetadata cols:
> seqnames ranges strand | tx_id tx_name
> <Rle> <IRanges> <Rle> | <integer> <character>
> [1] 7 [55086725, 55224644] + | 28336 uc003tqh.3
> [2] 7 [55086725, 55236328] + | 28337 uc003tqi.3
> [3] 7 [55086725, 55238738] + | 28338 uc003tqj.3
> [4] 7 [55086725, 55270769] + | 28339 uc022adm.1
> [5] 7 [55086725, 55270769] + | 28340 uc010kzg.2
> [6] 7 [55086725, 55275031] + | 28341 uc003tqk.3
> [7] 7 [55086725, 55275031] + | 28342 uc022adn.1
> [8] 7 [55177540, 55275031] + | 28343 uc011kco.2
> [9] 7 [55224226, 55238906] + | 28345 uc011kcq.1
> [10] 7 [55224226, 55238906] + | 28346 uc011kcp.1
> [11] 7 [55248979, 55259567] + | 28349 uc022ado.1
> ---
> seqlengths:
> 1 2 ... chr18_gl000207_random
> 249250621 243199373 ... 4262
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.7.1
> [2] GenomicFeatures_1.8.1
> [3] AnnotationDbi_1.18.0
> [4] Biobase_2.16.0
> [5] GenomicRanges_1.8.3
> [6] IRanges_1.14.2
> [7] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.12.0 Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0
> [5] DBI_0.2-5 RCurl_1.91-1 Rsamtools_1.8.3 RSQLite_0.11.1
> [9] rtracklayer_1.16.1 stats4_2.15.0 tools_2.15.0 XML_3.9-4
> [13] zlibbioc_1.2.0
>