[BioC] Odd behaviour with renameSeqlevels
Alex Gutteridge
alexg at ruggedtextile.com
Wed May 2 13:43:41 CEST 2012
Is this a bug in renameSeqlevels or expected behaviour? Note the odd
ordering of chromosome names in txbygene (chrX sits between chr7 and chr8),
which results in misnaming when I use renameSeqlevels: the levels after
chr7 are shifted (chrX becomes "8", chr8 becomes "9", and so on). The docs
for renameSeqlevels aren't explicit about whether the renaming vector has to
match the order of the original names, but I thought the point of making it
a named vector was that it doesn't have to?
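To make the expectation concrete, here is a toy base-R sketch. It is not the renameSeqlevels implementation: rename_by_name and rename_by_position are hypothetical helpers invented for this illustration, and the four-level vector is a cut-down stand-in for seqlevels(txbygene). The first helper looks each level up by name, which is what I would expect from a named vector. The second overwrites the matching levels in the order the renaming vector was written, which is only a guess at the kind of positional pairing that could produce a shift like the one described.

```r
# Toy seqlevels in the same kind of order as txbygene:
# chrX sits between chr7 and chr8, and chrY is not in the renaming vector.
old <- c("chr7", "chrX", "chr8", "chrY")

# Named renaming vector: names are the old levels, values are the new ones.
map <- c(chr7 = "7", chr8 = "8", chrX = "X")

# Hypothetical helper: rename by looking each old level up in names(map).
# Levels that are absent from the map are left untouched.
rename_by_name <- function(levels, map) {
  idx <- match(levels, names(map))
  out <- levels
  hit <- !is.na(idx)
  out[hit] <- unname(map[idx[hit]])
  out
}

# Hypothetical helper: overwrite the matching levels with the map values in
# the order the map was written, ignoring which name each value belongs to.
rename_by_position <- function(levels, map) {
  out <- levels
  out[levels %in% names(map)] <- unname(map)
  out
}

by_name <- rename_by_name(old, map)
by_position <- rename_by_position(old, map)

print(by_name)      # "7" "X" "8" "chrY"
print(by_position)  # "7" "8" "X" "chrY"
```

With the name lookup, chrX maps to "X" and chr8 to "8" wherever they sit in the level vector. With the positional version the two are swapped, because the values are consumed in the order they were typed.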
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
Loading required package: GenomicFeatures
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following object(s) are masked from ‘package:stats’:
xtabs
The following object(s) are masked from ‘package:base’:
anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
rownames, sapply, setdiff, table, tapply, union, unique
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
> txbygene = transcriptsBy(txdb,"gene")
> tx = renameSeqlevels(txbygene,c("chr1"="1","chr2"="2","chr3"="3","chr4"="4",
+ "chr5"="5","chr6"="6","chr7"="7","chr8"="8",
+ "chr9"="9","chr10"="10","chr11"="11","chr12"="12",
+ "chr13"="13","chr14"="14","chr15"="15","chr16"="16",
+ "chr17"="17","chr18"="18","chr19"="19","chr20"="20",
+ "chr21"="21","chr22"="22","chrX"="X"))
> seqlevels(txbygene)
[1] "chr1" "chr2" "chr3"
[4] "chr4" "chr5" "chr6"
[7] "chr7" "chrX" "chr8"
[10] "chr9" "chr10" "chr11"
[13] "chr12" "chr13" "chr14"
[16] "chr15" "chr16" "chr17"
[19] "chr18" "chr20" "chrY"
[22] "chr19" "chr22" "chr21"
[25] "chr6_ssto_hap7" "chr6_mcf_hap5" "chr6_cox_hap2"
[28] "chr6_mann_hap4" "chr6_apd_hap1" "chr6_qbl_hap6"
[31] "chr6_dbb_hap3" "chr17_ctg5_hap1" "chr4_ctg9_hap1"
[34] "chr1_gl000192_random" "chrUn_gl000225" "chr4_gl000194_random"
[37] "chr4_gl000193_random" "chr9_gl000200_random" "chrUn_gl000222"
[40] "chrUn_gl000212" "chr7_gl000195_random" "chrUn_gl000223"
[43] "chrUn_gl000224" "chrUn_gl000219" "chr17_gl000205_random"
[46] "chrUn_gl000215" "chrUn_gl000216" "chrUn_gl000217"
[49] "chr9_gl000199_random" "chrUn_gl000211" "chrUn_gl000213"
[52] "chrUn_gl000220" "chrUn_gl000218" "chr19_gl000209_random"
[55] "chrUn_gl000221" "chrUn_gl000214" "chrUn_gl000228"
[58] "chrUn_gl000227" "chr1_gl000191_random" "chr19_gl000208_random"
[61] "chr9_gl000198_random" "chr17_gl000204_random" "chrUn_gl000233"
[64] "chrUn_gl000237" "chrUn_gl000230" "chrUn_gl000242"
[67] "chrUn_gl000243" "chrUn_gl000241" "chrUn_gl000236"
[70] "chrUn_gl000240" "chr17_gl000206_random" "chrUn_gl000232"
[73] "chrUn_gl000234" "chr11_gl000202_random" "chrUn_gl000238"
[76] "chrUn_gl000244" "chrUn_gl000248" "chr8_gl000196_random"
[79] "chrUn_gl000249" "chrUn_gl000246" "chr17_gl000203_random"
[82] "chr8_gl000197_random" "chrUn_gl000245" "chrUn_gl000247"
[85] "chr9_gl000201_random" "chrUn_gl000235" "chrUn_gl000239"
[88] "chr21_gl000210_random" "chrUn_gl000231" "chrUn_gl000229"
[91] "chrM" "chrUn_gl000226" "chr18_gl000207_random"
> seqlevels(tx)
[1] "1" "2" "3"
[4] "4" "5" "6"
[7] "7" "8" "9"
[10] "10" "11" "12"
[13] "13" "14" "15"
[16] "16" "17" "18"
[19] "19" "20" "chrY"
[22] "21" "22" "X"
[25] "chr6_ssto_hap7" "chr6_mcf_hap5" "chr6_cox_hap2"
[28] "chr6_mann_hap4" "chr6_apd_hap1" "chr6_qbl_hap6"
[31] "chr6_dbb_hap3" "chr17_ctg5_hap1" "chr4_ctg9_hap1"
[34] "chr1_gl000192_random" "chrUn_gl000225" "chr4_gl000194_random"
[37] "chr4_gl000193_random" "chr9_gl000200_random" "chrUn_gl000222"
[40] "chrUn_gl000212" "chr7_gl000195_random" "chrUn_gl000223"
[43] "chrUn_gl000224" "chrUn_gl000219" "chr17_gl000205_random"
[46] "chrUn_gl000215" "chrUn_gl000216" "chrUn_gl000217"
[49] "chr9_gl000199_random" "chrUn_gl000211" "chrUn_gl000213"
[52] "chrUn_gl000220" "chrUn_gl000218" "chr19_gl000209_random"
[55] "chrUn_gl000221" "chrUn_gl000214" "chrUn_gl000228"
[58] "chrUn_gl000227" "chr1_gl000191_random" "chr19_gl000208_random"
[61] "chr9_gl000198_random" "chr17_gl000204_random" "chrUn_gl000233"
[64] "chrUn_gl000237" "chrUn_gl000230" "chrUn_gl000242"
[67] "chrUn_gl000243" "chrUn_gl000241" "chrUn_gl000236"
[70] "chrUn_gl000240" "chr17_gl000206_random" "chrUn_gl000232"
[73] "chrUn_gl000234" "chr11_gl000202_random" "chrUn_gl000238"
[76] "chrUn_gl000244" "chrUn_gl000248" "chr8_gl000196_random"
[79] "chrUn_gl000249" "chrUn_gl000246" "chr17_gl000203_random"
[82] "chr8_gl000197_random" "chrUn_gl000245" "chrUn_gl000247"
[85] "chr9_gl000201_random" "chrUn_gl000235" "chrUn_gl000239"
[88] "chr21_gl000210_random" "chrUn_gl000231" "chrUn_gl000229"
[91] "chrM" "chrUn_gl000226" "chr18_gl000207_random"
> txbygene$'5327'
GRanges with 6 ranges and 2 elementMetadata cols:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr8 [42032236, 42050729] - | 31953 uc010lxf.1
[2] chr8 [42032236, 42050729] - | 31954 uc010lxg.1
[3] chr8 [42032236, 42065194] - | 31955 uc003xos.2
[4] chr8 [42032236, 42065194] - | 31956 uc003xot.2
[5] chr8 [42032236, 42065194] - | 31957 uc011lcm.1
[6] chr8 [42032236, 42065194] - | 31958 uc011lcn.1
---
seqlengths:
chr1 chr2 ... chr18_gl000207_random
249250621 243199373 ... 4262
> tx$'5327'
GRanges with 6 ranges and 2 elementMetadata cols:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] 9 [42032236, 42050729] - | 31953 uc010lxf.1
[2] 9 [42032236, 42050729] - | 31954 uc010lxg.1
[3] 9 [42032236, 42065194] - | 31955 uc003xos.2
[4] 9 [42032236, 42065194] - | 31956 uc003xot.2
[5] 9 [42032236, 42065194] - | 31957 uc011lcm.1
[6] 9 [42032236, 42065194] - | 31958 uc011lcn.1
---
seqlengths:
1 2 ... chr18_gl000207_random
249250621 243199373 ... 4262
> txbygene$'1956'
GRanges with 11 ranges and 2 elementMetadata cols:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr7 [55086725, 55224644] + | 28336 uc003tqh.3
[2] chr7 [55086725, 55236328] + | 28337 uc003tqi.3
[3] chr7 [55086725, 55238738] + | 28338 uc003tqj.3
[4] chr7 [55086725, 55270769] + | 28339 uc022adm.1
[5] chr7 [55086725, 55270769] + | 28340 uc010kzg.2
[6] chr7 [55086725, 55275031] + | 28341 uc003tqk.3
[7] chr7 [55086725, 55275031] + | 28342 uc022adn.1
[8] chr7 [55177540, 55275031] + | 28343 uc011kco.2
[9] chr7 [55224226, 55238906] + | 28345 uc011kcq.1
[10] chr7 [55224226, 55238906] + | 28346 uc011kcp.1
[11] chr7 [55248979, 55259567] + | 28349 uc022ado.1
---
seqlengths:
chr1 chr2 ... chr18_gl000207_random
249250621 243199373 ... 4262
> tx$'1956'
GRanges with 11 ranges and 2 elementMetadata cols:
seqnames ranges strand | tx_id tx_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] 7 [55086725, 55224644] + | 28336 uc003tqh.3
[2] 7 [55086725, 55236328] + | 28337 uc003tqi.3
[3] 7 [55086725, 55238738] + | 28338 uc003tqj.3
[4] 7 [55086725, 55270769] + | 28339 uc022adm.1
[5] 7 [55086725, 55270769] + | 28340 uc010kzg.2
[6] 7 [55086725, 55275031] + | 28341 uc003tqk.3
[7] 7 [55086725, 55275031] + | 28342 uc022adn.1
[8] 7 [55177540, 55275031] + | 28343 uc011kco.2
[9] 7 [55224226, 55238906] + | 28345 uc011kcq.1
[10] 7 [55224226, 55238906] + | 28346 uc011kcp.1
[11] 7 [55248979, 55259567] + | 28349 uc022ado.1
---
seqlengths:
1 2 ... chr18_gl000207_random
249250621 243199373 ... 4262
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.7.1
[2] GenomicFeatures_1.8.1
[3] AnnotationDbi_1.18.0
[4] Biobase_2.16.0
[5] GenomicRanges_1.8.3
[6] IRanges_1.14.2
[7] BiocGenerics_0.2.0
loaded via a namespace (and not attached):
[1] biomaRt_2.12.0 Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0
[5] DBI_0.2-5 RCurl_1.91-1 Rsamtools_1.8.3 RSQLite_0.11.1
[9] rtracklayer_1.16.1 stats4_2.15.0 tools_2.15.0 XML_3.9-4
[13] zlibbioc_1.2.0
--
Alex Gutteridge