[BioC] Odd behaviour with renameSeqlevels
Valerie Obenchain
vobencha at fhcrc.org
Wed May 2 19:46:25 CEST 2012
I'm sorry Alex, I missed your point the first time. Yes, there was a
bug in renameSeqlevels() when changing the chromosome names if the
renaming vector was in a different order than the seqlevels of 'x'.
Now fixed in 1.8.5 release / 1.9.13 devel. Thanks for reporting this.
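
For anyone wondering how a named renaming vector can still end up
misapplied, here is a minimal base-R sketch. It is not the actual
renameSeqlevels() source: the names old, map, buggy and fixed are made up
for illustration, and the "buggy" pattern is only an assumption that
happens to reproduce the kind of shift seen in the report.

```r
# Toy seqlevels in a non-sorted order, as in the report (chrX before chr8).
old <- c("chr7", "chrX", "chr8", "chrY")

# Renaming vector written in "natural" order, i.e. out of order relative to 'old'.
map <- c(chr7 = "7", chr8 = "8", chrX = "X")

# Assumed buggy pattern: find which levels are named in 'map', then fill
# them with the new values in the order of 'map' rather than by name.
buggy <- old
buggy[buggy %in% names(map)] <- unname(map)
print(buggy)   # "7" "8" "X" "chrY"  (chrX became "8", chr8 became "X")

# Name-based pattern: look up the position of each old name explicitly.
fixed <- old
idx <- match(names(map), fixed)
fixed[idx[!is.na(idx)]] <- unname(map)[!is.na(idx)]
print(fixed)   # "7" "X" "8" "chrY"
```

With the name-based lookup, chrX maps to "X" wherever it sits in the
seqlevels, and levels not named in the vector (chrY here) are left alone.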
Valerie
On 05/02/2012 09:25 AM, Valerie Obenchain wrote:
> Hi Alex,
>
> The ordering of the chromosome names displayed by seqlevels() comes
> from the seqinfo object in the txdb. The ordering in the txdb or the
> txbygene before renaming is the same as after the renaming.
>
> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
> seqinfo(txdb)
> seqlevels(txdb)
>
> txbygene = transcriptsBy(txdb,"gene")
> seqinfo(txbygene)
> seqlevels(txbygene)
>
> This is not necessarily the same order as the seqnames (i.e., order of
> the ranges) in the txbygene object.
> seqnames(txbygene)
>
> Renaming the seqlevels has not changed the order of your txbygene
> ranges if that was the concern. No, the renaming vector does not need
> to match the ordering of the original names.
>
> Here is another way to prepare your new seqlevel names:
>
> nms <- seqlevels(txbygene)[1:24]
> vlu <- sub("chr", "", seqlevels(txbygene)[1:24], fixed=TRUE)
> names(vlu) <- nms
> renameSeqlevels(txbygene, vlu)
>
> Valerie
>
>
> On 05/02/2012 04:43 AM, Alex Gutteridge wrote:
>> Is this a bug in renameSeqlevels or expected behaviour? Note the
>> weird ordering of chromosome names in txbygene (chrX between chr7 and
>> chr8) which then results in misnaming when I try to use
>> renameSeqlevels (everything after chr7 is off by one). The docs for
>> renameSeqlevels aren't explicit about whether the renaming vector has to
>> match the ordering of the original names, but I thought the point of
>> making it a named vector is that it doesn't?
>>
>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>> Loading required package: GenomicFeatures
>> Loading required package: BiocGenerics
>>
>> Attaching package: ‘BiocGenerics’
>>
>> The following object(s) are masked from ‘package:stats’:
>>
>> xtabs
>>
>> The following object(s) are masked from ‘package:base’:
>>
>> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
>> get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
>> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
>> rownames, sapply, setdiff, table, tapply, union, unique
>>
>> Loading required package: IRanges
>> Loading required package: GenomicRanges
>> Loading required package: AnnotationDbi
>> Loading required package: Biobase
>> Welcome to Bioconductor
>>
>> Vignettes contain introductory material; view with
>> 'browseVignettes()'. To cite Bioconductor, see
>> 'citation("Biobase")', and for packages 'citation("pkgname")'.
>>
>>> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>>> txbygene = transcriptsBy(txdb,"gene")
>>> tx = renameSeqlevels(txbygene,c("chr1"="1","chr2"="2","chr3"="3","chr4"="4",
>> + "chr5"="5","chr6"="6","chr7"="7","chr8"="8",
>> + "chr9"="9","chr10"="10","chr11"="11","chr12"="12",
>> + "chr13"="13","chr14"="14","chr15"="15","chr16"="16",
>> + "chr17"="17","chr18"="18","chr19"="19","chr20"="20",
>> + "chr21"="21","chr22"="22","chrX"="X"))
>>> seqlevels(txbygene)
>> [1] "chr1" "chr2" "chr3"
>> [4] "chr4" "chr5" "chr6"
>> [7] "chr7" "chrX" "chr8"
>> [10] "chr9" "chr10" "chr11"
>> [13] "chr12" "chr13" "chr14"
>> [16] "chr15" "chr16" "chr17"
>> [19] "chr18" "chr20" "chrY"
>> [22] "chr19" "chr22" "chr21"
>> [25] "chr6_ssto_hap7" "chr6_mcf_hap5" "chr6_cox_hap2"
>> [28] "chr6_mann_hap4" "chr6_apd_hap1" "chr6_qbl_hap6"
>> [31] "chr6_dbb_hap3" "chr17_ctg5_hap1" "chr4_ctg9_hap1"
>> [34] "chr1_gl000192_random" "chrUn_gl000225" "chr4_gl000194_random"
>> [37] "chr4_gl000193_random" "chr9_gl000200_random" "chrUn_gl000222"
>> [40] "chrUn_gl000212" "chr7_gl000195_random" "chrUn_gl000223"
>> [43] "chrUn_gl000224" "chrUn_gl000219" "chr17_gl000205_random"
>> [46] "chrUn_gl000215" "chrUn_gl000216" "chrUn_gl000217"
>> [49] "chr9_gl000199_random" "chrUn_gl000211" "chrUn_gl000213"
>> [52] "chrUn_gl000220" "chrUn_gl000218" "chr19_gl000209_random"
>> [55] "chrUn_gl000221" "chrUn_gl000214" "chrUn_gl000228"
>> [58] "chrUn_gl000227" "chr1_gl000191_random" "chr19_gl000208_random"
>> [61] "chr9_gl000198_random" "chr17_gl000204_random" "chrUn_gl000233"
>> [64] "chrUn_gl000237" "chrUn_gl000230" "chrUn_gl000242"
>> [67] "chrUn_gl000243" "chrUn_gl000241" "chrUn_gl000236"
>> [70] "chrUn_gl000240" "chr17_gl000206_random" "chrUn_gl000232"
>> [73] "chrUn_gl000234" "chr11_gl000202_random" "chrUn_gl000238"
>> [76] "chrUn_gl000244" "chrUn_gl000248" "chr8_gl000196_random"
>> [79] "chrUn_gl000249" "chrUn_gl000246" "chr17_gl000203_random"
>> [82] "chr8_gl000197_random" "chrUn_gl000245" "chrUn_gl000247"
>> [85] "chr9_gl000201_random" "chrUn_gl000235" "chrUn_gl000239"
>> [88] "chr21_gl000210_random" "chrUn_gl000231" "chrUn_gl000229"
>> [91] "chrM" "chrUn_gl000226" "chr18_gl000207_random"
>>> seqlevels(tx)
>> [1] "1" "2" "3"
>> [4] "4" "5" "6"
>> [7] "7" "8" "9"
>> [10] "10" "11" "12"
>> [13] "13" "14" "15"
>> [16] "16" "17" "18"
>> [19] "19" "20" "chrY"
>> [22] "21" "22" "X"
>> [25] "chr6_ssto_hap7" "chr6_mcf_hap5" "chr6_cox_hap2"
>> [28] "chr6_mann_hap4" "chr6_apd_hap1" "chr6_qbl_hap6"
>> [31] "chr6_dbb_hap3" "chr17_ctg5_hap1" "chr4_ctg9_hap1"
>> [34] "chr1_gl000192_random" "chrUn_gl000225" "chr4_gl000194_random"
>> [37] "chr4_gl000193_random" "chr9_gl000200_random" "chrUn_gl000222"
>> [40] "chrUn_gl000212" "chr7_gl000195_random" "chrUn_gl000223"
>> [43] "chrUn_gl000224" "chrUn_gl000219" "chr17_gl000205_random"
>> [46] "chrUn_gl000215" "chrUn_gl000216" "chrUn_gl000217"
>> [49] "chr9_gl000199_random" "chrUn_gl000211" "chrUn_gl000213"
>> [52] "chrUn_gl000220" "chrUn_gl000218" "chr19_gl000209_random"
>> [55] "chrUn_gl000221" "chrUn_gl000214" "chrUn_gl000228"
>> [58] "chrUn_gl000227" "chr1_gl000191_random" "chr19_gl000208_random"
>> [61] "chr9_gl000198_random" "chr17_gl000204_random" "chrUn_gl000233"
>> [64] "chrUn_gl000237" "chrUn_gl000230" "chrUn_gl000242"
>> [67] "chrUn_gl000243" "chrUn_gl000241" "chrUn_gl000236"
>> [70] "chrUn_gl000240" "chr17_gl000206_random" "chrUn_gl000232"
>> [73] "chrUn_gl000234" "chr11_gl000202_random" "chrUn_gl000238"
>> [76] "chrUn_gl000244" "chrUn_gl000248" "chr8_gl000196_random"
>> [79] "chrUn_gl000249" "chrUn_gl000246" "chr17_gl000203_random"
>> [82] "chr8_gl000197_random" "chrUn_gl000245" "chrUn_gl000247"
>> [85] "chr9_gl000201_random" "chrUn_gl000235" "chrUn_gl000239"
>> [88] "chr21_gl000210_random" "chrUn_gl000231" "chrUn_gl000229"
>> [91] "chrM" "chrUn_gl000226" "chr18_gl000207_random"
>>> txbygene$'5327'
>> GRanges with 6 ranges and 2 elementMetadata cols:
>> seqnames ranges strand | tx_id tx_name
>> <Rle> <IRanges> <Rle> | <integer> <character>
>> [1] chr8 [42032236, 42050729] - | 31953 uc010lxf.1
>> [2] chr8 [42032236, 42050729] - | 31954 uc010lxg.1
>> [3] chr8 [42032236, 42065194] - | 31955 uc003xos.2
>> [4] chr8 [42032236, 42065194] - | 31956 uc003xot.2
>> [5] chr8 [42032236, 42065194] - | 31957 uc011lcm.1
>> [6] chr8 [42032236, 42065194] - | 31958 uc011lcn.1
>> ---
>> seqlengths:
>> chr1 chr2 ... chr18_gl000207_random
>> 249250621 243199373 ... 4262
>>> tx$'5327'
>> GRanges with 6 ranges and 2 elementMetadata cols:
>> seqnames ranges strand | tx_id tx_name
>> <Rle> <IRanges> <Rle> | <integer> <character>
>> [1] 9 [42032236, 42050729] - | 31953 uc010lxf.1
>> [2] 9 [42032236, 42050729] - | 31954 uc010lxg.1
>> [3] 9 [42032236, 42065194] - | 31955 uc003xos.2
>> [4] 9 [42032236, 42065194] - | 31956 uc003xot.2
>> [5] 9 [42032236, 42065194] - | 31957 uc011lcm.1
>> [6] 9 [42032236, 42065194] - | 31958 uc011lcn.1
>> ---
>> seqlengths:
>> 1 2 ... chr18_gl000207_random
>> 249250621 243199373 ... 4262
>>> txbygene$'1956'
>> GRanges with 11 ranges and 2 elementMetadata cols:
>> seqnames ranges strand | tx_id tx_name
>> <Rle> <IRanges> <Rle> | <integer> <character>
>> [1] chr7 [55086725, 55224644] + | 28336 uc003tqh.3
>> [2] chr7 [55086725, 55236328] + | 28337 uc003tqi.3
>> [3] chr7 [55086725, 55238738] + | 28338 uc003tqj.3
>> [4] chr7 [55086725, 55270769] + | 28339 uc022adm.1
>> [5] chr7 [55086725, 55270769] + | 28340 uc010kzg.2
>> [6] chr7 [55086725, 55275031] + | 28341 uc003tqk.3
>> [7] chr7 [55086725, 55275031] + | 28342 uc022adn.1
>> [8] chr7 [55177540, 55275031] + | 28343 uc011kco.2
>> [9] chr7 [55224226, 55238906] + | 28345 uc011kcq.1
>> [10] chr7 [55224226, 55238906] + | 28346 uc011kcp.1
>> [11] chr7 [55248979, 55259567] + | 28349 uc022ado.1
>> ---
>> seqlengths:
>> chr1 chr2 ... chr18_gl000207_random
>> 249250621 243199373 ... 4262
>>> tx$'1956'
>> GRanges with 11 ranges and 2 elementMetadata cols:
>> seqnames ranges strand | tx_id tx_name
>> <Rle> <IRanges> <Rle> | <integer> <character>
>> [1] 7 [55086725, 55224644] + | 28336 uc003tqh.3
>> [2] 7 [55086725, 55236328] + | 28337 uc003tqi.3
>> [3] 7 [55086725, 55238738] + | 28338 uc003tqj.3
>> [4] 7 [55086725, 55270769] + | 28339 uc022adm.1
>> [5] 7 [55086725, 55270769] + | 28340 uc010kzg.2
>> [6] 7 [55086725, 55275031] + | 28341 uc003tqk.3
>> [7] 7 [55086725, 55275031] + | 28342 uc022adn.1
>> [8] 7 [55177540, 55275031] + | 28343 uc011kco.2
>> [9] 7 [55224226, 55238906] + | 28345 uc011kcq.1
>> [10] 7 [55224226, 55238906] + | 28346 uc011kcp.1
>> [11] 7 [55248979, 55259567] + | 28349 uc022ado.1
>> ---
>> seqlengths:
>> 1 2 ... chr18_gl000207_random
>> 249250621 243199373 ... 4262
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.7.1
>> [2] GenomicFeatures_1.8.1
>> [3] AnnotationDbi_1.18.0
>> [4] Biobase_2.16.0
>> [5] GenomicRanges_1.8.3
>> [6] IRanges_1.14.2
>> [7] BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.12.0 Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0
>> [5] DBI_0.2-5 RCurl_1.91-1 Rsamtools_1.8.3 RSQLite_0.11.1
>> [9] rtracklayer_1.16.1 stats4_2.15.0 tools_2.15.0 XML_3.9-4
>> [13] zlibbioc_1.2.0
>>
>