[BioC] Odd behaviour with renameSeqlevels
Alex Gutteridge
alexg at ruggedtextile.com
Thu May 3 12:17:53 CEST 2012
Thanks Valerie.
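
For anyone finding this in the archive, here is a minimal base-R sketch of the difference between renaming by name and renaming by position, using a shortened version of the level order from my original report. The helper rename_by_name() is hypothetical and is not the renameSeqlevels() implementation; the positional variant is only an assumption about how a shift like the one I saw could arise, not a description of the actual bug.

```r
# Hypothetical helper (not the GenomicRanges code): rename levels by
# matching the *names* of the renaming vector against the current levels.
rename_by_name <- function(levels, value) {
  idx <- match(names(value), levels)
  if (any(is.na(idx))) {
    stop("names not found in levels: ",
         paste(names(value)[is.na(idx)], collapse = ", "))
  }
  levels[idx] <- unname(value)
  levels
}

# Shortened level order as in the report: chrX sits between chr7 and chr8.
lv <- c("chr7", "chrX", "chr8", "chrY", "chr9")

# Renaming vector in "natural" order, which differs from the order of lv.
value <- c(chr7 = "7", chr8 = "8", chr9 = "9", chrX = "X")

# Name-based renaming: each level receives the value stored under its name.
by_name <- rename_by_name(lv, value)

# Position-based renaming (assumed failure mode): the matched slots are
# filled in their own order with the values in the vector's order.
pos <- sort(match(names(value), lv))
by_pos <- lv
by_pos[pos] <- unname(value)

print(by_name)
print(by_pos)
```

With the name-based version the slot that held chr8 becomes "8". With the positional version it becomes "9", which is the same kind of shift as the chr8 gene reported on "9" in my original example.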
On 02.05.2012 17:46, Valerie Obenchain wrote:
> I'm sorry Alex, I missed your point the first time. Yes, there was a
> bug in renameSeqlevels() with respect to changing the chromosome names
> when the renaming vector was out of order with 'x'.
>
> Now fixed in 1.8.5 release / 1.9.13 devel. Thanks for reporting this.
>
> Valerie
>
>
>
> On 05/02/2012 09:25 AM, Valerie Obenchain wrote:
>> Hi Alex,
>>
>> The ordering of the chromosome names displayed by seqlevels() comes
>> from the seqinfo object in the txdb. The ordering in the txdb or the
>> txbygene before renaming is the same as after the renaming.
>>
>> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>> seqinfo(txdb)
>> seqlevels(txdb)
>>
>> txbygene = transcriptsBy(txdb,"gene")
>> seqinfo(txbygene)
>> seqlevels(txbygene)
>>
>> This is not necessarily the same order as the seqnames (i.e., order
>> of the ranges) in the txbygene object.
>> seqnames(txbygene)
>>
>> Renaming the seqlevels has not changed the order of your txbygene
>> ranges if that was the concern. No, the renaming vector does not need
>> to match the ordering of the original names.
>>
>> Here is another way to prepare your new seqlevel names,
>>
>> nms <- seqlevels(txbygene)[1:24]
>> vlu <- sub("chr", "", seqlevels(txbygene)[1:24], fixed=TRUE)
>> names(vlu) <- nms
>> renameSeqlevels(txbygene, vlu)
>>
>> Valerie
>>
>>
>> On 05/02/2012 04:43 AM, Alex Gutteridge wrote:
>>> Is this a bug in renameSeqlevels or expected behaviour? Note the
>>> weird ordering of chromosome names in txbygene (chrX between chr7 and
>>> chr8), which then results in misnaming when I try to use
>>> renameSeqlevels (everything after chr7 is off by one). The docs for
>>> renameSeqlevels aren't explicit about whether the renaming vector has
>>> to match the ordering of the original names, but I thought the point
>>> of making it a named vector is that it doesn't?
>>>
>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>> Loading required package: GenomicFeatures
>>> Loading required package: BiocGenerics
>>>
>>> Attaching package: ‘BiocGenerics’
>>>
>>> The following object(s) are masked from ‘package:stats’:
>>>
>>> xtabs
>>>
>>> The following object(s) are masked from ‘package:base’:
>>>
>>> anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
>>> get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
>>> pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
>>> rownames, sapply, setdiff, table, tapply, union, unique
>>>
>>> Loading required package: IRanges
>>> Loading required package: GenomicRanges
>>> Loading required package: AnnotationDbi
>>> Loading required package: Biobase
>>> Welcome to Bioconductor
>>>
>>> Vignettes contain introductory material; view with
>>> 'browseVignettes()'. To cite Bioconductor, see
>>> 'citation("Biobase")', and for packages 'citation("pkgname")'.
>>>
>>>> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>>>> txbygene = transcriptsBy(txdb,"gene")
>>>> tx = renameSeqlevels(txbygene,
>>> +     c("chr1"="1","chr2"="2","chr3"="3","chr4"="4",
>>> +       "chr5"="5","chr6"="6","chr7"="7","chr8"="8",
>>> +       "chr9"="9","chr10"="10","chr11"="11","chr12"="12",
>>> +       "chr13"="13","chr14"="14","chr15"="15","chr16"="16",
>>> +       "chr17"="17","chr18"="18","chr19"="19","chr20"="20",
>>> +       "chr21"="21","chr22"="22","chrX"="X"))
>>>> seqlevels(txbygene)
>>> [1] "chr1" "chr2" "chr3"
>>> [4] "chr4" "chr5" "chr6"
>>> [7] "chr7" "chrX" "chr8"
>>> [10] "chr9" "chr10" "chr11"
>>> [13] "chr12" "chr13" "chr14"
>>> [16] "chr15" "chr16" "chr17"
>>> [19] "chr18" "chr20" "chrY"
>>> [22] "chr19" "chr22" "chr21"
>>> [25] "chr6_ssto_hap7" "chr6_mcf_hap5" "chr6_cox_hap2"
>>> [28] "chr6_mann_hap4" "chr6_apd_hap1" "chr6_qbl_hap6"
>>> [31] "chr6_dbb_hap3" "chr17_ctg5_hap1" "chr4_ctg9_hap1"
>>> [34] "chr1_gl000192_random" "chrUn_gl000225" "chr4_gl000194_random"
>>> [37] "chr4_gl000193_random" "chr9_gl000200_random" "chrUn_gl000222"
>>> [40] "chrUn_gl000212" "chr7_gl000195_random" "chrUn_gl000223"
>>> [43] "chrUn_gl000224" "chrUn_gl000219" "chr17_gl000205_random"
>>> [46] "chrUn_gl000215" "chrUn_gl000216" "chrUn_gl000217"
>>> [49] "chr9_gl000199_random" "chrUn_gl000211" "chrUn_gl000213"
>>> [52] "chrUn_gl000220" "chrUn_gl000218" "chr19_gl000209_random"
>>> [55] "chrUn_gl000221" "chrUn_gl000214" "chrUn_gl000228"
>>> [58] "chrUn_gl000227" "chr1_gl000191_random" "chr19_gl000208_random"
>>> [61] "chr9_gl000198_random" "chr17_gl000204_random" "chrUn_gl000233"
>>> [64] "chrUn_gl000237" "chrUn_gl000230" "chrUn_gl000242"
>>> [67] "chrUn_gl000243" "chrUn_gl000241" "chrUn_gl000236"
>>> [70] "chrUn_gl000240" "chr17_gl000206_random" "chrUn_gl000232"
>>> [73] "chrUn_gl000234" "chr11_gl000202_random" "chrUn_gl000238"
>>> [76] "chrUn_gl000244" "chrUn_gl000248" "chr8_gl000196_random"
>>> [79] "chrUn_gl000249" "chrUn_gl000246" "chr17_gl000203_random"
>>> [82] "chr8_gl000197_random" "chrUn_gl000245" "chrUn_gl000247"
>>> [85] "chr9_gl000201_random" "chrUn_gl000235" "chrUn_gl000239"
>>> [88] "chr21_gl000210_random" "chrUn_gl000231" "chrUn_gl000229"
>>> [91] "chrM" "chrUn_gl000226" "chr18_gl000207_random"
>>>> seqlevels(tx)
>>> [1] "1" "2" "3"
>>> [4] "4" "5" "6"
>>> [7] "7" "8" "9"
>>> [10] "10" "11" "12"
>>> [13] "13" "14" "15"
>>> [16] "16" "17" "18"
>>> [19] "19" "20" "chrY"
>>> [22] "21" "22" "X"
>>> [25] "chr6_ssto_hap7" "chr6_mcf_hap5" "chr6_cox_hap2"
>>> [28] "chr6_mann_hap4" "chr6_apd_hap1" "chr6_qbl_hap6"
>>> [31] "chr6_dbb_hap3" "chr17_ctg5_hap1" "chr4_ctg9_hap1"
>>> [34] "chr1_gl000192_random" "chrUn_gl000225" "chr4_gl000194_random"
>>> [37] "chr4_gl000193_random" "chr9_gl000200_random" "chrUn_gl000222"
>>> [40] "chrUn_gl000212" "chr7_gl000195_random" "chrUn_gl000223"
>>> [43] "chrUn_gl000224" "chrUn_gl000219" "chr17_gl000205_random"
>>> [46] "chrUn_gl000215" "chrUn_gl000216" "chrUn_gl000217"
>>> [49] "chr9_gl000199_random" "chrUn_gl000211" "chrUn_gl000213"
>>> [52] "chrUn_gl000220" "chrUn_gl000218" "chr19_gl000209_random"
>>> [55] "chrUn_gl000221" "chrUn_gl000214" "chrUn_gl000228"
>>> [58] "chrUn_gl000227" "chr1_gl000191_random" "chr19_gl000208_random"
>>> [61] "chr9_gl000198_random" "chr17_gl000204_random" "chrUn_gl000233"
>>> [64] "chrUn_gl000237" "chrUn_gl000230" "chrUn_gl000242"
>>> [67] "chrUn_gl000243" "chrUn_gl000241" "chrUn_gl000236"
>>> [70] "chrUn_gl000240" "chr17_gl000206_random" "chrUn_gl000232"
>>> [73] "chrUn_gl000234" "chr11_gl000202_random" "chrUn_gl000238"
>>> [76] "chrUn_gl000244" "chrUn_gl000248" "chr8_gl000196_random"
>>> [79] "chrUn_gl000249" "chrUn_gl000246" "chr17_gl000203_random"
>>> [82] "chr8_gl000197_random" "chrUn_gl000245" "chrUn_gl000247"
>>> [85] "chr9_gl000201_random" "chrUn_gl000235" "chrUn_gl000239"
>>> [88] "chr21_gl000210_random" "chrUn_gl000231" "chrUn_gl000229"
>>> [91] "chrM" "chrUn_gl000226" "chr18_gl000207_random"
>>>> txbygene$'5327'
>>> GRanges with 6 ranges and 2 elementMetadata cols:
>>> seqnames ranges strand | tx_id tx_name
>>> <Rle> <IRanges> <Rle> | <integer> <character>
>>> [1] chr8 [42032236, 42050729] - | 31953 uc010lxf.1
>>> [2] chr8 [42032236, 42050729] - | 31954 uc010lxg.1
>>> [3] chr8 [42032236, 42065194] - | 31955 uc003xos.2
>>> [4] chr8 [42032236, 42065194] - | 31956 uc003xot.2
>>> [5] chr8 [42032236, 42065194] - | 31957 uc011lcm.1
>>> [6] chr8 [42032236, 42065194] - | 31958 uc011lcn.1
>>> ---
>>> seqlengths:
>>> chr1 chr2 ... chr18_gl000207_random
>>> 249250621 243199373 ... 4262
>>>> tx$'5327'
>>> GRanges with 6 ranges and 2 elementMetadata cols:
>>> seqnames ranges strand | tx_id tx_name
>>> <Rle> <IRanges> <Rle> | <integer> <character>
>>> [1] 9 [42032236, 42050729] - | 31953 uc010lxf.1
>>> [2] 9 [42032236, 42050729] - | 31954 uc010lxg.1
>>> [3] 9 [42032236, 42065194] - | 31955 uc003xos.2
>>> [4] 9 [42032236, 42065194] - | 31956 uc003xot.2
>>> [5] 9 [42032236, 42065194] - | 31957 uc011lcm.1
>>> [6] 9 [42032236, 42065194] - | 31958 uc011lcn.1
>>> ---
>>> seqlengths:
>>> 1 2 ... chr18_gl000207_random
>>> 249250621 243199373 ... 4262
>>>> txbygene$'1956'
>>> GRanges with 11 ranges and 2 elementMetadata cols:
>>> seqnames ranges strand | tx_id tx_name
>>> <Rle> <IRanges> <Rle> | <integer> <character>
>>> [1] chr7 [55086725, 55224644] + | 28336 uc003tqh.3
>>> [2] chr7 [55086725, 55236328] + | 28337 uc003tqi.3
>>> [3] chr7 [55086725, 55238738] + | 28338 uc003tqj.3
>>> [4] chr7 [55086725, 55270769] + | 28339 uc022adm.1
>>> [5] chr7 [55086725, 55270769] + | 28340 uc010kzg.2
>>> [6] chr7 [55086725, 55275031] + | 28341 uc003tqk.3
>>> [7] chr7 [55086725, 55275031] + | 28342 uc022adn.1
>>> [8] chr7 [55177540, 55275031] + | 28343 uc011kco.2
>>> [9] chr7 [55224226, 55238906] + | 28345 uc011kcq.1
>>> [10] chr7 [55224226, 55238906] + | 28346 uc011kcp.1
>>> [11] chr7 [55248979, 55259567] + | 28349 uc022ado.1
>>> ---
>>> seqlengths:
>>> chr1 chr2 ... chr18_gl000207_random
>>> 249250621 243199373 ... 4262
>>>> tx$'1956'
>>> GRanges with 11 ranges and 2 elementMetadata cols:
>>> seqnames ranges strand | tx_id tx_name
>>> <Rle> <IRanges> <Rle> | <integer> <character>
>>> [1] 7 [55086725, 55224644] + | 28336 uc003tqh.3
>>> [2] 7 [55086725, 55236328] + | 28337 uc003tqi.3
>>> [3] 7 [55086725, 55238738] + | 28338 uc003tqj.3
>>> [4] 7 [55086725, 55270769] + | 28339 uc022adm.1
>>> [5] 7 [55086725, 55270769] + | 28340 uc010kzg.2
>>> [6] 7 [55086725, 55275031] + | 28341 uc003tqk.3
>>> [7] 7 [55086725, 55275031] + | 28342 uc022adn.1
>>> [8] 7 [55177540, 55275031] + | 28343 uc011kco.2
>>> [9] 7 [55224226, 55238906] + | 28345 uc011kcq.1
>>> [10] 7 [55224226, 55238906] + | 28346 uc011kcp.1
>>> [11] 7 [55248979, 55259567] + | 28349 uc022ado.1
>>> ---
>>> seqlengths:
>>> 1 2 ... chr18_gl000207_random
>>> 249250621 243199373 ... 4262
>>>> sessionInfo()
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=C LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.7.1
>>> [2] GenomicFeatures_1.8.1
>>> [3] AnnotationDbi_1.18.0
>>> [4] Biobase_2.16.0
>>> [5] GenomicRanges_1.8.3
>>> [6] IRanges_1.14.2
>>> [7] BiocGenerics_0.2.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] biomaRt_2.12.0 Biostrings_2.24.1 bitops_1.0-4.1 BSgenome_1.24.0
>>> [5] DBI_0.2-5 RCurl_1.91-1 Rsamtools_1.8.3 RSQLite_0.11.1
>>> [9] rtracklayer_1.16.1 stats4_2.15.0 tools_2.15.0 XML_3.9-4
>>> [13] zlibbioc_1.2.0
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Alex Gutteridge