[BioC] Odd behaviour with renameSeqlevels

Alex Gutteridge alexg at ruggedtextile.com
Thu May 3 12:17:53 CEST 2012


Thanks Valerie.

On 02.05.2012 17:46, Valerie Obenchain wrote:
> I'm sorry Alex, I missed your point the first time.  Yes, there was a
> bug in renameSeqlevels() wrt changing the chromosome names when the
> renaming vector was out of order with 'x'.
>
> Now fixed in 1.8.5 release / 1.9.13 devel. Thanks for reporting this.
>
> Valerie
>
>
>
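
The symptom described above (a renaming vector whose order differs from
the object's seqlevels) can be mimicked in base R on a plain character
vector. This is only a sketch: old_levels, rename_map, positional and
by_name are made-up names, and the positional assignment is an assumed
illustration of the symptom, not the actual GenomicRanges code.

```r
# Stand-in for seqlevels(txbygene): "chrX" sits between "chr7" and "chr8".
old_levels <- c("chr6", "chr7", "chrX", "chr8", "chr9", "chrY")

# Renaming vector in natural order (names = old levels, values = new levels).
rename_map <- c(chr6 = "6", chr7 = "7", chr8 = "8", chr9 = "9", chrX = "X")

# Hypothetical positional renaming: overwrite the first length(rename_map)
# levels in place and ignore the names. This mimics the off-by-one symptom.
positional <- old_levels
positional[seq_along(rename_map)] <- unname(rename_map)

# Name-based renaming: look each old level up in the map, keep unmatched ones.
idx <- match(old_levels, names(rename_map))
by_name <- ifelse(is.na(idx), old_levels, unname(rename_map)[idx])

print(data.frame(old = old_levels, positional = positional, by_name = by_name))
```

In the positional column "chr8" ends up labelled "9", which matches what
was reported for gene 5327; in the by_name column every level keeps its
own number and "chrY" is left untouched.
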
> On 05/02/2012 09:25 AM, Valerie Obenchain wrote:
>> Hi Alex,
>>
>> The ordering of the chromosome names displayed by seqlevels() comes
>> from the seqinfo object in the txdb. The ordering in the txdb or the
>> txbygene before renaming is the same as after the renaming.
>>
>>   txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>>   seqinfo(txdb)
>>   seqlevels(txdb)
>>
>>   txbygene = transcriptsBy(txdb,"gene")
>>   seqinfo(txbygene)
>>   seqlevels(txbygene)
>>
>> This is not necessarily the same order as the seqnames (i.e., order 
>> of the ranges) in the txbygene object.
>>   seqnames(txbygene)
>>
>> Renaming the seqlevels has not changed the order of your txbygene 
>> ranges if that was the concern. No, the renaming vector does not need 
>> to match the ordering of the original names.
>>
>> Here is another way to prepare your new seqlevel names,
>>
>>   nms <- seqlevels(txbygene)[1:24]
>>   vlu <- sub("chr", "", seqlevels(txbygene)[1:24], fixed=TRUE)
>>   names(vlu) <- nms
>>   renameSeqlevels(txbygene, vlu)
>>
>> Valerie
>>
>>
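
The named-vector construction above can be tried without a TxDb, using a
short stand-in for seqlevels(txbygene). The vector lv and the shuffled
subset are invented for illustration; setNames() is base R.

```r
# Assumed stand-in for seqlevels(txbygene); deliberately not in natural order.
lv <- c("chr1", "chr7", "chrX", "chr8", "chrY")

# Values are the new names, names are the old ones, as in the nms/vlu example.
vlu <- setNames(sub("chr", "", lv, fixed = TRUE), lv)

# Reordering the map keeps every old/new pair intact, because each value
# travels with its own name.
shuffled <- vlu[c("chrX", "chr8", "chr1", "chrY", "chr7")]
stopifnot(all(paste0("chr", shuffled) == names(shuffled)))

print(vlu)
```

Because each new name is derived from its own old name, the pairs cannot
drift apart, whatever order the seqlevels happen to be stored in.
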
>> On 05/02/2012 04:43 AM, Alex Gutteridge wrote:
>>> Is this a bug in renameSeqlevels or expected behaviour? Note the 
>>> weird ordering of chromosome names in txbygene (chrX between chr7 and 
>>> chr8) which then results in misnaming when I try to use 
>>> renameSeqlevels (everything after chr7 is off by one). The docs for 
>>> renameSeqlevels aren't explicit in whether the renaming vector has to 
>>> match the ordering of the original names, but I thought the point of 
>>> making it a named vector is that it doesn't?
>>>
>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>> Loading required package: GenomicFeatures
>>> Loading required package: BiocGenerics
>>>
>>> Attaching package: ‘BiocGenerics’
>>>
>>> The following object(s) are masked from ‘package:stats’:
>>>
>>>     xtabs
>>>
>>> The following object(s) are masked from ‘package:base’:
>>>
>>>     anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
>>>     get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
>>>     pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
>>>     rownames, sapply, setdiff, table, tapply, union, unique
>>>
>>> Loading required package: IRanges
>>> Loading required package: GenomicRanges
>>> Loading required package: AnnotationDbi
>>> Loading required package: Biobase
>>> Welcome to Bioconductor
>>>
>>>     Vignettes contain introductory material; view with
>>>     'browseVignettes()'. To cite Bioconductor, see
>>>     'citation("Biobase")', and for packages 'citation("pkgname")'.
>>>
>>>> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>>>> txbygene = transcriptsBy(txdb,"gene")
>>>> tx = renameSeqlevels(txbygene,c("chr1"="1","chr2"="2","chr3"="3","chr4"="4",
>>> +     "chr5"="5","chr6"="6","chr7"="7","chr8"="8",
>>> +     "chr9"="9","chr10"="10","chr11"="11","chr12"="12",
>>> +     "chr13"="13","chr14"="14","chr15"="15","chr16"="16",
>>> +     "chr17"="17","chr18"="18","chr19"="19","chr20"="20",
>>> +     "chr21"="21","chr22"="22","chrX"="X"))
>>>> seqlevels(txbygene)
>>>  [1] "chr1"                  "chr2"                  "chr3"
>>>  [4] "chr4"                  "chr5"                  "chr6"
>>>  [7] "chr7"                  "chrX"                  "chr8"
>>> [10] "chr9"                  "chr10"                 "chr11"
>>> [13] "chr12"                 "chr13"                 "chr14"
>>> [16] "chr15"                 "chr16"                 "chr17"
>>> [19] "chr18"                 "chr20"                 "chrY"
>>> [22] "chr19"                 "chr22"                 "chr21"
>>> [25] "chr6_ssto_hap7"        "chr6_mcf_hap5"         "chr6_cox_hap2"
>>> [28] "chr6_mann_hap4"        "chr6_apd_hap1"         "chr6_qbl_hap6"
>>> [31] "chr6_dbb_hap3"         "chr17_ctg5_hap1"       "chr4_ctg9_hap1"
>>> [34] "chr1_gl000192_random"  "chrUn_gl000225"        "chr4_gl000194_random"
>>> [37] "chr4_gl000193_random"  "chr9_gl000200_random"  "chrUn_gl000222"
>>> [40] "chrUn_gl000212"        "chr7_gl000195_random"  "chrUn_gl000223"
>>> [43] "chrUn_gl000224"        "chrUn_gl000219"        "chr17_gl000205_random"
>>> [46] "chrUn_gl000215"        "chrUn_gl000216"        "chrUn_gl000217"
>>> [49] "chr9_gl000199_random"  "chrUn_gl000211"        "chrUn_gl000213"
>>> [52] "chrUn_gl000220"        "chrUn_gl000218"        "chr19_gl000209_random"
>>> [55] "chrUn_gl000221"        "chrUn_gl000214"        "chrUn_gl000228"
>>> [58] "chrUn_gl000227"        "chr1_gl000191_random"  "chr19_gl000208_random"
>>> [61] "chr9_gl000198_random"  "chr17_gl000204_random" "chrUn_gl000233"
>>> [64] "chrUn_gl000237"        "chrUn_gl000230"        "chrUn_gl000242"
>>> [67] "chrUn_gl000243"        "chrUn_gl000241"        "chrUn_gl000236"
>>> [70] "chrUn_gl000240"        "chr17_gl000206_random" "chrUn_gl000232"
>>> [73] "chrUn_gl000234"        "chr11_gl000202_random" "chrUn_gl000238"
>>> [76] "chrUn_gl000244"        "chrUn_gl000248"        "chr8_gl000196_random"
>>> [79] "chrUn_gl000249"        "chrUn_gl000246"        "chr17_gl000203_random"
>>> [82] "chr8_gl000197_random"  "chrUn_gl000245"        "chrUn_gl000247"
>>> [85] "chr9_gl000201_random"  "chrUn_gl000235"        "chrUn_gl000239"
>>> [88] "chr21_gl000210_random" "chrUn_gl000231"        "chrUn_gl000229"
>>> [91] "chrM"                  "chrUn_gl000226"        "chr18_gl000207_random"
>>>> seqlevels(tx)
>>>  [1] "1"                     "2"                     "3"
>>>  [4] "4"                     "5"                     "6"
>>>  [7] "7"                     "8"                     "9"
>>> [10] "10"                    "11"                    "12"
>>> [13] "13"                    "14"                    "15"
>>> [16] "16"                    "17"                    "18"
>>> [19] "19"                    "20"                    "chrY"
>>> [22] "21"                    "22"                    "X"
>>> [25] "chr6_ssto_hap7"        "chr6_mcf_hap5"         "chr6_cox_hap2"
>>> [28] "chr6_mann_hap4"        "chr6_apd_hap1"         "chr6_qbl_hap6"
>>> [31] "chr6_dbb_hap3"         "chr17_ctg5_hap1"       "chr4_ctg9_hap1"
>>> [34] "chr1_gl000192_random"  "chrUn_gl000225"        "chr4_gl000194_random"
>>> [37] "chr4_gl000193_random"  "chr9_gl000200_random"  "chrUn_gl000222"
>>> [40] "chrUn_gl000212"        "chr7_gl000195_random"  "chrUn_gl000223"
>>> [43] "chrUn_gl000224"        "chrUn_gl000219"        "chr17_gl000205_random"
>>> [46] "chrUn_gl000215"        "chrUn_gl000216"        "chrUn_gl000217"
>>> [49] "chr9_gl000199_random"  "chrUn_gl000211"        "chrUn_gl000213"
>>> [52] "chrUn_gl000220"        "chrUn_gl000218"        "chr19_gl000209_random"
>>> [55] "chrUn_gl000221"        "chrUn_gl000214"        "chrUn_gl000228"
>>> [58] "chrUn_gl000227"        "chr1_gl000191_random"  "chr19_gl000208_random"
>>> [61] "chr9_gl000198_random"  "chr17_gl000204_random" "chrUn_gl000233"
>>> [64] "chrUn_gl000237"        "chrUn_gl000230"        "chrUn_gl000242"
>>> [67] "chrUn_gl000243"        "chrUn_gl000241"        "chrUn_gl000236"
>>> [70] "chrUn_gl000240"        "chr17_gl000206_random" "chrUn_gl000232"
>>> [73] "chrUn_gl000234"        "chr11_gl000202_random" "chrUn_gl000238"
>>> [76] "chrUn_gl000244"        "chrUn_gl000248"        "chr8_gl000196_random"
>>> [79] "chrUn_gl000249"        "chrUn_gl000246"        "chr17_gl000203_random"
>>> [82] "chr8_gl000197_random"  "chrUn_gl000245"        "chrUn_gl000247"
>>> [85] "chr9_gl000201_random"  "chrUn_gl000235"        "chrUn_gl000239"
>>> [88] "chr21_gl000210_random" "chrUn_gl000231"        "chrUn_gl000229"
>>> [91] "chrM"                  "chrUn_gl000226"        "chr18_gl000207_random"
>>>> txbygene$'5327'
>>> GRanges with 6 ranges and 2 elementMetadata cols:
>>>       seqnames               ranges strand |     tx_id     tx_name
>>>          <Rle>            <IRanges>  <Rle> | <integer> <character>
>>>   [1]     chr8 [42032236, 42050729]      - |     31953  uc010lxf.1
>>>   [2]     chr8 [42032236, 42050729]      - |     31954  uc010lxg.1
>>>   [3]     chr8 [42032236, 42065194]      - |     31955  uc003xos.2
>>>   [4]     chr8 [42032236, 42065194]      - |     31956  uc003xot.2
>>>   [5]     chr8 [42032236, 42065194]      - |     31957  uc011lcm.1
>>>   [6]     chr8 [42032236, 42065194]      - |     31958  uc011lcn.1
>>>   ---
>>>   seqlengths:
>>>                     chr1                  chr2 ... chr18_gl000207_random
>>>                249250621             243199373 ...                  4262
>>>> tx$'5327'
>>> GRanges with 6 ranges and 2 elementMetadata cols:
>>>       seqnames               ranges strand |     tx_id     tx_name
>>>          <Rle>            <IRanges>  <Rle> | <integer> <character>
>>>   [1]        9 [42032236, 42050729]      - |     31953  uc010lxf.1
>>>   [2]        9 [42032236, 42050729]      - |     31954  uc010lxg.1
>>>   [3]        9 [42032236, 42065194]      - |     31955  uc003xos.2
>>>   [4]        9 [42032236, 42065194]      - |     31956  uc003xot.2
>>>   [5]        9 [42032236, 42065194]      - |     31957  uc011lcm.1
>>>   [6]        9 [42032236, 42065194]      - |     31958  uc011lcn.1
>>>   ---
>>>   seqlengths:
>>>                        1                     2 ... chr18_gl000207_random
>>>                249250621             243199373 ...                  4262
>>>> txbygene$'1956'
>>> GRanges with 11 ranges and 2 elementMetadata cols:
>>>        seqnames               ranges strand |     tx_id     tx_name
>>>           <Rle>            <IRanges>  <Rle> | <integer> <character>
>>>    [1]     chr7 [55086725, 55224644]      + |     28336  uc003tqh.3
>>>    [2]     chr7 [55086725, 55236328]      + |     28337  uc003tqi.3
>>>    [3]     chr7 [55086725, 55238738]      + |     28338  uc003tqj.3
>>>    [4]     chr7 [55086725, 55270769]      + |     28339  uc022adm.1
>>>    [5]     chr7 [55086725, 55270769]      + |     28340  uc010kzg.2
>>>    [6]     chr7 [55086725, 55275031]      + |     28341  uc003tqk.3
>>>    [7]     chr7 [55086725, 55275031]      + |     28342  uc022adn.1
>>>    [8]     chr7 [55177540, 55275031]      + |     28343  uc011kco.2
>>>    [9]     chr7 [55224226, 55238906]      + |     28345  uc011kcq.1
>>>   [10]     chr7 [55224226, 55238906]      + |     28346  uc011kcp.1
>>>   [11]     chr7 [55248979, 55259567]      + |     28349  uc022ado.1
>>>   ---
>>>   seqlengths:
>>>                     chr1                  chr2 ... chr18_gl000207_random
>>>                249250621             243199373 ...                  4262
>>>> tx$'1956'
>>> GRanges with 11 ranges and 2 elementMetadata cols:
>>>        seqnames               ranges strand |     tx_id     tx_name
>>>           <Rle>            <IRanges>  <Rle> | <integer> <character>
>>>    [1]        7 [55086725, 55224644]      + |     28336  uc003tqh.3
>>>    [2]        7 [55086725, 55236328]      + |     28337  uc003tqi.3
>>>    [3]        7 [55086725, 55238738]      + |     28338  uc003tqj.3
>>>    [4]        7 [55086725, 55270769]      + |     28339  uc022adm.1
>>>    [5]        7 [55086725, 55270769]      + |     28340  uc010kzg.2
>>>    [6]        7 [55086725, 55275031]      + |     28341  uc003tqk.3
>>>    [7]        7 [55086725, 55275031]      + |     28342  uc022adn.1
>>>    [8]        7 [55177540, 55275031]      + |     28343  uc011kco.2
>>>    [9]        7 [55224226, 55238906]      + |     28345  uc011kcq.1
>>>   [10]        7 [55224226, 55238906]      + |     28346  uc011kcp.1
>>>   [11]        7 [55248979, 55259567]      + |     28349  uc022ado.1
>>>   ---
>>>   seqlengths:
>>>                        1                     2 ... chr18_gl000207_random
>>>                249250621             243199373 ...                  4262
>>>> sessionInfo()
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.7.1
>>> [2] GenomicFeatures_1.8.1
>>> [3] AnnotationDbi_1.18.0
>>> [4] Biobase_2.16.0
>>> [5] GenomicRanges_1.8.3
>>> [6] IRanges_1.14.2
>>> [7] BiocGenerics_0.2.0
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] biomaRt_2.12.0     Biostrings_2.24.1  bitops_1.0-4.1     BSgenome_1.24.0
>>>  [5] DBI_0.2-5          RCurl_1.91-1       Rsamtools_1.8.3    RSQLite_0.11.1
>>>  [9] rtracklayer_1.16.1 stats4_2.15.0      tools_2.15.0       XML_3.9-4
>>> [13] zlibbioc_1.2.0
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Alex Gutteridge


