[BioC] BUG in Genomic(Features|Ranges): names(unlist(transcriptsBy(txdb, 'gene'))) is UNRELIABLE!!!
Cook, Malcolm
MEC at stowers.org
Sat Sep 1 07:07:25 CEST 2012
Careful fellow travelers,
I find that unlisting the GRangesList returned from a call to `transcriptsBy` yields a GRanges whose names look like gene names... only they are incorrect!
Look:
> txdb<-makeTranscriptDbFromBiomart(biomart="ensembl", dataset="dmelanogaster_gene_ensembl")
...
> transcriptsBy(txdb,'gene')[2]
GRangesList of length 1:
$FBgn0000008
GRanges with 3 ranges and 2 elementMetadata cols:
      seqnames               ranges strand |     tx_id     tx_name
         <Rle>            <IRanges>  <Rle> | <integer> <character>
  [1]       2R [18024494, 18060339]      + |      8616 FBtr0100521
  [2]       2R [18024496, 18060346]      + |      8615 FBtr0071763
  [3]       2R [18024938, 18060346]      + |      8617 FBtr0071764
...
> unlist(transcriptsBy(txdb,'gene')[2])
GRanges with 3 ranges and 2 elementMetadata cols:
               seqnames               ranges strand |     tx_id     tx_name
                  <Rle>            <IRanges>  <Rle> | <integer> <character>
   FBgn0000008       2R [18024494, 18060339]      + |      8616 FBtr0100521
  FBgn00000081       2R [18024496, 18060346]      + |      8615 FBtr0071763
  FBgn00000082       2R [18024938, 18060346]      + |      8617 FBtr0071764
...
Arguably, the names on the GRanges should either all be the same, namely FBgn0000008, or they should not be returned at all.
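In the meantime, here is a minimal sketch of a possible workaround, written in base R so it runs without Bioconductor. It uses a plain named list as a stand-in for the GRangesList. The suffixed names above happen to match what `make.unique(..., sep = "")` produces, though whether that is what the package does internally is only a guess. The names `tx_by_gene`, `gene_ids`, `suffixed` and `flat` are made up for illustration, and the second list element (`geneX`) is hypothetical.

```r
# Toy stand-in for the GRangesList: a plain list keyed by gene ID.
# The first element uses the transcript IDs from the output above;
# the second element is hypothetical, added only to show two groups.
tx_by_gene <- list(
  FBgn0000008 = c("FBtr0100521", "FBtr0071763", "FBtr0071764"),
  geneX       = c("txA")
)

# One gene ID per transcript, in the same order as the flattened object.
n_per_gene <- vapply(tx_by_gene, length, integer(1))
gene_ids   <- rep(names(tx_by_gene), n_per_gene)

# The reported names follow the same pattern as make.unique() with sep = "".
suffixed <- make.unique(gene_ids, sep = "")
print(suffixed)

# Workaround: flatten without generated names, then attach the gene IDs.
flat <- unlist(tx_by_gene, use.names = FALSE)
names(flat) <- gene_ids
print(flat)
```

On the real object, the analogous steps would presumably be `unlist(x, use.names=FALSE)` followed by `rep(names(x), elementLengths(x))`, assuming the installed IRanges/GenomicRanges versions provide both.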
This 'bug' has been around for some time. I meant to report it, and just tripped over it again.
Can fix?
Thanks!
Malcolm
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] tools splines parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] igraph_0.6-2 log4r_0.1-4 vwr_0.1 RecordLinkage_0.4-1 ffbase_0.5 ff_2.2-7 bit_1.1-8 evd_2.2-7 ipred_0.8-13 prodlim_1.3.1 KernSmooth_2.23-8 nnet_7.3-4 survival_2.36-14 mlbench_2.1-1 MASS_7.3-20 ada_2.0-3 rpart_3.1-54 e1071_1.6 class_7.3-4 XLConnect_0.2-0 XLConnectJars_0.2-0 rJava_0.9-3 latticeExtra_0.6-19 RColorBrewer_1.0-5 lattice_0.20-6 doMC_1.2.5 multicore_0.1-7
[28] SRAdb_1.10.0 RCurl_1.91-1 bitops_1.0-4.1 graph_1.34.0 BSgenome_1.24.0 rtracklayer_1.16.3 Rsamtools_1.8.6 Biostrings_2.24.1 GenomicFeatures_1.8.2 AnnotationDbi_1.19.31 GenomicRanges_1.8.12 R.utils_1.16.0 R.oo_1.9.8 R.methodsS3_1.4.2 IRanges_1.14.4 Biobase_2.17.7 BiocGenerics_0.3.1 data.table_1.8.2 compare_0.2-3 svUnit_0.7-10 doParallel_1.0.1 iterators_1.0.6 foreach_1.4.0 ggplot2_0.9.1 sqldf_0.4-6.4 RSQLite.extfuns_0.0.1 RSQLite_0.11.1
[55] chron_2.3-42 gsubfn_0.6-4 proto_0.3-9.2 DBI_0.2-5 functional_0.1 reshape_0.8.4 plyr_1.7.1 stringr_0.6.1 gtools_2.7.0
loaded via a namespace (and not attached):
[1] biomaRt_2.12.0 codetools_0.2-8 colorspace_1.1-1 compiler_2.15.0 dichromat_1.2-4 digest_0.5.2 GEOquery_2.23.5 grid_2.15.0 labeling_0.1 memoise_0.1 munsell_0.3 reshape2_1.2.1 scales_0.2.1 stats4_2.15.0 tcltk_2.15.0 XML_3.9-4 zlibbioc_1.2.0
>