[Bioc-devel] A bug in TxDb.Hsapiens.UCSC.hg38.knownGene?

zhao shilin zhaoshilin at gmail.com
Mon Oct 17 05:35:44 CEST 2016


Dear BioC team,

I think I found something incorrect in TxDb.Hsapiens.UCSC.hg38.knownGene
and reported it at https://support.bioconductor.org/p/88232/, but did not get
a reply. Since I think it is a bug, I decided to send it by email to let you know.

I am using the development version of TxDb.Hsapiens.UCSC.hg38.knownGene,
because the release version was built in 2015 and differs considerably from
the UCSC website. Here is the R code that reproduces the bug:

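library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(GenomicFeatures)  # provides genes() and transcriptsBy()

geneDb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# Gene-level ranges, named by Entrez Gene ID ("875" is CBS)
allGeneRange <- genes(geneDb)
allGeneRange["875"]
# Transcripts grouped by gene
txs <- transcriptsBy(geneDb, by = "gene")
txs["875"]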


This shows that the CBS gene (txs["875"]) has 25 transcripts from two
regions: chr21 [6444869, 6467509] and chr21 [43075107, 43076288].

1. The CBS gene ("875") is located only at chr21 [43075107, 43076288]. The
region chr21 [6444869, 6467509] belongs to the CBSL gene ("102724560").
However, CBSL is not in the database, and its transcripts are assigned to CBS.

2. The gene range reported for CBS (allGeneRange["875"]) is chr21 [6444869,
43076943], which covers everything between 6444869 and 43076943. This is
not correct, because the transcripts come from two separate regions.
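To make point 2 concrete, here is a minimal base-R sketch (no Bioconductor needed) contrasting a single smallest-start-to-largest-end span, which has the same shape as the range reported for CBS, with merging overlapping intervals into disjoint clusters, similar in spirit to reduce() in IRanges/GenomicRanges. The helpers merge_intervals() and span_interval() are hypothetical, and the input uses only the two region boundaries quoted above, not the actual 25 transcript records.

```r
# Hypothetical helper: merge overlapping (or directly adjacent) intervals
# on one chromosome/strand into disjoint clusters, similar in spirit to
# IRanges::reduce(). Coordinates are 1-based and closed, as in GRanges.
merge_intervals <- function(starts, ends) {
  o <- order(starts)
  starts <- starts[o]
  ends <- ends[o]
  # Largest end coordinate seen so far, in start order
  run_end <- cummax(ends)
  # A new cluster begins when an interval starts beyond the previous
  # running end + 1, i.e. there is a gap of at least one base
  new_cluster <- c(TRUE, starts[-1] > run_end[-length(run_end)] + 1)
  cluster_id <- cumsum(new_cluster)
  data.frame(
    start = as.vector(tapply(starts, cluster_id, min)),
    end   = as.vector(tapply(ends, cluster_id, max))
  )
}

# Hypothetical helper: one range from the smallest start to the largest end
span_interval <- function(starts, ends) {
  data.frame(start = min(starts), end = max(ends))
}

# The two chr21 regions quoted in the report (not individual transcripts)
starts <- c(6444869, 43075107)
ends   <- c(6467509, 43076288)

span_interval(starts, ends)    # one range covering everything in between
merge_intervals(starts, ends)  # two separate regions
```

On the real TxDb object, GenomicRanges::reduce(txs[["875"]]) should give the analogous merged regions directly; the span, by contrast, hides the large gap between the two loci.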


Thanks!

Shilin
