[Bioc-devel] biomaRt and TxDb don't play nice together
Michael Shapiro
Thu Oct 3 12:45:17 CEST 2019
Apologies for a previous email that seems content free.
I've run into a cosmic mismatch between biomaRt and TxDb which is either a bug or a bug waiting to happen. In brief, biomaRt reports entrezgene_id as a numeric, but TxDb wants it as a character. What's deadly in this is that TxDb doesn't fail when supplied with the numeric; it silently returns the wrong gene. Here is a minimal example where I am trying to get from gene name (Kcnj12) to gene location:
## Load the packages:
library(biomaRt)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
## Resolve the gene name:
ensembl = useMart('ensembl', dataset = 'mmusculus_gene_ensembl')
geneNames = getBM(c('entrezgene_id', 'external_gene_name'), mart = ensembl)
idx = geneNames$external_gene_name == 'Kcnj12'
entrezGeneId = geneNames$entrezgene_id[idx]
## Get gene locations:
txdb = TxDb.Mmusculus.UCSC.mm10.knownGene
tbg = transcriptsBy(txdb, by = 'gene')
## Shoot self in foot (the numeric is taken as a position, not a gene ID):
WRONG_LOCATION = tbg[[entrezGeneId]]
## Get email from biologist pointing out you've got the wrong gene:
ACTUAL_LOCATION = tbg[[as.character(entrezGeneId)]]
I would argue that if entrezgene_id is used in some places as a numeric and in others as a character, it's safer if biomaRt returns it as a character. If your code is wrong, you want it to fail, not quietly misperform. A vector or list will always accept a numeric subscript, treating it as a position, even when what you meant was a name. By contrast, you will probably get an error if you try to access something with a character when you should be using a numeric.
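The mechanism can be reproduced in base R without either package. The following is only a sketch: tbg_toy and its values are made-up stand-ins for the output of transcriptsBy(), and get_by_id() is a hypothetical helper, not part of biomaRt or GenomicFeatures.

```r
# Toy stand-in for a gene-keyed list: names are Entrez-style IDs stored
# as character strings, deliberately not in positional order.
tbg_toy <- list("100" = "gene A", "3" = "gene B", "2" = "gene C")

numeric_id <- 2  # the type the post reports biomaRt returning

# Numeric subscript: interpreted as a position, so this is the 2nd element.
by_position <- tbg_toy[[numeric_id]]               # "gene B" (named "3")

# Character subscript: matched against names(tbg_toy).
by_name <- tbg_toy[[as.character(numeric_id)]]     # "gene C" (named "2")

# Hypothetical helper: always look up by name, and fail loudly if absent.
get_by_id <- function(x, id) {
  stopifnot(length(id) == 1)
  # format() with scientific = FALSE avoids keys such as "1e+05"
  # when a large ID arrives as a double.
  key <- if (is.numeric(id)) format(id, scientific = FALSE, trim = TRUE) else id
  if (!key %in% names(x)) stop("no element named ", key)
  x[[key]]
}

safe <- get_by_id(tbg_toy, numeric_id)             # "gene C"
```

With a wrapper of this kind, an ID that is not present raises an error instead of quietly returning whatever happens to sit at that position.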