[BioC] org.Bt.eg.db / annotation problem
Iain Gallagher
iaingallagher at btopenworld.com
Tue Aug 2 14:56:07 CEST 2011
Hello List
Perhaps someone could help me with this. I am annotating some 200 genes with the org.Bt.eg.db package. The identifier I have for the genes is an Ensembl ID (e.g. ENSBTAG00000009012). I am attempting to return the Entrez ID with the following code (where rownames(topGenes$table) is my vector of Ensembl IDs):
egIds <- unlist(mget(rownames(topGenes$table), org.Bt.egENSEMBL2EG, ifnotfound=NA))
This returns a named vector, but its names include Ensembl IDs that were not in my query.
setdiff(names(egIds), rownames(topGenes$table))
[1] "ENSBTAG000000375581" "ENSBTAG000000375582" "ENSBTAG000000312311"
[4] "ENSBTAG000000312312" "ENSBTAG000000306301" "ENSBTAG000000306302"
[7] "ENSBTAG000000359951" "ENSBTAG000000359952" "ENSBTAG000000005461"
[10] "ENSBTAG000000005462" "ENSBTAG000000005041" "ENSBTAG000000005042"
[13] "ENSBTAG000000307771" "ENSBTAG000000307772" "ENSBTAG000000135691"
[16] "ENSBTAG000000135692"
Could someone explain why this is happening? The IDs above (i.e. those not in my query) are returned with Entrez IDs.
egIds[setdiff(names(egIds), rownames(topGenes$table))]
ENSBTAG000000375581
"281212"
etc etc
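One possible explanation, offered as a sketch and not a confirmed diagnosis: the unexpected names are one digit longer than a normal Ensembl gene ID and come in pairs ending in 1 and 2, which is what unlist() produces when a list element holds more than one value. The toy list below is a hypothetical stand-in for the result of mget(); its IDs and Entrez values are invented for illustration and do not come from org.Bt.eg.db.

```r
# Hypothetical stand-in for the list returned by mget(); all IDs and values are invented.
hits <- list(
  ENSBTAG00000000001 = "111",            # maps to one Entrez ID
  ENSBTAG00000000002 = c("222", "333"),  # maps to two Entrez IDs
  ENSBTAG00000000003 = NA                # not found (ifnotfound=NA)
)

# unlist() flattens the list; an element with several values gets
# a running number appended to its name (...0021, ...0022).
flat <- unlist(hits)
extra <- setdiff(names(flat), names(hits))
print(extra)

# Query IDs that map to more than one Entrez ID.
multi <- names(which(sapply(hits, length) > 1))
print(multi)

# One way to keep exactly one value per query ID: take the first match.
first <- sapply(hits, function(x) x[1])
print(first)
```

If this is the cause, sapply(mget(...), length) on the real query should show a length of 2 for the Ensembl IDs whose names appear with the extra digit. Taking the first match is only one option; whether to keep all Entrez IDs for such genes depends on the analysis.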
Thanks
iain
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=en_GB.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Bt.eg.db_2.5.0 RSQLite_0.9-4 DBI_0.2-5
[4] AnnotationDbi_1.14.1 Biobase_2.10.0 edgeR_2.2.5
loaded via a namespace (and not attached):
[1] limma_3.6.6 tools_2.13.1
>