[BioC] org.Bt.eg.db / annotation problem

Tue Aug 2 14:56:07 CEST 2011

Hello List

Perhaps someone could help me with this. I am annotating some 200 genes with the org.Bt.eg.db package. The identifier I have for each gene is an Ensembl ID (e.g. ENSBTAG00000009012). I am trying to retrieve the Entrez IDs with the following code (where rownames(topGenes$table) is my vector of Ensembl IDs):

egIds <- unlist(mget(rownames(topGenes$table), org.Bt.egENSEMBL2EG, ifnotfound=NA))

This returns a named vector but it contains Ensembl IDs that were not in my query.

setdiff(names(egIds), rownames(topGenes$table))

 [1] "ENSBTAG000000375581" "ENSBTAG000000375582" "ENSBTAG000000312311"
 [4] "ENSBTAG000000312312" "ENSBTAG000000306301" "ENSBTAG000000306302"
 [7] "ENSBTAG000000359951" "ENSBTAG000000359952" "ENSBTAG000000005461"
[10] "ENSBTAG000000005462" "ENSBTAG000000005041" "ENSBTAG000000005042"
[13] "ENSBTAG000000307771" "ENSBTAG000000307772" "ENSBTAG000000135691"
[16] "ENSBTAG000000135692"

Could someone explain why this is happening? The IDs above (i.e. those not in my query) are returned with Entrez IDs.

egIds[setdiff(names(egIds), rownames(topGenes$table))]

ENSBTAG000000375581 
           "281212"    

etc etc
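
For what it's worth, the unexpected names look like my query IDs with a 1 or 2 appended (they are one character longer than a normal Ensembl ID), which is how unlist() names the values when a list element holds more than one of them. Below is a minimal sketch of that behaviour, using a toy list in place of the result of mget(). The IDs and Entrez values in it are made up for illustration, and the object names (hits, flat, nHits, multi, tab) are hypothetical.

```r
# Toy list standing in for the result of mget(); IDs and values are made up.
hits <- list(
  ENSBTAG00000000001 = c("111", "222"),  # one Ensembl ID, two Entrez IDs
  ENSBTAG00000000002 = "333",            # one-to-one mapping
  ENSBTAG00000000003 = NA_character_     # not found (ifnotfound=NA)
)

# unlist() appends 1, 2, ... to the name of any element with length > 1.
flat <- unlist(hits)
names(flat)
# "ENSBTAG000000000011" "ENSBTAG000000000012" "ENSBTAG00000000002" "ENSBTAG00000000003"

# Counting values per query ID shows which IDs map to several Entrez IDs.
nHits <- sapply(hits, length)
multi <- names(nHits)[nHits > 1]

# One way to keep the original IDs intact: repeat each ID once per value
# and drop the names that unlist() would otherwise generate.
tab <- data.frame(
  ensembl = rep(names(hits), nHits),
  entrez  = unlist(hits, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

If this is what is going on, the same counting step applied to the real mget() result should list the Ensembl IDs that have more than one Entrez ID, and the two-column table avoids the altered names.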

Thanks

iain

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8    
 [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8   
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Bt.eg.db_2.5.0   RSQLite_0.9-4        DBI_0.2-5           
[4] AnnotationDbi_1.14.1 Biobase_2.10.0       edgeR_2.2.5         

loaded via a namespace (and not attached):
[1] limma_3.6.6  tools_2.13.1