[BioC] Bug in makeOrgPackageFromNCBI from AnnotationForge?

Sat Aug 24 04:24:30 CEST 2013

I am working on a project that uses Schizosaccharomyces pombe as a source for genomic analysis, and I would love to use the ReportingTools HTML-producing wrappers. However, I am struggling because there is no AnnotationDbi package available for this organism. I decided to finally take the plunge and see if I could build one myself using AnnotationForge, and I was quite excited to find that I could perhaps make one simply by calling makeOrgPackageFromNCBI(). Something most likely went wrong, and I suspect a bug somewhere in the pipeline. I have not dug any deeper than trying to build the package and use it, hoping that someone closer to the code could shed some light. Here are the steps I took:

> library(AnnotationForge)
> makeOrgPackageFromNCBI(version = "0.1",
                         author = "Marco Blanchette <mab at stowers.org>",
                         maintainer = "Marco Blanchette <mab at stowers.org>",
                         outputDir = ".",
                         tax_id = "4896",
                         genus = "Schizosaccharomyces",
                         species = "pombe")

This step succeeded with only a warning:

Warning message:
In .makeSimpleTable(ug, table = "unigene", con) :
  no values found for table unigene in this data chunk.

I didn't think this was critical enough to raise a red flag, so I proceeded with the installation, which went smoothly:

> library(devtools)
> install('org.Spombe.eg.db')
> library('org.Spombe.eg.db')

I then tried to use it with the ReportingTools publish() function, but it failed with an error related to Entrez IDs, for which I had a conversion table from biomaRt. I dug a bit deeper and found that none of the genes I was querying were in the database, and finally realized that there were only 38 entries in the org.Spombe.eg.db database I had just created and installed. Check this out:

> keytypes(org.Spombe.eg.db)
 [1] "ENTREZID" "ACCNUM"   "ALIAS"    "CHR"      "PMID"     "REFSEQ"  
 [7] "SYMBOL"   "UNIGENE"  "GENENAME" "GO"       "EVIDENCE" "ONTOLOGY"

Looking good! However:

> length(keys(org.Spombe.eg.db,'ENTREZID'))
[1] 38

Can someone close enough to the code shed some light as to whether there is a bug in AnnotationForge, or whether it is the NCBI database that does not conform to what is expected? For instance, biomaRt has 5117 Entrez IDs:

> library(biomaRt)
> mart <- useMart("fungi_mart_18","spombe_eg_gene")
> ensembl2entrez <- getBM(c('ensembl_gene_id','entrezgene'),mart=mart)
> sum(!is.na(ensembl2entrez$entrezgene))
[1] 5117

The IDs I tested on the NCBI website return the correct genes. However, only 10 of the AnnotationForge Entrez IDs (out of a mere 38) are found in biomaRt:

> sum(keys(org.Spombe.eg.db,'ENTREZID') %in% ensembl2entrez$entrezgene)
[1] 10
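
To break that overlap down further, something like the following could be used. This is only a minimal sketch in base R: compare_id_sets is a hypothetical helper (not part of AnnotationForge, AnnotationDbi or biomaRt), and the IDs below are made-up placeholders rather than real S. pombe genes. It coerces both sides to character because keys() returns character vectors while the biomaRt column may be integer and contain NAs.

```r
# Hypothetical helper: split two sets of Entrez IDs into the IDs they share
# and the IDs found on one side only. NAs and duplicates are dropped first.
compare_id_sets <- function(a, b) {
  a <- unique(as.character(a[!is.na(a)]))
  b <- unique(as.character(b[!is.na(b)]))
  list(
    shared = intersect(a, b),  # present in both sets
    only_a = setdiff(a, b),    # present in a but not in b
    only_b = setdiff(b, a)     # present in b but not in a
  )
}

# Placeholder IDs for illustration only (not real S. pombe genes)
forge_ids <- c("101", "102", "103")
mart_ids  <- c(102L, 103L, 104L, NA)

res <- compare_id_sets(forge_ids, mart_ids)
sapply(res, length)
```

With the objects from this post, the call would be compare_id_sets(keys(org.Spombe.eg.db, 'ENTREZID'), ensembl2entrez$entrezgene); the only_a element would then list the IDs that AnnotationForge stored but biomaRt does not know about, which are the ones worth looking up by hand at NCBI.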

Again, I would appreciate any comments or suggestions as to whether this is a bug, something I did wrong, or a misalignment between the NCBI S. pombe annotation and what AnnotationForge expects.

Thanks
-- 
Marco Blanchette, Ph.D.
Assistant Investigator 
Stowers Institute for Medical Research
1000 East 50th St.

Kansas City, MO 64110

Tel: 816-926-4071 
Cell: 816-726-8419 
Fax: 816-926-2018