[Bioc-devel] Question about which new organism resources to create

Marc Carlson mcarlson at fhcrc.org
Tue May 6 19:14:27 CEST 2014


Hi everyone,

As many of you already know, we have long provided organism annotation
packages that give gene-based annotations for selected organisms, and
we intend to keep doing that.  But these days there is also a lot of
other data at NCBI that could be used to make gene-based databases for
other organisms, and at the same time there is growing demand for
annotations for those organisms.  So I aim to make organism-based gene
databases for a wider range of organisms.
However, instead of just making more packages, I intend to put these DBs
into the AnnotationHub.  You can get an idea of what access will be
like by looking at the inparanoid8 objects that were added for the last
release:

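library(AnnotationHub)
ah = AnnotationHub()
## retrieve the inparanoid8 human ortholog database
hs8 = ah$inparanoid8.Orthologs.hom.Homo_sapiens.inp8.sqlite
hs8
## list the organisms (columns) available in this database
columns(hs8)
## take a few Toxoplasma gondii IDs and map them to human orthologs
k = head(keys(hs8, 'TOXOPLASMA_GONDII'))
select(hs8, k, 'HOMO_SAPIENS', 'TOXOPLASMA_GONDII')
## etc.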

Anyhow, my reason for posting is that I am now looking at all the NCBI
data that could be used for annotation packages and trying to decide
what to include.  About half of the 14 thousand potential critters in
the NCBI dataset have only about one gene annotated.  I am guessing that
it is not worth anyone's time to pre-process the organisms that have
only one gene.  Or is it?  If you think it might be, now would be a good
time to speak up.

How many annotations do you guys want or expect in an organism package
before it becomes annoying that you even downloaded it?
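The decision above amounts to a per-organism cutoff on the number of
annotated genes.  As a minimal sketch, assuming a gene table shaped like
NCBI's gene_info file (a tax_id column and a GeneID column), the
filtering step could look like this in base R.  The toy rows, the
made-up tax_id 12345 and the cutoff value are all hypothetical, chosen
only for illustration:

```r
## Hypothetical toy data standing in for an NCBI gene_info-style table:
## one row per (organism tax_id, GeneID) pair. Values are made up.
gene_info <- data.frame(
  tax_id = c(9606, 9606, 9606, 5811, 5811, 12345),
  GeneID = c(1, 2, 3, 100, 101, 999)
)

## Count distinct genes for each taxonomy ID
genes_per_taxon <- tapply(gene_info$GeneID, gene_info$tax_id,
                          function(ids) length(unique(ids)))

## Assumed minimum gene count; picking this number is the open question
min_genes <- 2

## Taxonomy IDs that would get a database under this cutoff
keep <- names(genes_per_taxon)[genes_per_taxon >= min_genes]
keep
```

With this cutoff the single-gene organism is dropped while the other two
are kept; raising or lowering min_genes is the only knob.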

Thanks in advance for your opinions,


   Marc


