[Bioc-devel] Question about which new organism resources to create
Luo Weijun
luo_weijun at yahoo.com
Tue May 6 21:01:37 CEST 2014
Marc,
This sounds like a great resource, and could help make Bioconductor more useful!
As for which species to include, I would suggest checking the full list of KEGG species:
http://www.genome.jp/kegg/catalog/org_list.html
These all have complete genomes, so they should generally be more relevant than species without complete genomes. Hopefully many of them are well annotated; at the very least, their pathway annotations are easily available.
Just my 2 cents,
Weijun
--------------------------------------------
On Tue, 5/6/14, Marc Carlson <mcarlson at fhcrc.org> wrote:
Subject: [Bioc-devel] Question about which new organism resources to create
To: "bioc-devel at r-project.org" <bioc-devel at r-project.org>
Date: Tuesday, May 6, 2014, 1:14 PM
Hi everyone,
As many of you already know, we have long provided organism
annotation packages that give gene-based annotations for
selected organisms, and we intend to keep doing that. But
these days there is also a lot of other data at NCBI that
could be used to make gene-based databases for other
organisms, and there is growing demand for annotations from
those organisms too. So I aim to make organism-based gene
databases for a wider range of organisms. However, instead
of just making more packages, I intend to put these DBs into
the AnnotationHub. You can get an idea of what access will
be like by looking at the inparanoid8 objects that were put
in for the last release.
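library(AnnotationHub)
## connect to the hub
ah = AnnotationHub()
## retrieve the inparanoid8 ortholog DB for Homo sapiens
hs8 = ah$inparanoid8.Orthologs.hom.Homo_sapiens.inp8.sqlite
hs8
## list the columns that can be queried
columns(hs8)
## take the first few keys of keytype TOXOPLASMA_GONDII
k = head(keys(hs8, 'TOXOPLASMA_GONDII'))
## look up the HOMO_SAPIENS column for those keys
select(hs8, k, 'HOMO_SAPIENS', 'TOXOPLASMA_GONDII')
## etc.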
Anyhow, my reason for posting is that I am now looking at all
the NCBI data that could be used for annotation packages and
trying to decide what to include. About half of the 14
thousand potential critters in the NCBI dataset have only
about one gene annotated. I am guessing that it is not
worth anyone's time to pre-process those organisms that have
only one gene. Or is it? If you think it might be, now
would probably be a good time to speak up.
How many annotations do you guys want/expect in an organism
package before it becomes annoying that you even downloaded
it?
Thanks in advance for your opinions,
Marc
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel