[Bioc-devel] Update on SQLite-based annotation data package (prototype available)
sfalcon at fhcrc.org
Thu Feb 1 17:42:31 CET 2007
Wolfgang Huber <huber at ebi.ac.uk> writes:
> Hi Seth,
> I installed the package, but I get:
>> ? hgu95av2db
> No documentation for 'hgu95av2db' in specified packages and libraries:
> you could try 'help.search("hgu95av2db")'
> No documentation for 'getDb' in specified packages and libraries:
> you could try 'help.search("getDb")'
> No documentation for 'hgu95av2CHRLOC' in specified packages and libraries:
> you could try 'help.search("hgu95av2CHRLOC")'
> and there is also no vignette
Yep. It really is a prototype. To get started, try pretending you
have called library(hgu95av2). IOW, you should have all the same
"environments" (in quotes because now they are S4 instances) and can
treat them as such.
We will put some documentation together for the experimental APIs we
are working on, but things are in flux. Herve has a vignette-like
document that we will post asap.
A few notes on performance. The database approach
is going to be slower than having everything in memory for many
operations. When retrieving annotation for reasonably small gene
lists, the difference is not huge. However, for operations that pull
everything from a given mapping, such as as.list(), you will see a
So why are the SQLite-based packages a good thing? Here are some
1. They will allow us to deal with much larger data collections.
The environment-based packages require being able to have all of
the data in memory at once and provide no easy way to unload the
data once it has been loaded. The SQLite-based packages can
easily handle much larger data sizes and pull only the requested
data into memory at any one time.
2. More flexible queries. With the SQLite-based packages, many
queries that currently require loops over possibly many entire
environments can be accomplished in one statement. Using some
simple SQL statements, I've been able to improve the performance
of the hyperGTest function by 10x. Focused queries will
generally be much faster with the SQLite-based packages.
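
To make point 2 concrete, here is a rough sketch of the difference
between looping over individual lookups and asking the database one
focused question. It is only an illustration: it uses Python's
standard sqlite3 module rather than R, and the table name (probe_go),
its columns, and the GO identifiers are made up for the example. They
are not the schema of hgu95av2db or the AnnotationDbi API.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical probe-to-GO table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE probe_go (probe_id TEXT, go_id TEXT)")
    rows = [
        ("1000_at", "GO:0000001"),   # illustrative rows, not real annotation
        ("1000_at", "GO:0000002"),
        ("1001_at", "GO:0000002"),
        ("1002_f_at", "GO:0000003"),
    ]
    conn.executemany("INSERT INTO probe_go VALUES (?, ?)", rows)
    return conn


def counts_by_loop(conn, probes):
    """Environment-style approach: one lookup per probe, tallied in a loop."""
    counts = {}
    for probe in probes:
        cur = conn.execute(
            "SELECT go_id FROM probe_go WHERE probe_id = ?", (probe,)
        )
        for (go_id,) in cur:
            counts[go_id] = counts.get(go_id, 0) + 1
    return counts


def counts_by_sql(conn, probes):
    """Same tally in a single statement using IN and GROUP BY."""
    placeholders = ",".join("?" for _ in probes)
    sql = (
        "SELECT go_id, COUNT(*) FROM probe_go "
        f"WHERE probe_id IN ({placeholders}) GROUP BY go_id"
    )
    return dict(conn.execute(sql, list(probes)))


if __name__ == "__main__":
    conn = build_demo_db()
    selected = ["1000_at", "1001_at"]
    print(counts_by_loop(conn, selected))
    print(counts_by_sql(conn, selected))
```

Both functions return the same per-term counts. The second pushes the
filtering and counting into the database, which is the kind of
rewrite meant by "one statement" above.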
> R version 2.5.0 Under development (unstable) (2007-01-22 r40543)
> attached base packages:
>  "tools" "stats" "graphics" "grDevices" "utils" "datasets"
>  "methods" "base"
> other attached packages:
> hgu95av2db AnnotationDbi RSQLite DBI Biobase
> "1.13.91" "0.0.41" "0.4-19" "0.1-12" "1.13.34"
>> We are making progress on converting the annotation data packages to
>> use SQLite as the backend storage mechanism.
>> The devel annotation package repository has a prototype of a
>> SQLite-based annotation data package (hgu95av2db). If you are running
>> R-devel, then you should be able to install it via biocLite (sorry,
>> only source package at this point).
>> The SQLite-based annotation packages depend on the AnnotationDbi
>> package which provides an environment-like interface that should be
>> backwards compatible. Advanced users can get a connection to the DB
>> and issue raw SQL queries. We are also planning to provide more
>> convenience/accessor functions along the lines of the annotate
>> Our plan for the upcoming 2.0 release of Bioconductor is to include
>> both environment-based and SQLite-based annotation packages.
>> If you maintain a package that makes use of annotation data packages,
>> it would be good to see if the hgu95av2db prototype will work with
>> your code (if not, please let us know).
>> + seth
>> Bioc-devel at stat.math.ethz.ch mailing list
> Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber