[R-sig-DB] help on deciding which open-source database to use with R
Mark W Kimpel
mwk|mpe| @end|ng |rom gm@||@com
Sat May 12 03:43:48 CEST 2007
I need to store and access increasingly large numbers of microarray
datasets, which I analyze with R and BioC packages, and have decided to
delve into the world of relational databases. For those of you
unfamiliar with microarray datasets, they consist of unique identifiers
with associated raw and summary data. The unique identifiers map to
established gene annotations that are updated regularly, a key reason I
would like to use a relational database.
In addition to just storing results, I would also like to run SQL
queries against the database and to use R within the database itself, so
that, for example, an FDR calculation could be done on a gene set that
was selected using various criteria from a web front-end without
explicitly invoking R (I think PostgreSQL can do that). Finally, I would
like the database to be open-source and run on Linux.
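
To make the select-then-compute workflow above concrete, here is a minimal
sketch. It is written in Python with the standard-library sqlite3 module
purely as a stand-in, not in R and not running inside the database. The
table name, column names, and p-values are made up for illustration, and
the FDR step is assumed to be a Benjamini-Hochberg adjustment.

```python
import sqlite3


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical schema and toy values, not real microarray results.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results ("
    "probe_id TEXT PRIMARY KEY, chromosome TEXT, p_value REAL)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("probe_a", "chr1", 0.01),
        ("probe_b", "chr1", 0.04),
        ("probe_c", "chr2", 0.03),
        ("probe_d", "chr1", 0.20),
    ],
)

# Step 1: select a gene set by some criterion using plain SQL.
rows = conn.execute(
    "SELECT probe_id, p_value FROM results "
    "WHERE chromosome = ? ORDER BY probe_id",
    ("chr1",),
).fetchall()

# Step 2: run the FDR adjustment on only the selected rows.
probe_ids = [row[0] for row in rows]
fdr = benjamini_hochberg([row[1] for row in rows])
fdr_by_probe = dict(zip(probe_ids, fdr))
conn.close()
```

In this sketch the adjustment runs in the client after the query returns.
Doing it inside the database, as described above, would instead need a
server-side procedural language.
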
Here is what I have gathered from perusing reviews of databases and the
R mailing lists and the CRAN and BioC repositories:
1. my top 3 choices would be MySQL, PostgreSQL, and SQLite
2. PostgreSQL is probably the most powerful and ideologically "pure"
3. MySQL has the largest user community and the most available books
4. SQLite is the easiest to set up and run from within R
5. there are several R packages for SQLite that assist with very routine
things like storing dataframes
6. DBI and RMySQL seem to offer the best combination of active
development and power from CRAN, and RdbiPgSQL and PostgreSQL would be
an analogous offering from BioC
7. RODBC would allow me to use just about any of the databases as well
as Excel
For all that "understanding", there is so much I can't figure out just
from reading disparate sources. In particular, I am concerned about: 1.
level of documentation so that I can learn, 2. likelihood of continued
support and development, 3. ability to satisfy my present (as outlined
in the first para) and unanticipated future needs, and 4. ease of use.
Would someone who is familiar with these databases and how they "relate"
(pun intended) to the R and BioC communities compare and contrast them
for me?
Thanks,
Mark
---
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 663-0513 Home (no voice mail please)