[BioC] Querying an HDF5 database from R
James Mahon [guest]
guest at bioconductor.org
Wed Jan 8 19:19:20 CET 2014
I have a 22 GB SQLite database that I query from R for numerical analysis. I'm considering converting it to HDF5 to get faster reads, because reading in the full population is slow. I have two questions about the rhdf5 package that I haven't been able to answer through my own experimenting.
(i) Suppose that I save an R data frame to an HDF5 file. Is it possible to read subsets of the data frame based on variable names and variable values? Often I don't want to read the full data frame into memory (~100 million observations and ~30 variables).
(ii) I frequently use indexes in my SQLite database to join related tables quickly. Does rhdf5 have a similar feature? If not, will converting to an HDF5 database create substantial bottlenecks if I rely on these joins frequently?
Thanks so much for your help.
Sent via the guest posting facility at bioconductor.org.