[BioC] Querying a HDF5 database from R

James Mahon [guest] guest at bioconductor.org
Wed Jan 8 19:19:20 CET 2014


I have a database (22 GB) in SQLite that I query from R for numerical analysis. I'm considering converting the database to HDF5 for faster read times (because reading the population is slow). I have two questions about the rhdf5 package that I haven't been able to figure out from my own experimenting.

(i) Suppose that I save an R dataframe to a HDF file. Is it possible to read subsets of the dataframe based on variable names and variable values? Often, I don't won't to read the full dataframe into memory (~ 100 million observations and ~ 30 variables).

(ii) I frequently use indexes in my SQLite database to quickly join related tables. Does rhdf5 have a similar feature? If not, will converting to a HDF5 database create substantial bottlenecks if I rely on these joins frequently?

Thanks so much for your help.

 -- output of sessionInfo(): 


Sent via the guest posting facility at bioconductor.org.

More information about the Bioconductor mailing list