[Bioc-devel] Incremental writing to HDF5 / DelayedMatrix

Paul Theodor Pyl paul.theodor.pyl at embl.de
Thu Dec 21 13:01:21 CET 2017

Hi Francesco,

This is certainly achievable with the currently available HDF5 support in R/Bioconductor. For example, the rhdf5 package gives you access to this functionality (https://bioconductor.org/packages/release/bioc/html/rhdf5.html).

rhdf5 is relatively 'low-level', in the sense that it stays close to the HDF5 library it exposes to R (i.e. you get h5read and h5write functions). For what you are describing I typically use a small wrapper to make my life a bit easier; I have something like that on GitHub here: https://github.com/PaulPyl/h5array

Please note that this is not an official Bioconductor package, so it doesn't fulfill the strict standards of documentation etc. Since it is just a small wrapper that gives you an array-like object which reads/writes its data from disk, though, it should be fairly straightforward to use.
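
For reference, here is a minimal sketch of the column-at-a-time workflow using plain rhdf5 rather than the wrapper. The file path, the dataset name "mat", the matrix dimensions and the rnorm() call standing in for the real per-column computation are all assumptions made for illustration; chunking by column is just one plausible choice for this write pattern.

```r
library(rhdf5)

# Hypothetical file and sizes, for illustration only
h5file <- tempfile(fileext = ".h5")
n_rows <- 1000L
n_cols <- 50L

# Create the file and a chunked, compressed dataset of the final size.
# One chunk per column means each column write touches a single chunk.
h5createFile(h5file)
h5createDataset(h5file, "mat", dims = c(n_rows, n_cols),
                storage.mode = "double",
                chunk = c(n_rows, 1L), level = 6)

# Build the matrix one column at a time; only one column is in memory.
for (j in seq_len(n_cols)) {
  column <- rnorm(n_rows)  # placeholder for the real computation
  h5write(matrix(column, ncol = 1), h5file, "mat",
          index = list(NULL, j))  # NULL selects all rows
}

# Later: load only a small subset of rows/columns and rank it in memory.
sub <- h5read(h5file, "mat", index = list(1:10, c(2, 5, 7)))
ranks <- apply(sub, 2, rank)  # rank within each selected column
```

If whole rows will be read more often than whole columns, a different chunk shape may work better. The resulting file can presumably also be wrapped with HDF5Array::HDF5Array(h5file, "mat") to obtain a DelayedMatrix, should that interface be preferred.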


On Thu, Dec 21, 2017 at 12:22, Francesco Napolitano wrote:

I need to deal with very large matrices and I was thinking of using
HDF5-based data models. However, from the documentation and examples
that I have been looking at, I'm not quite sure how to do this.

My use case is as follows.
I want to build a very large matrix one column at a time, and I need
to write columns directly to disk since I would otherwise run out of
memory. I need a format that, afterwards, will allow me to extract
subsets of rows or columns and rank them. The subsets will be small
enough to be loaded in memory. Can I achieve this with current HDF5
support in R?

Any help greatly appreciated.

thank you,

Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
