[Bioc-devel] any interest in a BiocMatrix core package?

McDavid, Andrew Andrew_Mcdavid at URMC.Rochester.edu
Fri Mar 3 17:32:37 CET 2017


On the C++ side, Armadillo can be passed a pointer to memory as the backing store for its objects, so it can work directly on memory-mapped data.  On the R side, the bigmemory package provides R-level access to and initialization of memory-mapped arrays.  See https://www.r-bloggers.com/using-rcpparmadillo-with-bigmemory/.  This doesn't give a backing store that can be interchanged across languages or platforms, but it would be a relatively easy solution.
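
Along the lines of that blog post, a minimal sketch of the idea (a hedged illustration, not a tested recipe: the function name big_colsum, the file names, and the column-sum example are made up here, and the big.matrix is assumed to be of type "double"):

library(bigmemory)
library(Rcpp)

# C++ kernel compiled on the fly: view the big.matrix's memory-mapped
# backing store as an Armadillo matrix without copying it, then operate on it.
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo, BH, bigmemory)]]
#include <RcppArmadillo.h>
#include <bigmemory/BigMatrix.h>

// [[Rcpp::export]]
double big_colsum(SEXP pBigMat, int col) {
    Rcpp::XPtr<BigMatrix> xp(pBigMat);
    // wrap the external memory; copy_aux_mem = false means no copy is made
    arma::mat M(reinterpret_cast<double*>(xp->matrix()),
                xp->nrow(), xp->ncol(), false);
    return arma::accu(M.col(col - 1));   // col is a 1-based index from R
}
')

# a file-backed (memory-mapped) matrix on disk
bm <- filebacked.big.matrix(nrow = 1000, ncol = 50, type = "double",
                            backingfile = "demo.bin",
                            descriptorfile = "demo.desc")
bm[, 1] <- rnorm(1000)
big_colsum(bm@address, 1)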

On Mar 3, 2017, at 10:23 AM, bioc-devel-request at r-project.org<mailto:bioc-devel-request at r-project.org> wrote:

Some comments on Aaron's points:

One possibility for doing things like this is when your code can be written in
C++ to operate on a subset of rows or columns at a time.  That can sometimes
give the necessary speed-up.  What I mean is this:

Say you can safely process 1000 cells (not matrix cells, but biological
cells, aka columns) at a time in RAM

iterate in R:
  get chunk i containing 1000 cells from the backend data storage
  process this sub-matrix, which is now an ordinary in-memory matrix, using C++
  write the results back out to whatever backend you're using

Then, with a million cells, you iterate over 1000 chunks in R, and you never
need to "touch" the full dataset, which can be stored on an arbitrary backend.
Potentially, this approach could even be run with different chunks processed
on different nodes.
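
As a hedged sketch of that pattern, here it is in R with an HDF5-backed matrix standing in for "the backend" (the file name counts.h5, the dataset name, and the per-cell total as the "do something" step are placeholders; any backend that can hand back a block of columns works the same way):

library(HDF5Array)

counts <- HDF5Array("counts.h5", "counts")   # genes x cells, on disk
n_cells <- ncol(counts)
chunk_size <- 1000L

per_cell_totals <- numeric(n_cells)
for (start in seq(1L, n_cells, by = chunk_size)) {
  cols <- start:min(start + chunk_size - 1L, n_cells)
  # realize only this block of columns as an ordinary in-memory matrix,
  # then hand it to fast (possibly C++-backed) code
  block <- as.matrix(counts[, cols, drop = FALSE])
  per_cell_totals[cols] <- colSums(block)
}

Wrapping the loop body in a function and dispatching chunks with something like BiocParallel would give the "different chunks on different nodes" variant.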

