[Bioc-devel] any interest in a BiocMatrix core package?

Michael Lawrence lawrence.michael at gene.com
Fri Mar 3 18:59:25 CET 2017


After reading the original post again, it seems that Rcpp could perhaps solve
the problem (if it hasn't already) by implementing its matrix API on top of
dispatch to a few functions, such as [() and dim(). Both of those are
internally generic, so it would not touch R until the actual method call,
and it would not need to "see" any R-level S4 generics.
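A minimal C++ sketch of the idea, under stated assumptions: generic code is written against a tiny interface that needs only the dimensions and a column-block extraction (playing the roles of dim() and [()), and each storage backend implements just those two operations. Plain C++ virtual dispatch stands in here for R's internal generic dispatch. The names `MatrixBackend`, `InMemoryBackend`, `dims`, `columns`, and `column_sums` are hypothetical and are not part of Rcpp's API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal backend interface: each storage backend only has
// to provide an analogue of dim() and a column-block analogue of [().
class MatrixBackend {
public:
    virtual ~MatrixBackend() = default;
    // Analogue of dim(): returns (rows, cols).
    virtual std::pair<std::size_t, std::size_t> dims() const = 0;
    // Analogue of x[, first + 1:n] in R: columns [first, first + n)
    // returned as a dense column-major buffer.
    virtual std::vector<double> columns(std::size_t first, std::size_t n) const = 0;
};

// One possible backend: an ordinary dense column-major matrix held in RAM.
class InMemoryBackend : public MatrixBackend {
public:
    InMemoryBackend(std::size_t nrow, std::size_t ncol, std::vector<double> data)
        : nrow_(nrow), ncol_(ncol), data_(std::move(data)) {
        assert(data_.size() == nrow_ * ncol_ && "data must hold nrow * ncol values");
    }

    std::pair<std::size_t, std::size_t> dims() const override { return {nrow_, ncol_}; }

    std::vector<double> columns(std::size_t first, std::size_t n) const override {
        assert(first + n <= ncol_ && "column range out of bounds");
        auto begin = data_.begin() + static_cast<std::ptrdiff_t>(first * nrow_);
        auto end = begin + static_cast<std::ptrdiff_t>(n * nrow_);
        return std::vector<double>(begin, end);
    }

private:
    std::size_t nrow_;
    std::size_t ncol_;
    std::vector<double> data_;
};

// Generic algorithm written only against the interface, so it would work
// unchanged for any other backend (HDF5, memory-mapped, ...) implementing it.
std::vector<double> column_sums(const MatrixBackend& m) {
    const auto d = m.dims();
    const std::size_t nrow = d.first;
    const std::size_t ncol = d.second;
    std::vector<double> sums(ncol, 0.0);
    for (std::size_t j = 0; j < ncol; ++j) {
        const std::vector<double> col = m.columns(j, 1);
        for (std::size_t i = 0; i < nrow; ++i) sums[j] += col[i];
    }
    return sums;
}
```

For example, a 2 x 3 matrix stored column-major as {1, 2, 3, 4, 5, 6} gives column sums {3, 7, 11}. Only the backend classes know where the data lives; everything else goes through the two dispatched operations.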

On Fri, Mar 3, 2017 at 9:29 AM, Michael Lawrence <michafla at gene.com> wrote:

> This is along the lines of what I suggested on the board phone call. If
> there is already a C++ library like Armadillo for doing the heavy lifting,
> it would be easy to implement an R-level abstraction on top of it, just as
> was done for HDF5, bigmemory, etc.
>
> On Fri, Mar 3, 2017 at 8:32 AM, McDavid, Andrew <Andrew_Mcdavid at urmc.rochester.edu> wrote:
>
>> On the C++ side, Armadillo can be passed a pointer to memory to use as the
>> backing store of its objects, so it can work with memory-mapped data.  On
>> the R side, the bigmemory package provides R access to, and initialization
>> of, memory-mapped arrays.
>> See https://www.r-bloggers.com/using-rcpparmadillo-with-bigmemory/.
>> This doesn’t provide language or platform interchange of the backing store,
>> but would be an easy-ish solution.
>>
>> On Mar 3, 2017, at 10:23 AM, bioc-devel-request at r-project.org wrote:
>>
>> Some comments on Aaron's stuff:
>>
>> One way to handle this is to write your code in C++ so that it works on a
>> subset of rows or columns at a time.  That can sometimes give the
>> necessary speed-up.  Here is what I mean.
>>
>> Say you can safely process 1000 cells (not matrix cells, but biological
>> cells, aka columns) in RAM at a time:
>>
>> iterate in R:
>>  get chunk i, containing 1000 cells, from the backend data storage
>>  process this submatrix, which is now an ordinary in-memory matrix, in C++
>>  write the results out to whatever backend you're using
>>
>> Then, with a million cells, you iterate over 1000 chunks in R, and you
>> never need to "touch" the full dataset, which can be stored on an
>> arbitrary backend.  This approach could potentially even be run with
>> different chunks on different nodes.
>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>
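The chunked strategy described in the quoted message above can be sketched as follows, under stated assumptions. For a self-contained example the loop is written in C++, whereas the message puts the loop in R and uses C++ only for the per-chunk work. The backend is assumed to be reachable through two callbacks that read a contiguous block of columns and write per-column results; `ChunkReader`, `ChunkWriter`, and `process_in_chunks` are hypothetical names, and the "work" done on each chunk (per-column means) is only a placeholder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical backend hooks (they could wrap HDF5, a memory-mapped file, ...):
// read columns [first, first + n) as a dense column-major buffer, and
// write one result per column, starting at column `first`.
using ChunkReader = std::function<std::vector<double>(std::size_t first, std::size_t n)>;
using ChunkWriter = std::function<void(std::size_t first, const std::vector<double>& results)>;

// Processes an nrow x ncol matrix chunk_size columns at a time, so at most
// nrow * chunk_size values are held in memory at once.
void process_in_chunks(std::size_t nrow, std::size_t ncol, std::size_t chunk_size,
                       const ChunkReader& read, const ChunkWriter& write) {
    assert(nrow > 0 && chunk_size > 0 && "nrow and chunk_size must be positive");
    for (std::size_t first = 0; first < ncol; first += chunk_size) {
        const std::size_t n = std::min(chunk_size, ncol - first);  // last chunk may be shorter
        const std::vector<double> chunk = read(first, n);          // ordinary in-memory block

        // Placeholder per-chunk work on the in-memory submatrix: column means.
        std::vector<double> means(n, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            for (std::size_t i = 0; i < nrow; ++i) means[j] += chunk[j * nrow + i];
            means[j] /= static_cast<double>(nrow);
        }
        write(first, means);  // hand the results back to the backend
    }
}
```

With a million columns and `chunk_size = 1000`, the loop body runs 1000 times. Because each chunk depends only on its own column range, different ranges of `first` could in principle be handed to different nodes.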
