[Bioc-devel] any interest in a BiocMatrix core package?

Vincent Carey stvjc at channing.harvard.edu
Fri Feb 24 23:25:07 CET 2017


What is the data type for an expression value?  Is it assumed that double
precision will be needed?
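
The answer matters for size. Below is a minimal sketch of the arithmetic, assuming a dense 5000 x 1 million matrix as in the quoted message, with 4-byte elements for R integers and 8-byte elements for R doubles. The function names `dense_bytes` and `to_gb` are made up for illustration and do not come from any package.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed for a dense rows x cols matrix with the given element size.
// 64-bit arithmetic is used because the product overflows a 32-bit integer.
std::uint64_t dense_bytes(std::uint64_t rows, std::uint64_t cols,
                          std::size_t elem_size) {
    assert(elem_size > 0);
    return rows * cols * static_cast<std::uint64_t>(elem_size);
}

// Convert a byte count to decimal gigabytes (1 GB = 1e9 bytes).
double to_gb(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1e9;
}
```

With rows = 5000 and cols = 1000000, `to_gb(dense_bytes(rows, cols, 4))` gives 20 and the 8-byte case gives 40, which matches the 20-40 GB range in the quoted message.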

On Fri, Feb 24, 2017 at 4:50 PM, Aaron Lun <alun at wehi.edu.au> wrote:

> It's a good place to start, though it would be very handy to have a C(++)
> API that can be linked against. I'm not sure how much work that would
> entail but it would give downstream developers a lot more options. Sort of
> like how we can link to Rhtslib, which speeds up a lot of BAM file
> processing, instead of just relying on Rsamtools.
>
>
> -Aaron
>
> ________________________________
> From: Tim Triche, Jr. <tim.triche at gmail.com>
> Sent: Saturday, 25 February 2017 8:34:58 AM
> To: Aaron Lun
> Cc: bioc-devel at r-project.org
> Subject: Re: [Bioc-devel] any interest in a BiocMatrix core package?
>
> yes
>
> the DelayedArray framework that handles HDF5Array, etc. seems like the
> right choice?
>
> --t
>
> On Fri, Feb 24, 2017 at 1:26 PM, Aaron Lun <alun at wehi.edu.au> wrote:
> Hi everyone,
>
> I just attended the Human Cell Atlas meeting in Stanford, and people were
> talking about gene expression matrices for >1 million cells. If we assume
> that we can get non-zero expression profiles for ~5000 genes, we’d be
> talking about a 5000 x 1 million matrix for the raw count data. This would
> be 20-40 GB in size, which would clearly benefit from sparse (via Matrix)
> or disk-backed representations (bigmemory, BufferedMatrix, rhdf5, etc.).
>
> I’m wondering whether there is any appetite amongst us for making a
> consistent BioC API to handle these matrices, sort of like what
> BiocParallel does for multicore and snow. It goes without saying that the
> different matrix representations should have consistent functions at the R
> level (rbind/cbind, etc.) but it would also be nice to have an integrated
> C/C++ API (accessible via LinkingTo). There are many non-trivial things that
> can be done with this type of data, and it is often faster and more memory
> efficient to do these complex operations in compiled code.
>
> I was thinking of something where you could supply any supported matrix
> representation to a registered function via .Call; the C++ constructor
> would recognise the type of matrix during class instantiation; and
> operations (row/column/random read access, also possibly various ways of
> writing a matrix) would be overloaded and behave as required for the class.
> Only the implementation of the API would need to care about the nitty
> gritty of each representation, and we would all be free to write code that
> actually does the interesting analytical stuff.
>
> Anyway, just throwing some thoughts out there. Any comments appreciated.
>
> Cheers,
>
> Aaron
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
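
The design in the quoted proposal (one interface, several storage back ends, analysis code written once) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and function names (`numeric_matrix`, `dense_matrix`, `sparse_matrix`, `column_sums`) are hypothetical, the code is plain C++ with no R headers, and a real implementation would pick the concrete class by inspecting the R object passed through .Call. The sparse layout follows the compressed sparse column scheme used by Matrix's dgCMatrix (slots i, p, x).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical common interface; not an existing Bioconductor API.
class numeric_matrix {
public:
    virtual ~numeric_matrix() = default;
    virtual std::size_t nrow() const = 0;
    virtual std::size_t ncol() const = 0;
    // Copy column c into 'out', which is resized to nrow().
    virtual void get_col(std::size_t c, std::vector<double>& out) const = 0;
};

// Dense column-major storage, the layout R uses for ordinary matrices.
class dense_matrix : public numeric_matrix {
public:
    dense_matrix(std::size_t nr, std::size_t nc, std::vector<double> x)
        : nr_(nr), nc_(nc), x_(std::move(x)) {
        assert(x_.size() == nr_ * nc_);
    }
    std::size_t nrow() const override { return nr_; }
    std::size_t ncol() const override { return nc_; }
    void get_col(std::size_t c, std::vector<double>& out) const override {
        assert(c < nc_);
        const double* first = x_.data() + c * nr_;
        out.assign(first, first + nr_);
    }
private:
    std::size_t nr_, nc_;
    std::vector<double> x_;
};

// Compressed sparse column storage: i = row indices (0-based),
// p = column pointers (length ncol + 1), x = non-zero values.
class sparse_matrix : public numeric_matrix {
public:
    sparse_matrix(std::size_t nr, std::size_t nc, std::vector<std::size_t> i,
                  std::vector<std::size_t> p, std::vector<double> x)
        : nr_(nr), nc_(nc), i_(std::move(i)), p_(std::move(p)),
          x_(std::move(x)) {
        assert(p_.size() == nc_ + 1 && i_.size() == x_.size());
    }
    std::size_t nrow() const override { return nr_; }
    std::size_t ncol() const override { return nc_; }
    void get_col(std::size_t c, std::vector<double>& out) const override {
        assert(c < nc_);
        out.assign(nr_, 0.0);  // zeros everywhere, then fill the non-zeros
        for (std::size_t k = p_[c]; k < p_[c + 1]; ++k) out[i_[k]] = x_[k];
    }
private:
    std::size_t nr_, nc_;
    std::vector<std::size_t> i_, p_;
    std::vector<double> x_;
};

// Analysis code written once against the interface, unaware of the back end.
std::vector<double> column_sums(const numeric_matrix& m) {
    std::vector<double> sums(m.ncol(), 0.0), buf;
    for (std::size_t c = 0; c < m.ncol(); ++c) {
        m.get_col(c, buf);
        for (double v : buf) sums[c] += v;
    }
    return sums;
}
```

Virtual dispatch is just one way to get the "overloaded" behaviour described in the quote; templates or a tagged union would also work, and a disk-backed class would implement the same `get_col` by reading from file.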
