[Bioc-devel] transitioning scater/scran to SingleCellExperiment

Angerer, Philipp philipp.angerer at helmholtz-muenchen.de
Mon Aug 7 18:30:41 CEST 2017


As I told you at the HCA hackathon, I’m interested in switching destiny over! I think the class is a really cool idea and seems very well thought out.

Interestingly, the design decisions converged very well with those of the AnnData class (https://www.pydoc.io/pypi/scanpy-0.2.3/autoapi/data_structs/ann_data/index.html#data_structs.ann_data.AnnData) in scanpy (https://github.com/theislab/scanpy#readme), which I helped Alex design. Scanpy makes heavy use of HDF5 serialization. I think we should quickly converge on a serialization format (keys and so on) so that AnnData and SingleCellExperiment can have interoperability via HDF5!

The only point of criticism is that you, while staying specific to single cell data, named the dimensions “rows” and “columns” instead of e.g. “samples” and “variables”. Alex and I came to the conclusion that ExpressionSet’s way of returning a named vector for dims is a good idea, and having the dimensions named for their roles reduces confusion.

I have “two” “questions” regarding destiny, with some feature requests hiding in the second one:

1. destiny accepts either an expression matrix or a distance matrix (both with optional metadata).

Currently the signature is this: 
DiffusionMap(data     = ExpressionSet | data.frame | matrix | Matrix,
             distance = NULL | "euclidean" | "cosine" | "rankcor")
DiffusionMap(data     = NULL | data.frame,  # Metadata
             distance = matrix | dist | symmetricMatrix)

The idea is that both when providing expressions and when providing a distance matrix, you should be able to provide metadata. I’m not super happy with my approach, since the current methods of providing metadata differ. 

However, ExpressionSet and SingleCellExperiment are both specific for expression data. I think neither can hold dist objects as data. 

Is it valid and a good idea to store neither counts nor exprs, but e.g. SingleCellExperiment(assays = list(dists = some_mat))? It wouldn’t be sliced properly, for example, and it being symmetric would mean that column and row metadata are the same…
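
To illustrate the slicing concern with plain matrices (a base-R sketch only, not SingleCellExperiment itself; `expr`, `d` and `keep` are made-up toy names and data): subsetting a cell-by-cell distance matrix by columns alone, which is roughly what column-wise subsetting of an assay amounts to, leaves a non-square matrix that is no longer a valid distance matrix.

```r
# Toy data: 5 cells x 3 genes (values are arbitrary, for illustration only)
set.seed(1)
expr <- matrix(rnorm(15), nrow = 5,
               dimnames = list(paste0("cell", 1:5), paste0("gene", 1:3)))

# Cell-by-cell Euclidean distances as a full symmetric matrix
d <- as.matrix(dist(expr))

keep <- c("cell1", "cell3", "cell4")

# Column-only subsetting, as an assay would be sliced: 5 x 3, not square
col_sliced <- d[, keep]

# What a distance matrix needs: the same index applied to both dimensions
sym_sliced <- d[keep, keep]

dim(col_sliced)
dim(sym_sliced)
isSymmetric(sym_sliced)
```

A container for distances would therefore have to apply every cell subset to rows and columns at once, which is different from how an expression assay is sliced.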

Is it a good idea to require assays to have certain names (e.g. “exprs” or “dists” here)?

2. The reducedDim methods would be able to store and retrieve diffusion components in a SingleCellExperiment, while destiny’s dataset method stores the original data used to create a DiffusionMap.

What do you think is the best approach? Just conversions between the two classes? Or also deprecate DiffusionMap objects and create a diffusion_map function that returns a SingleCellExperiment object with the reduced dimensions and all the necessary metadata for further methods like e.g. DPT? 

I think for the latter, SingleCellExperiment isn’t quite cool enough yet :P. I’d like to have the full ergonomics of DiffusionMap : 

    * A names method (returning gene and per-cell-metadata names) 
    * Gene/per-cell-metadata access by $ and [[ . 
    * A fortify method that makes everything available in ggplot2. (E.g. ggplot(dm, aes(DC1, DC2, colour = Condition)) works!) 
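
As a rough sketch of the first two points (this is neither destiny’s nor SingleCellExperiment’s implementation; the class `ToyCells` and its slots `exprs` and `meta` are hypothetical and exist only for illustration), a small S4 class can expose genes and per-cell metadata through names, $ and [[ like this. A fortify method is left out because it would need ggplot2.

```r
library(methods)

# Hypothetical container: genes x cells expression matrix plus per-cell metadata
setClass("ToyCells", representation(exprs = "matrix", meta = "data.frame"))

# names(): gene names followed by per-cell metadata column names
setMethod("names", "ToyCells", function(x) {
  c(rownames(x@exprs), colnames(x@meta))
})

# [[ : return a gene's expression vector if the name is a gene,
# otherwise fall back to the metadata column of that name
setMethod("[[", "ToyCells", function(x, i, j, ...) {
  if (i %in% rownames(x@exprs)) x@exprs[i, ] else x@meta[[i]]
})

# $ : same lookup as [[
setMethod("$", "ToyCells", function(x, name) x[[name]])

cells <- paste0("cell", 1:3)
tc <- new("ToyCells",
  exprs = matrix(1:6, nrow = 2,
                 dimnames = list(c("GeneA", "GeneB"), cells)),
  meta  = data.frame(Condition = c("ctrl", "ctrl", "treated"),
                     row.names = cells)
)

names(tc)          # gene names, then metadata names
tc$GeneA           # expression of GeneA across cells
tc[["Condition"]]  # per-cell metadata column
```

Looking up genes before metadata is an arbitrary choice in this sketch; a real implementation would have to decide what happens when a gene and a metadata column share a name.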

I can do without the remaining methods (or provide them in destiny), as they are neither general purpose enough for SingleCellExperiment nor really necessary, e.g. I can add an alias plot(a_dm_object) → plot_dm(a_sce_object).

Cheers, Philipp 
From: "Aaron Lun" <alun at wehi.edu.au>
To: "bioc-devel" <bioc-devel at r-project.org>
Sent: Monday, 31 July 2017 10:38:03
Subject: Re: [Bioc-devel] transitioning scater/scran to SingleCellExperiment

Dear developers, 

Both scater and scran will be migrating to the SingleCellExperiment 
class (https://bioconductor.org/packages/SingleCellExperiment) in the 
next BioC release. This is based on a SummarizedExperiment and provides 
a more modern user interface, as well as supporting different matrix 
representations (e.g., dgCMatrix, HDF5Matrix). 

We note that there are a number of Bioconductor packages that depend 
on/import/suggest scater or scran, which we have listed below: 


To the maintainers of these packages, we advise switching from SCESet to 
SingleCellExperiment as soon as possible; the former will be deprecated 
in the next release cycle. There are several things to note here: 

- The SCESet previously contained a number of slots relating to 
distances and clustering results. These are no longer present in the 
SingleCellExperiment, in line with the minimalist design philosophy of 
that package. If these are necessary, we suggest extending the 
SingleCellExperiment class in your own packages(*). 

- For packages that depend directly on methods in scater or scran, a 
number of methods have been removed. This aims to simplify the analysis 
workflow and code maintenance by reducing redundancy. Please ensure that 
your package does not need those missing methods by CHECKing it against 
the experimental versions(**) of these two packages: 


If there are any issues with the switch, please let us know and we will 
do our best to figure out the most appropriate fix. 


Aaron, Davis and Davide 

(*): If there is popular demand for some slots, we may consider 
including it in the base SingleCellExperiment object. 

(**): These versions are highly experimental and fluid, and results are 
likely to be unstable over the coming month. Nonetheless, if something 
is breaking, it is best that we know sooner rather than later. Or in 
other words, don't start complaining when it's close to release time. 

