[Bioc-devel] transitioning scater/scran to SingleCellExperiment

Aaron Lun alun at wehi.edu.au
Tue Aug 8 10:32:10 CEST 2017


>     I guess this would be a question for the
>     SummarizedExperiment developers, though personally, I never liked
>     ExpressionSet's inclination to slap names on everything.
> 
> Too bad we’re bound to SummarizedExperiment’s “rows” and “cols”. Since 
> they always refer to features and samples, respectively: Why not name 
> them that?
> 
> There’s already too many APIs in too many programming languages that 
> confusingly have one or the other convention – if we know which is 
> which, why not name them after that knowledge?

*shrug* + *meh*. As I said, I'm the wrong person to complain to about 
this. Though I don't have particularly strong feelings either way.

>     It probably wouldn't be a good idea to store distances as expression
>     matrices. However, if there is a need for it, we can add a new slot
>     for distance matrices. I think SC3 has a similar requirement, so
>     perhaps this would be more generally useful than I first thought.
>     You can post an issue on the github repository to remind Davide or
>     me to do it.
> 
> Distance matrices (cell×cell) can’t only come from cell×gene matrices. 
> You can e.g. use dynamic time warping to create them from cell×gene×time 
> arrays.

I don't think there's direct support for >2-dimensional arrays in SE 
objects. You might be able to put them in, but I don't know how well it 
will interact with the subsetting machinery. One solution is to split it 
up by the third dimension and store each matrix as a separate assay.
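
A minimal sketch of the splitting approach, in case it helps. The array, its dimnames and its sizes are invented for illustration, and it is laid out as gene x cell x time to match the features-by-samples convention. The last step assumes the SummarizedExperiment package is installed and is skipped otherwise.

```r
# Hypothetical gene x cell x time data; all names and sizes are made up.
n_genes <- 5
n_cells <- 4
n_times <- 3
arr <- array(
  rpois(n_genes * n_cells * n_times, lambda = 10),
  dim = c(n_genes, n_cells, n_times),
  dimnames = list(
    paste0("Gene", seq_len(n_genes)),
    paste0("Cell", seq_len(n_cells)),
    paste0("t", seq_len(n_times))
  )
)

# Split along the third dimension: one gene x cell matrix per time point,
# named after the time point so each can become a named assay.
by_time <- lapply(
  setNames(seq_len(n_times), dimnames(arr)[[3]]),
  function(k) arr[, , k]
)

# Store each matrix as a separate assay (only if the package is available).
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(assays = by_time)
  # Subsetting by cell is applied to every assay at once.
  sub <- se[, 1:2]
  print(SummarizedExperiment::assayNames(sub))
}
```

Because every time point is an ordinary two-dimensional assay, the usual row and column subsetting applies to all of them together.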

In any case, a distance matrix calculated from such an array would be 
fine, as long as the dimensions are equal to the number of cells. The 
question is whether it is needed by enough packages to warrant a slot in 
the base SCE class; I will discuss this with Davide and Vlad.
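
To make the dimension requirement concrete, here is a small base-R sketch. The expression matrix is invented, and Euclidean distance is only a stand-in for whatever metric (dynamic time warping or otherwise) a package actually uses.

```r
# Hypothetical gene x cell matrix; values and names are invented.
set.seed(1)
expr <- matrix(
  rnorm(20), nrow = 5, ncol = 4,
  dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:4))
)

# dist() compares rows, so transpose to get cell-to-cell distances.
d <- as.matrix(dist(t(expr)))

# The check that matters: square, one row/column per cell, in the same order.
stopifnot(
  nrow(d) == ncol(expr),
  ncol(d) == ncol(expr),
  identical(rownames(d), colnames(expr))
)
```

How the matrix was computed does not matter for this check; only its shape and its cell ordering do.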

>     Finally, I'm not sure what advantages those ergonomics provide.
>     Indeed, if every package defines its own plot() S4 method for
>     SingleCellExperiment, they will clobber each other in the dispatch
>     table, resulting in some interesting results dependent on package
>     loading order. If you have destiny-specific data and methods, best
>     to keep them separate rather than stuffing them into the SCE object.
> 
> I wrote that I could e.g. create a plot_dm method, which plots a 
> diffusion map stored in a SCE.
> 
> Also I didn’t mean the plot method with ergonomics. I meant |fortify|, 
> |names|, |$|, and |[[|. Those would be very useful, as you could just do 
> things like the following, and have autocompletion:
> 
> # `$` accesses counts (by gene) and rowData. `$<-` sets rowData
> sce$Predicate1 <- sce$SampleMeta1 > 40
> # fortify creates a data.frame containing cbind(t(counts), rowData)
> qplot(Gene1, Gene2, colour = Predicate1, data = sce)

The SingleCellExperiment package takes no position on whether 
downstream users or packages use the tidyverse or ggplot2. It simply 
provides the minimal class and methods; convenience wrappers are left 
to the discretion of each package developer. scater, for example, 
implements a few dplyr verbs for SCE objects.

Cheers,

Aaron

