[Bioc-devel] SummarizedExperiment subset of 4 dimensions

Michael Lawrence lawrence.michael at gene.com
Tue Mar 31 13:32:47 CEST 2015


Taken in the abstract, the tidy data argument is one for consistent data
structures that enable interoperability, which is what we have with
SummarizedExperiment. The "long form" or "tidy" data frame is an effective
general representation, but if there is additional structure in your data,
why not represent it formally? Given the way R lays out the data in arrays,
it should be possible to add that fourth dimension, in an assay array,
while still using the colData to annotate that structure. It does not make
the data any less "tidy", but it does make it more structured.
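
As a minimal sketch of what I mean (base R only; the helper name
subset_first_dim and the toy dimensions are made up for illustration, and
this is not a claim about how SummarizedExperiment implements subsetting):
because arrays are stored column-major, an array of any rank can be subset
along its first dimension without hard-coding the number of commas, and a
fourth dimension can be rolled out into the third just by resetting dim.

```r
# Toy 4-d "assay": 3 features x 3 samples x 2 x 4 (dimensions are illustrative).
a4 <- array(seq_len(3 * 3 * 2 * 4), dim = c(3, 3, 2, 4))

# Hypothetical helper: subset the first dimension of an array of any rank.
# Passing TRUE for every remaining dimension selects all of its elements,
# so the number of index arguments always matches length(dim(x)).
subset_first_dim <- function(x, i) {
  n_other <- length(dim(x)) - 1L
  args <- c(list(x, i), rep(list(TRUE), n_other), list(drop = FALSE))
  do.call(`[`, args)
}

sub <- subset_first_dim(a4, 1)
dim(sub)  # 1 3 2 4

# Rolling the 4th dimension out into the 3rd needs no copying logic:
# with column-major storage only the dim attribute changes.
a3 <- a4
dim(a3) <- c(3, 3, 2 * 4)

# Element [i, j, k, l] of the 4-d array is element [i, j, k + (l - 1) * 2]
# of the rolled-out 3-d array.
stopifnot(a4[2, 3, 1, 4] == a3[2, 3, 1 + (4 - 1) * 2])
```

The levels that were rolled out could then be recorded in extra colData
columns or in dimnames, so no information about the structure is lost.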

On Tue, Mar 31, 2015 at 4:14 AM, Wolfgang Huber <whuber at embl.de> wrote:

> Dear Jesper
>
> this is maybe not the answer you want to hear, but stuffing in 4, 5, …
> dimensions may not be all that useful, as you can always roll out these
> higher dimensions into the existing third (or even into the second, the
> SummarizedExperiment columns). There is Hadley’s concept of “tidy data”
> (see e.g. http://www.jstatsoft.org/v59/i10 ) — a paper that is really
> worthwhile to read — which implies that the tidy way forward is to stay
> with 2 (or maybe 3) dimensions in SummarizedExperiment, and to record the
> information that you’d otherwise stuff into the higher dimensions in the
> colData covariates.
>
> Wolfgang
>
> Wolfgang Huber
> Principal Investigator, EMBL Senior Scientist
> Genome Biology Unit
> European Molecular Biology Laboratory (EMBL)
> Heidelberg, Germany
>
> T +49-6221-3878823
> wolfgang.huber at embl.de
> http://www.huber.embl.de
>
> > On 30 Mar 2015, at 12:38, Jesper Gådin <jesper.gadin at gmail.com> wrote:
> >
> > Hi!
> >
> > The SummarizedExperiment class is an extremely powerful container for
> > biological data (thank you!), and all my thinking nowadays is just
> > circling around how to stuff it as effectively as possible.
> >
> > I have been using 3 dimensions for a long time, which has been very
> > successful. Now I also have a case for using 4 dimensions. Everything
> > seemed to work as expected until I tried to subset my object; see the
> > example below.
> >
> > library(GenomicRanges)
> >
> > rowRanges <- GRanges(
> >                seqnames="chrx",
> >                ranges=IRanges(start=1:3,end=4:6),
> >                strand="*"
> >                )
> >
> > coldata <- DataFrame(row.names=paste("s",1:3, sep=""))
> >
> > assays <- SimpleList()
> >
> > #two dim
> > assays[["dim2"]] <- array(0,dim=c(3,3))
> > se <- SummarizedExperiment(assays, rowRanges = rowRanges,
> >                            colData = coldata)
> > se[1]
> > #works
> >
> > #three dim
> > assays[["dim3"]] <- array(0,dim=c(3,3,3))
> > se <- SummarizedExperiment(assays, rowRanges = rowRanges,
> >                            colData = coldata)
> > se[1]
> > #works
> >
> > #four dim
> > assays[["dim4"]] <- array(0,dim=c(3,3,3,3))
> > se <- SummarizedExperiment(assays, rowRanges = rowRanges,
> >                            colData = coldata)
> > se[1]
> > #does not work
> > #Error in x[i, , , drop = FALSE] : incorrect number of dimensions
> >
> > This is also the case for rbind and cbind. Would it be appropriate to ask
> > you to update the SE functions to handle subset, rbind, cbind also for 4
> > dimensions? I know the time for next release is very soon, so maybe it is
> > better to wait until after April 16. Just let me know your thoughts about
> > it.
> >
> > Jesper
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
