[Bioc-devel] Multiple colData in SummarizedExperiment

Vincent Carey stvjc at channing.harvard.edu
Thu Jun 18 15:34:19 CEST 2015


yes, if a formal extension is warranted.  the metadata slot could also be
used.
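
a minimal sketch of the metadata route, assuming the current SummarizedExperiment package. the toy counts matrix and the names `experiment`, `quality` and `sub_quality` are made up for illustration; the quality columns mirror the faux data in Martin's example quoted below.

```r
library(SummarizedExperiment)

# Toy data (hypothetical): 2 features x 6 samples named A..F.
samples <- LETTERS[1:6]
counts <- matrix(1:12, nrow = 2, dimnames = list(c("g1", "g2"), samples))

# Experimental information goes in colData(); sample quality is kept apart.
experiment <- DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                        row.names = samples)
quality <- DataFrame(x = 1:6, y = 6:1, row.names = samples)

se <- SummarizedExperiment(assays = list(counts = counts),
                           colData = experiment)

# Store the second table as an element of the metadata list.
metadata(se)$quality <- quality

# metadata() is carried along unchanged when the object is subset,
# so the table has to be re-indexed by sample name by hand.
sub <- se[, 1:2]
sub_quality <- metadata(sub)$quality[colnames(sub), , drop = FALSE]
```

the catch is that nothing keeps the metadata table in step with the samples, which is where a nested DataFrame in colData() or a formal class extension would do better.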

On Thu, Jun 18, 2015 at 2:59 PM, Kasper Daniel Hansen <
kasperdanielhansen at gmail.com> wrote:

> I think the cleaner solution for Davide (if he insists on having separate
> objects; I decided against it in minfi) is to extend the class to allow
> this.
>
> Kasper
>
> On Thu, Jun 18, 2015 at 12:25 AM, Ryan <rct at thompsonclan.org> wrote:
>
> > Oh wow, I didn't know you could put a DataFrame into a single column of
> > another DataFrame. That actually solves a problem for me too (I don't
> > intend to expose nested DataFrames to the users though).
> >
> >
> > On 6/17/15 7:23 PM, Martin Morgan wrote:
> >
> >> On 06/17/2015 11:41 AM, davide risso wrote:
> >>
> >>> Dear list,
> >>>
> >>> I'm creating an R package to store RNA-seq data of a somewhat large
> >>> project in which I'm involved.
> >>>
> >>> One of the initial goals is to compare different pre-processing
> >>> pipelines, hence I have multiple expression matrices corresponding to
> >>> the same samples. The SummarizedExperiment class seems a good candidate,
> >>> since I have multiple expression matrices with the same rowData and
> >>> colData information.
> >>>
> >>> I have several sample-specific variables that I want to store with the
> >>> object, namely, experimental information (e.g., batch, date, experimental
> >>> condition, ...) and sample quality (e.g., proportion of aligned reads,
> >>> total duplicate reads, etc...).
> >>>
> >>> Of course, I can always create one big data frame concatenating the two
> >>> (experimental info + sample quality), but it seems that both
> >>> conceptually and practically, it might be useful to have two separate
> >>> data frames. Since this seems a reasonably standard type of
> >>> information that one would want to carry along, I was wondering if it
> >>> would be possible / useful to allow the user to have multiple
> >>> data.frames in the colData slot
> >>>
> >>
> >> Actually, colData() is a DataFrame, and a DataFrame column can contain a
> >> DataFrame. So after
> >>
> >>   example(SummarizedExperiment)
> >>
> >> we could make some faux sample quality data
> >>
> >>   quality = DataFrame(x=1:6, y=6:1, row.names=colnames(se1))
> >>
> >> add this as a column in the colData()
> >>
> >>   colData(se1)$quality = quality
> >>
> >> (or create the SummarizedExperiment from a similar DataFrame up-front)
> >> and manage our grouped data
> >>
> >> > colData(se1)
> >> DataFrame with 6 rows and 2 columns
> >>     Treatment     quality
> >>   <character> <DataFrame>
> >> A        ChIP    ########
> >> B       Input    ########
> >> C        ChIP    ########
> >> D       Input    ########
> >> E        ChIP    ########
> >> F       Input    ########
> >> > colData(se1[,1:2])$quality
> >> DataFrame with 2 rows and 2 columns
> >>           x         y
> >>   <integer> <integer>
> >> A         1         6
> >> B         2         5
> >>
> >> I'm not sure that this is any less confusing to the end user than having
> >> to manage a DataFrameList(), but it does not require any new features.
> >>
> >> Martin
> >>
> >>> of SummarizedExperiment.
> >>>
> >>> Best,
> >>> Davide
> >>>
> >>> _______________________________________________
> >>> Bioc-devel at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>
> >>>
> >>
> >>