[Bioc-devel] Multiple colData in SummarizedExperiment

davide risso risso.davide at gmail.com
Thu Jun 18 21:44:29 CEST 2015


Thanks Kasper,

I think that's a good solution.
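
A minimal sketch of how the reserved-column-name approach might look, assuming a
hypothetical "qc_" prefix for the quality columns. The helper names
quality_cols(), check_quality() and quality() are illustrative only and are not
part of SummarizedExperiment; the toy data uses a plain data.frame so that it
runs without Bioconductor.

```r
# Hypothetical convention: quality columns carry the reserved prefix "qc_".
qc_prefix <- "qc_"

# Return the names of the reserved quality columns.
quality_cols <- function(cd) {
  grep(paste0("^", qc_prefix), colnames(cd), value = TRUE)
}

# Separate checker: every reserved column must be numeric.
check_quality <- function(cd) {
  cols <- quality_cols(cd)
  ok <- vapply(cols, function(nm) is.numeric(cd[[nm]]), logical(1))
  if (!all(ok)) {
    stop("non-numeric quality columns: ", paste(cols[!ok], collapse = ", "))
  }
  invisible(TRUE)
}

# Extract only the quality scores, e.g. for plotting / quality control.
quality <- function(cd) {
  check_quality(cd)
  cd[, quality_cols(cd), drop = FALSE]
}

# Toy sample annotation standing in for colData(se).
cd <- data.frame(
  Treatment = c("ChIP", "Input", "ChIP"),
  qc_aligned = c(0.91, 0.88, 0.95),
  qc_duplicates = c(1200, 900, 1500),
  row.names = c("A", "B", "C")
)
qc <- quality(cd)
```

The same functions should also accept a DataFrame such as colData(se), since
they rely only on colnames(), [[ and [ with drop = FALSE.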

Best,
Davide
On Thu, Jun 18, 2015 at 11:51 AM Kasper Daniel Hansen <
kasperdanielhansen at gmail.com> wrote:

> you can just implement this by having reserved column names in the colData
> slot; that will work and will take appr. 23 seconds to implement.  I agree
> it is not as clean from a design perspective, but you get 100% of the
> functionality and you can write a separate checker for the colData argument.
>
> On Thu, Jun 18, 2015 at 2:00 PM, davide risso <risso.davide at gmail.com>
> wrote:
>
>> Thank you all for the responses.
>>
>> I didn't think about the nested DataFrame solution.  It should work.
>> I agree that an extension might be cleaner, but I clearly need to give it
>> more thought.
>>
>> One of the reasons I wanted to have quality and metadata as separate
>> slots is that one could enforce that all the qualities are numeric, and
>> have a quality() method to extract just the quality scores (e.g., for
>> plotting / quality control). Having them in the same slot makes it harder
>> for the user to extract just the scores (if the column order and/or names
>> are not standardized).
>>
>> Best,
>> davide
>>
>>
>> On Thu, Jun 18, 2015 at 6:35 AM Vincent Carey <stvjc at channing.harvard.edu>
>> wrote:
>>
>>> yes, if a formal extension is warranted.  the metadata slot could also be
>>> used.
>>>
>>> On Thu, Jun 18, 2015 at 2:59 PM, Kasper Daniel Hansen <
>>> kasperdanielhansen at gmail.com> wrote:
>>>
>>> > I think the cleaner solution for Davide (if he insists on having separate
>>> > objects; I decided against it in minfi) is to extend the class to allow
>>> > this.
>>> >
>>> > Kasper
>>> >
>>> > On Thu, Jun 18, 2015 at 12:25 AM, Ryan <rct at thompsonclan.org> wrote:
>>> >
>>> > > Oh wow, I didn't know you could put a DataFrame into a single column of
>>> > > another DataFrame. That actually solves a problem for me too (I don't
>>> > > intend to expose nested DataFrames to the users though).
>>> > >
>>> > >
>>> > > On 6/17/15 7:23 PM, Martin Morgan wrote:
>>> > >
>>> > >> On 06/17/2015 11:41 AM, davide risso wrote:
>>> > >>
>>> > >>> Dear list,
>>> > >>>
>>> > >>> I'm creating an R package to store RNA-seq data of a somewhat large
>>> > >>> project in which I'm involved.
>>> > >>>
>>> > >>> One of the initial goals is to compare different pre-processing
>>> > >>> pipelines, hence I have multiple expression matrices corresponding to
>>> > >>> the same samples. The SummarizedExperiment class seems a good
>>> > >>> candidate, since I have multiple expression matrices with the same
>>> > >>> rowData and colData information.
>>> > >>>
>>> > >>> I have several sample-specific variables that I want to store with the
>>> > >>> object, namely, experimental information (e.g., batch, date,
>>> > >>> experimental condition, ...) and sample quality (e.g., proportion of
>>> > >>> aligned reads, total duplicate reads, etc.).
>>> > >>>
>>> > >>> Of course, I can always create one big data frame concatenating the two
>>> > >>> (experimental info + sample quality), but it seems that both
>>> > >>> conceptually and practically, it might be useful to have two separate
>>> > >>> data frames. Since this seems a reasonably standard type of information
>>> > >>> that one would want to carry along, I was wondering if it would be
>>> > >>> possible / useful to allow the user to have multiple data.frames in the
>>> > >>> colData slot
>>> > >>>
>>> > >>
>>> > >> Actually, colData() is a DataFrame, and a DataFrame column can contain a
>>> > >> DataFrame. So after
>>> > >>
>>> > >>   example(SummarizedExperiment)
>>> > >>
>>> > >> we could make some faux sample quality data
>>> > >>
>>> > >>   quality = DataFrame(x=1:6, y=6:1, row.names=colnames(se1))
>>> > >>
>>> > >> add this as a column in the colData()
>>> > >>
>>> > >>   colData(se1)$quality = quality
>>> > >>
>>> > >> (or create the SummarizedExperiment from a similar DataFrame up-front)
>>> > >> and manage our grouped data
>>> > >>
>>> > >> > colData(se1)
>>> > >> DataFrame with 6 rows and 2 columns
>>> > >>     Treatment     quality
>>> > >>   <character> <DataFrame>
>>> > >> A        ChIP    ########
>>> > >> B       Input    ########
>>> > >> C        ChIP    ########
>>> > >> D       Input    ########
>>> > >> E        ChIP    ########
>>> > >> F       Input    ########
>>> > >> > colData(se1[,1:2])$quality
>>> > >> DataFrame with 2 rows and 2 columns
>>> > >>           x         y
>>> > >>   <integer> <integer>
>>> > >> A         1         6
>>> > >> B         2         5
>>> > >>
>>> > >> I'm not sure that this is any less confusing to the end user than having
>>> > >> to manage a DataFrameList(), but it does not require any new features.
>>> > >>
>>> > >> Martin
>>> > >>
>>> > >>  of SummarizedExperiment.
>>> > >>>
>>> > >>> Best,
>>> > >>> Davide
>>> > >>>
>>> > >>> _______________________________________________
>>> > >>> Bioc-devel at r-project.org mailing list
>>> > >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel