[Bioc-devel] Multiple colData in SummarizedExperiment

Thu Jun 18 14:59:42 CEST 2015

I think the cleaner solution for Davide (if he insists on having separate
objects; I decided against it in minfi) is to extend the class to allow
this.
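
A minimal sketch of what such an extension might look like, assuming a
current Bioconductor release in which the class lives in the
SummarizedExperiment package. The class name QCSummarizedExperiment, the
slot and accessor qualityData, and the constructor are all hypothetical
names chosen for illustration; none of them exists in any package.

```r
library(SummarizedExperiment)

## Hypothetical subclass carrying a second per-sample DataFrame.
setClass("QCSummarizedExperiment",
    contains = "SummarizedExperiment",
    slots = c(qualityData = "DataFrame"))

## The extra table must describe exactly the samples (columns) of the object.
setValidity("QCSummarizedExperiment", function(object) {
    if (nrow(object@qualityData) != ncol(object))
        return("'qualityData' must have one row per sample")
    TRUE
})

## Hypothetical accessor for the extra slot.
setGeneric("qualityData", function(x) standardGeneric("qualityData"))
setMethod("qualityData", "QCSummarizedExperiment",
    function(x) x@qualityData)

## Hypothetical constructor: build the parent object, then add the slot.
QCSummarizedExperiment <- function(..., qualityData) {
    se <- SummarizedExperiment(...)
    new("QCSummarizedExperiment", se, qualityData = qualityData)
}

## Usage with made-up data.
samples <- c("A", "B", "C")
counts <- matrix(1:6, nrow = 2, dimnames = list(c("g1", "g2"), samples))
qc <- DataFrame(aligned = c(0.90, 0.80, 0.95), row.names = samples)
obj <- QCSummarizedExperiment(assays = list(counts = counts),
                              qualityData = qc)
qualityData(obj)
```

Note that this sketch does not define a `[` method, so subsetting by
column would leave qualityData out of sync with the samples; a real
extension would need subsetting and combining methods that carry the
extra slot along.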

Kasper

On Thu, Jun 18, 2015 at 12:25 AM, Ryan <rct at thompsonclan.org> wrote:

> Oh wow, I didn't know you could put a DataFrame into a single column of
> another DataFrame. That actually solves a problem for me too (I don't
> intend to expose nested DataFrames to the users though).
>
>
> On 6/17/15 7:23 PM, Martin Morgan wrote:
>
>> On 06/17/2015 11:41 AM, davide risso wrote:
>>
>>> Dear list,
>>>
>>> I'm creating an R package to store RNA-seq data of a somewhat large
>>> project in which I'm involved.
>>>
>>> One of the initial goals is to compare different pre-processing
>>> pipelines, hence I have multiple expression matrices corresponding to
>>> the same samples. The SummarizedExperiment class seems a good
>>> candidate, since I have multiple expression matrices with the same
>>> rowData and colData information.
>>>
>>> I have several sample-specific variables that I want to store with the
>>> object, namely, experimental information (e.g., batch, date, experimental
>>> condition, ...) and sample quality (e.g., proportion of aligned reads,
>>> total duplicate reads, etc...).
>>>
>>> Of course, I can always create one big data frame concatenating the two
>>> (experimental info + sample quality), but it seems that both conceptually
>>> and practically, it might be useful to have two separate data frames.
>>> Since this seems a reasonably standard type of information that
>>> one would want to carry along, I was wondering if it would be possible /
>>> useful to allow the user to have multiple data.frames in the colData slot
>>>
>>
>> Actually, colData() is a DataFrame, and a DataFrame column can contain a
>> DataFrame. So after
>>
>>   example(SummarizedExperiment)
>>
>> we could make some faux sample quality data
>>
>>   quality = DataFrame(x=1:6, y=6:1, row.names=colnames(se1))
>>
>> add this as a column in the colData()
>>
>>   colData(se1)$quality = quality
>>
>> (or create the SummarizedExperiment from a similar DataFrame up-front)
>> and manage our grouped data
>>
>> > colData(se1)
>> DataFrame with 6 rows and 2 columns
>>     Treatment     quality
>>   <character> <DataFrame>
>> A        ChIP    ########
>> B       Input    ########
>> C        ChIP    ########
>> D       Input    ########
>> E        ChIP    ########
>> F       Input    ########
>> > colData(se1[,1:2])$quality
>> DataFrame with 2 rows and 2 columns
>>           x         y
>>   <integer> <integer>
>> A         1         6
>> B         2         5
>>
>> I'm not sure that this is any less confusing to the end user than having
>> to manage a DataFrameList(), but it does not require any new features.
>>
>> Martin
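
The "up-front" variant mentioned above can be sketched as follows,
assuming a current Bioconductor release with the SummarizedExperiment
package. The count matrix, gene names, and variable names are made up for
illustration; the quality table mirrors the faux data in the example.

```r
library(SummarizedExperiment)

samples <- LETTERS[1:6]

## Made-up assay data: 4 genes by 6 samples.
counts <- matrix(seq_len(24), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), samples))

## Two conceptually separate per-sample tables.
experiment <- DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                        row.names = samples)
quality <- DataFrame(x = 1:6, y = 6:1, row.names = samples)

## Nest the quality table as a single column of the sample annotation.
sample_info <- experiment
sample_info$quality <- quality

## Build the object from the combined DataFrame.
se <- SummarizedExperiment(assays = list(counts = counts),
                           colData = sample_info)

## The nested table is reached with ordinary `$` access and is
## subset together with the samples.
colData(se)$quality$x
colData(se[, 1:2])$quality
```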
>>
>>> of SummarizedExperiment.
>>>
>>> Best,
>>> Davide
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>
>>