[Bioc-devel] Multiple colData in SummarizedExperiment

Thu Jun 18 04:23:18 CEST 2015

On 06/17/2015 11:41 AM, davide risso wrote:
> Dear list,
>
> I'm creating an R package to store RNA-seq data of a somewhat large project
> in which I'm involved.
>
> One of the initial goals is to compare different pre-processing pipelines,
> hence I have multiple expression matrices corresponding to the same samples.
> The SummarizedExperiment class seems a good candidate, since I have
> multiple expression matrices with the same rowData and colData information.
>
> I have several sample-specific variables that I want to store with the
> object, namely, experimental information (e.g., batch, date, experimental
> condition, ...) and sample quality (e.g., proportion of aligned reads,
> total duplicate reads, etc.).
>
> Of course, I can always create one big data frame concatenating the two
> (experimental info + sample quality), but it seems that both conceptually
> and practically, it might be useful to have two separate data frames.
> Since this seems a reasonably standard type of information that
> one would want to carry along, I was wondering if it would be possible /
> useful to allow the user to have multiple data.frames in the colData slot

Actually, colData() returns a DataFrame, and a DataFrame column can itself be a 
DataFrame. So after

   example(SummarizedExperiment)

we could make some faux sample quality data

   quality = DataFrame(x=1:6, y=6:1, row.names=colnames(se1))

add this as a column in the colData()

   colData(se1)$quality = quality

(or create the SummarizedExperiment from a similar DataFrame up-front) and 
manage our grouped data

 > colData(se1)
DataFrame with 6 rows and 2 columns
     Treatment     quality
   <character> <DataFrame>
A        ChIP    ########
B       Input    ########
C        ChIP    ########
D       Input    ########
E        ChIP    ########
F       Input    ########
 > colData(se1[,1:2])$quality
DataFrame with 2 rows and 2 columns
           x         y
   <integer> <integer>
A         1         6
B         2         5

I'm not sure that this is any less confusing to the end user than having to 
manage a DataFrameList(), but it does not require any new features.
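
For what it's worth, here is a self-contained sketch of the same idea that
does not rely on example(SummarizedExperiment). The sample names, the two
matrices, the assay names (pipelineA, pipelineB) and the quality values are
all made up for illustration; the only real API used is DataFrame(),
SummarizedExperiment(), colData() and assayNames().

```r
library(SummarizedExperiment)

# Hypothetical toy data: two expression matrices for the same 6 samples,
# standing in for the output of two pre-processing pipelines.
samples  <- LETTERS[1:6]
counts_a <- matrix(1:60, nrow = 10,
                   dimnames = list(paste0("gene", 1:10), samples))
counts_b <- counts_a * 2L

# Experimental information stays as an ordinary colData column ...
cd <- DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                row.names = samples)

# ... while the (made-up) sample quality variables are grouped in their own
# DataFrame and nested as a single column.
quality <- DataFrame(aligned    = c(0.91, 0.88, 0.93, 0.85, 0.90, 0.87),
                     duplicates = 6:1,
                     row.names  = samples)
cd$quality <- quality

# Build the object up-front from the nested DataFrame.
se <- SummarizedExperiment(
    assays  = list(pipelineA = counts_a, pipelineB = counts_b),
    colData = cd)

# Subsetting samples carries the nested table along with them.
sub <- se[, 1:2]
colData(sub)$quality
```

Individual quality variables are then reached with a second `$`, e.g.
colData(se)$quality$aligned, so the grouping stays visible in the accessor.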

Martin

> of SummarizedExperiment.
>
> Best,
> Davide

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793