[Bioc-devel] appropriate S4 class to contain microarray OR clinical feature data for clustering

Vincent Carey stvjc at channing.harvard.edu
Mon Mar 30 20:50:19 CEST 2015


On Mon, Mar 30, 2015 at 2:10 PM, Katie Planey <katie.planey at gmail.com> wrote:

> Hi all,
>
> I'm packaging my code for a meta-clustering method to submit to
> Bioconductor, and I have the following design dilemma:  my method can be
> used on either a matrix of gene expression data (well really any biological
> data), or a matrix of clinical data. I would like to have the input data
> matrices (one for each experiment) to be in standardized S4 objects, as
> Bioconductor encourages.
>
> But the eSet class is specifically designed for biological data. Is there a
>

I would say that eSet was designed for the combination of high-dimensional
assays (with G features per sample) on modest numbers of samples (say N
samples), with arbitrary sample-level (one could say "clinical") data with,
say, R attributes on each of the N samples. The task was to coordinate the
relationship between the G assay results and the R attributes of each sample.


> broader S4 class I could use, or would I just instruct users with clinical
> data to put their data matrix in the "featureData" slot? This clinical data
>

featureData is designed to handle additional attributes on each of the G
assay elements, for example different nomenclatures or locations for genes
or probe sets.



> matrix would not contain outcome variables; those could still potentially
> be stored in the phenoData slot. The AnnotatedDataFrame appears to be too
> simplistic, given that a user may still want to store several outcome
> variables in a phenoData or phenoData-like slot (varMetadata from
> AnnotatedDataFrame does not appear to fit this purpose).
>

The raison d'être of eSet is the combination of information of distinct
types. In the microarray case, the assays were regimented by array
manufacturing, and it was reasonable to sequester the information on the
array outputs from the unregimented information that might be collected on
samples. This led to the X[G,S] idiom, so that we could think of filtering
features and samples using matrix-like notation, managing both the probe-level
and sample-level data in a coordinated way.
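
To make the X[G,S] idiom concrete, here is a minimal Python sketch. It is not
Biobase code: the class name MiniExpressionSet and its fields assay,
feature_data and pheno_data are hypothetical stand-ins loosely playing the
roles of an eSet's assay data, featureData and phenoData, and plain lists
replace a real matrix. What it shows is that subsetting by feature and sample
indices carries the probe-level and sample-level records along with the assay
values, so the three parts cannot drift out of alignment.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class MiniExpressionSet:
    """Hypothetical analogue of the eSet design: a G x N assay matrix plus
    one feature-level record per row and one sample-level record per column."""
    assay: List[List[float]]            # G rows (features) x N columns (samples)
    feature_data: List[Dict[str, Any]]  # length G, e.g. probe IDs or gene symbols
    pheno_data: List[Dict[str, Any]]    # length N, e.g. clinical attributes

    def __post_init__(self) -> None:
        # Enforce the G x N contract that ties the three parts together.
        if len(self.assay) != len(self.feature_data):
            raise ValueError("need one feature_data record per assay row")
        if any(len(row) != len(self.pheno_data) for row in self.assay):
            raise ValueError("need one pheno_data record per assay column")

    def __getitem__(self, key: Tuple[Sequence[int], Sequence[int]]) -> "MiniExpressionSet":
        # X[g, s]: subset features and samples together, mimicking X[G, S].
        g_idx, s_idx = key
        return MiniExpressionSet(
            assay=[[self.assay[g][s] for s in s_idx] for g in g_idx],
            feature_data=[self.feature_data[g] for g in g_idx],
            pheno_data=[self.pheno_data[s] for s in s_idx],
        )


X = MiniExpressionSet(
    assay=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    feature_data=[{"probe": "p1"}, {"probe": "p2"}],
    pheno_data=[{"age": 50}, {"age": 61}, {"age": 47}],
)
# Keep feature p2 and only the samples older than 48; annotations follow along.
keep = [j for j, p in enumerate(X.pheno_data) if p["age"] > 48]
sub = X[[1], keep]
print(sub.assay, sub.feature_data, sub.pheno_data)
# [[4.0, 5.0]] [{'probe': 'p2'}] [{'age': 50}, {'age': 61}]
```

The filter is written against sample-level attributes, yet the assay values
and feature records come out already matched; that coordination is what a
naked matrix plus separate tables would leave to the user.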

In your case, the main data structure seems to be a matrix. You likely don't
want to use a "naked" matrix, but would like to allow attachment of metadata
that makes the information more self-describing and permits methods to
operate on the data with minimal guidance from the user. You could define
your own class that adds to the matrix the information that is central to
the methods you are providing.
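
A minimal Python sketch of such a class, assuming a features x samples
orientation. FeatureMatrix, its data_type tag ("expression" or "clinical"),
the outcomes field and standardize_features are all hypothetical names chosen
for illustration; they are not part of Bioconductor or of any submitted
package. In R the same idea would be an S4 class with a validity method. The
sketch only shows how metadata recorded in the object lets a method (here,
per-feature z-scoring) run without the user saying which axis holds the
features, while outcome variables stay apart from the clustering features.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class FeatureMatrix:
    """Hypothetical input container for one experiment: a features x samples
    matrix plus the metadata a clustering method needs."""
    values: List[List[float]]      # rows = features, columns = samples
    feature_names: List[str]
    sample_names: List[str]
    data_type: str = "expression"  # assumed tag: "expression" or "clinical"
    # Outcome variables are kept apart from the features, one value per sample.
    outcomes: Dict[str, List[object]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validity checks, playing the role of an S4 validity method.
        if self.data_type not in ("expression", "clinical"):
            raise ValueError(f"unknown data_type: {self.data_type!r}")
        if len(self.values) != len(self.feature_names):
            raise ValueError("need one name per feature row")
        n = len(self.sample_names)
        if any(len(row) != n for row in self.values):
            raise ValueError("every feature row needs one value per sample")
        for name, vals in self.outcomes.items():
            if len(vals) != n:
                raise ValueError(f"outcome {name!r} needs one value per sample")


def standardize_features(x: FeatureMatrix) -> FeatureMatrix:
    """Z-score each feature across samples (assumes at least one sample).
    The object records which axis holds features, so no axis argument."""
    scaled = []
    for row in x.values:
        mu, sd = mean(row), pstdev(row)
        # A constant feature carries no information; map it to zeros.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return FeatureMatrix(scaled, list(x.feature_names), list(x.sample_names),
                         x.data_type, dict(x.outcomes))


clin = FeatureMatrix(
    values=[[50.0, 61.0, 47.0], [1.0, 3.0, 2.0]],
    feature_names=["age", "tumor_grade"],
    sample_names=["s1", "s2", "s3"],
    data_type="clinical",
    outcomes={"survival_months": [12, 30, 7]},
)
z = standardize_features(clin)
```

Because the same container accepts either data type, one clustering entry
point could serve expression and clinical matrices alike, with the tag
available if a method ever needs to treat them differently.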


> Best,
> Katie
> --
> Katie Planey
> https://www.linkedin.com/in/katieplaney
> PhD Candidate | Stanford Biomedical Informatics
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
