[BioC] ExpressionSet or MAList
Gordon K Smyth
smyth at wehi.EDU.AU
Tue May 6 10:08:31 CEST 2008
Although somewhat tangential to the discussion, because ExpressionSet and
MAList can both store annotation, I thought it might be interesting to
explain why I consider the ability to store more than one column of probe
annotation in a microarray data object essential. There are many
reasons, including:
1. The data object is subsetted frequently, and the annotation should be
subsetted along with it.
2. I want to be able to come back to an analysis years later and
repeat it exactly, including the annotation, rather than being completely
dependent on a constantly changing annotation package. As I see it, this is
part of reproducible research. Of course I also want to be able to
update the annotation, but in a controlled way.
3. Some functions need the annotation directly, for example the limma
controlStatus() function.
4. I am frequently presented with academic arrays for which no single
annotation column of unique probe identifiers is provided. Instead
several columns may be needed to identify the probe. People who haven't
had this experience are fortunate.
In general, the need to work with annotation as an associated data.frame
is greater with "messier" microarray platforms such as academic two-colour
cDNA arrays and with once-off custom platforms.
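Points 1 and 4 can be illustrated with a small sketch. It is written in Python rather than R, and the ProbeData class, its methods and the Plate, Well and Status columns are all hypothetical; this is not the limma MAList or Biobase ExpressionSet API. The object keeps several annotation columns alongside the expression values, subsets both with the same row indices, and identifies probes by a combination of columns when no single column is unique.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class ProbeData:
    """Hypothetical container: expression values plus per-probe annotation.

    Row i of `exprs` and element i of every annotation column refer to the
    same probe.
    """
    exprs: List[List[float]]          # one row per probe, one column per array
    annotation: Dict[str, List[str]]  # several annotation columns, one value per probe

    def __post_init__(self) -> None:
        # Check that every annotation column has one value per probe.
        n = len(self.exprs)
        for name, col in self.annotation.items():
            if len(col) != n:
                raise ValueError(
                    f"annotation column {name!r} has {len(col)} values, expected {n}"
                )

    def subset(self, rows: Sequence[int]) -> "ProbeData":
        # Subset the data and every annotation column with the same indices,
        # so the annotation stays aligned with the data.
        return ProbeData(
            exprs=[self.exprs[i] for i in rows],
            annotation={name: [col[i] for i in rows]
                        for name, col in self.annotation.items()},
        )

    def probe_keys(self, columns: Sequence[str]) -> List[Tuple[str, ...]]:
        # Build a composite probe identifier from several columns
        # (e.g. plate and well) when no single column is unique.
        return list(zip(*(self.annotation[c] for c in columns)))


# Usage: neither Plate nor Well is unique on its own, but together they are.
data = ProbeData(
    exprs=[[1.2, 0.8], [3.1, 2.9], [0.4, 0.5]],
    annotation={
        "Plate": ["P1", "P1", "P2"],
        "Well": ["A01", "A02", "A01"],
        "Status": ["gene", "control", "gene"],
    },
)
genes = data.subset([i for i, s in enumerate(data.annotation["Status"]) if s == "gene"])
print(genes.probe_keys(["Plate", "Well"]))  # [('P1', 'A01'), ('P2', 'A01')]
```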
Gordon
> Date: Wed, 30 Apr 2008 10:10:21 -0700
> From: Martin Morgan <mtmorgan at fhcrc.org>
> Subject: Re: [BioC] ExpressionSet or MAList
> To: Daniel Brewer <daniel.brewer at icr.ac.uk>
> Cc: bioconductor at stat.math.ethz.ch
>
> Daniel Brewer <daniel.brewer at icr.ac.uk> writes:
>
>> To my mind MAList stores the annotation with the dataset, which I feel is
>> an advantage, whereas ExpressionSet is the base implementation for many
>> libraries.
>
> Storing annotations with the object can be a bad thing if several objects
> share the same annotation, because then there are effectively different
> copies of the same annotation, one for each object. These will
> inevitably drift apart, leading to confusion. There is also the memory
> cost of storing every copy.
>
> That said, annotations can be added to an ExpressionSet, specifically
> by using featureData to store an AnnotatedDataFrame (a data.frame plus
> metadata describing its columns).