[BioC] Question - Merged Arrays and Averaging Method

Wed Jan 23 18:00:24 CET 2008

> I am a new user of the Bioconductor suite and have a question for the
> list. When performing an analysis, since I have duplicate arrays, I am
> merging my duplicate arrays before normalizing my data. I have an
> option to use the mean or median values and also to log2 transform my
> data before averaging. Would anybody with experience care to comment
> about the advantages or disadvantages regarding merging, averaging and
> log transforming your data during the analysis?
> 

Hi Marcos,

If you have duplicates of all arrays in your experiment it is worthwhile
to keep them as separate arrays and use all of the data. The limma
package, for instance, has functionality to account for technical
replication in the analysis (see the duplicateCorrelation() function).

If you do not have duplicates for all samples there are several things
to consider.
- If you average the duplicate arrays, the samples that have a replicate
will show lower overall variability in gene expression than the samples
that do not. Unequal variability between samples is a problem for
statistical tests that assume comparable variance.
- Averaging should be performed on transformed (log or vsn) values.
- For a duplicate the mean and the median are identical. Only with 3 or
more replicates do they differ, and there the median is more robust to
outliers than the mean.
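
To make the last two points concrete, here is a minimal sketch in plain
Python (standard library only, not Bioconductor code). All intensity
values are made up for illustration, and the variable names are
hypothetical. It compares averaging before and after the log2
transform, and the mean against the median for 2 and 3 replicates.

```python
import math
import statistics

# Hypothetical raw intensities for one probe on two duplicate arrays.
duplicate_raw = [256.0, 4096.0]
duplicate_log = [math.log2(x) for x in duplicate_raw]  # [8.0, 12.0]

# Average the raw values first, then transform: the larger value
# dominates, giving roughly 11.09.
log_of_mean = math.log2(statistics.mean(duplicate_raw))

# Transform first, then average: both arrays contribute equally.
mean_of_logs = statistics.mean(duplicate_log)  # 10.0

# With exactly two values the mean and the median coincide.
dup_mean = statistics.mean(duplicate_log)
dup_median = statistics.median(duplicate_log)

# Hypothetical log2 values for three replicates, one of them an outlier.
triplicate_log = [8.0, 8.5, 12.0]
tri_mean = statistics.mean(triplicate_log)      # pulled towards 12.0
tri_median = statistics.median(triplicate_log)  # stays at 8.5

print("log2 of mean:", log_of_mean, "mean of log2:", mean_of_logs)
print("duplicate mean/median:", dup_mean, dup_median)
print("triplicate mean/median:", tri_mean, tri_median)
```

Averaging on the log scale amounts to taking the geometric mean of the
raw intensities, which is why a single bright array does not dominate
the merged value.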

Personally, when I do not have duplicates for all samples, I discard one
of the duplicates, usually after checking which of the two comes out
better in QC.

Jan Oosting