[BioC] Breaking the "most genes not differentially expressed" assumption

Mon Apr 27 17:25:22 CEST 2009

Hi all,

I have a dataset of 120 Affy arrays, 60 males and 60 females.
The expression profiles of the two groups differ dramatically: if I 
run a standard RMA + limma analysis, ~90% of the genes come out as 
differentially expressed. Downregulated genes are also twice as 
numerous as upregulated genes, although with a two-fold cutoff on the 
difference in expression the two sets are almost equal (15% up and 
15% down).
This clearly breaks the assumption that most of the genes on the 
array are not differentially expressed, but the result is in line 
with the current knowledge of sex-biased gene expression in my model 
organism.
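
A toy simulation may make the concern concrete. RMA applies quantile 
normalization by default, which forces every array to share the same 
intensity distribution. The sketch below is not the Bioconductor 
workflow; it is a plain-Python illustration on made-up data. The gene 
counts, the 60%/30% down/up split, the one-unit log2 shift and the 
helper names (`quantile_normalize`, `simulate`, `unchanged_bias`) are 
all assumptions chosen for illustration. It measures the apparent 
group difference for genes whose true effect is zero, before and after 
normalization.

```python
import random
from statistics import fmean


def quantile_normalize(arrays):
    """Give every array the same empirical distribution (mean of the sorted arrays)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: average of the k-th smallest value across arrays.
    reference = [fmean(s[k] for s in sorted_arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # gene indices, lowest first
        out = [0.0] * n
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def simulate(n_genes=2000, n_per_group=4, frac_down=0.6, frac_up=0.3,
             log_fold=1.0, seed=1):
    """Hypothetical log2 intensities with asymmetric group differences."""
    rng = random.Random(seed)
    base = [rng.gauss(8.0, 2.0) for _ in range(n_genes)]
    n_down = round(frac_down * n_genes)
    n_up = round(frac_up * n_genes)
    effect = [-log_fold] * n_down + [log_fold] * n_up
    effect += [0.0] * (n_genes - n_down - n_up)
    group_a = [[b + rng.gauss(0.0, 0.1) for b in base]
               for _ in range(n_per_group)]
    group_b = [[b + e + rng.gauss(0.0, 0.1) for b, e in zip(base, effect)]
               for _ in range(n_per_group)]
    return group_a, group_b, effect


def unchanged_bias(group_a, group_b, effect):
    """Mean (B - A) difference over genes whose true effect is zero."""
    unchanged = [i for i, e in enumerate(effect) if e == 0.0]
    return fmean(
        fmean(arr[i] for arr in group_b) - fmean(arr[i] for arr in group_a)
        for i in unchanged
    )


group_a, group_b, effect = simulate()
bias_before = unchanged_bias(group_a, group_b, effect)
normalized = quantile_normalize(group_a + group_b)
bias_after = unchanged_bias(normalized[:4], normalized[4:], effect)
print(f"unchanged genes, mean B-A before: {bias_before:+.3f}, "
      f"after: {bias_after:+.3f}")
```

In this simulated setting the unchanged genes show essentially no 
group difference before normalization and a clear positive shift 
afterwards, because the excess of downregulated genes moves the whole 
distribution of one group. Whether the real arrays behave this way 
depends on how asymmetric the true effects are.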

I have done some quality control plots, available here:
- Boxplot:
http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_boxplot.pdf

- Frequency histogram:
http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_histogram.pdf

- RLE and NUSE plots:
http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_RLEandNUSE1.pdf

- Correlation plot:
http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_correlationplot.pdf

- PCA, after RMA normalization:
http://www.iee.uu.se/zooekol/pdf/hemiarray_qc_pca.pdf

Now, my questions are:
1) Is my issue really an issue? If so, how can I perform a robust 
normalization of my arrays?

2) Is there a tool to assess how "robust" a pre-processing method is 
with respect to this issue?

3) Sex-biased gene expression is not the only biological question in my 
experiment. Is the massive size of this effect going to affect the 
"detectability" of other, smaller effects (through normalization, 
correction for multiple testing, or something else)?
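
For the multiple-testing part of question 3, a minimal sketch of the 
Benjamini-Hochberg adjustment shows what happens to a handful of 
modest p-values when they are adjusted alone and when they are pooled 
with a large block of very small ones. All p-values here are invented, 
and `bh_adjust` is a hypothetical helper written for this example, not 
a limma function. Whether tests from different contrasts are actually 
pooled depends on how the analysis is set up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for a small effect, tested on six genes.
small_effect = [0.001, 0.01, 0.03, 0.2, 0.5, 0.8]
# Hypothetical p-values for a massive effect on 54 further tests.
large_effect = [1e-6] * 54

alone = bh_adjust(small_effect)
pooled = bh_adjust(large_effect + small_effect)[len(large_effect):]

for p, a, b in zip(small_effect, alone, pooled):
    print(f"p={p:<6} adjusted alone={a:.4f} pooled={b:.4f}")
```

In this toy example the pooled adjusted p-values for the small effect 
are never larger than the ones adjusted alone, since each extra 
rejection raises the Benjamini-Hochberg threshold. This covers only 
the multiple-testing side, not any distortion that normalization 
might introduce.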

Thanks,
paolo

-- 
Paolo Innocenti
Department of Animal Ecology, EBC
Uppsala University
Norbyvägen 18D
75236 Uppsala, Sweden