[BioC] Limma: Normalization with large numbers of differentially expressed genes

Serge Eifes serge.eifes at lbmcc.lu
Wed Oct 10 11:00:14 CEST 2007


Dear all,

We have performed a time-series experiment (2h, 6h, 10h, 48h, 72h) on
dual-channel arrays where we want to compare gene expression between treated
and time-matched untreated cells.

This experiment was done using Agilent 4112F human whole genome microarrays
(with 45k features). Statistical analysis was performed using LIMMA 2.10.7 on
R 2.5.1.
Background correction was performed using normexp with an offset of 100.
Loess normalization was done using a span of 0.4 and 12 iterations.

I have encountered the following problems during data analysis:

1) The microarrays for the whole experiment were scanned at quite low
intensities. As a result, about 22k features per array on average have an
A-value between 7 and 8.

2) There also seem to be quite large numbers of differentially expressed
probes, judging both by the raw per-probe p-values from the moderated t-test
at the different time points and by the p-values for the moderated
F-statistic after multiple hypothesis correction (MHC; FDR, BH).
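
For readers who want to see what the BH (Benjamini-Hochberg) adjustment does to a vector of raw p-values, here is a minimal sketch in plain Python. It is an illustration only, not the limma code used for this analysis; the function name bh_adjust and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
n_significant = sum(p < 0.05 for p in adj)
print(adj, n_significant)
```

A probe is then counted as significant when its adjusted p-value falls below the chosen cutoff (0.05 in this post).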

The numbers of probes with a raw per-probe p-value < 0.05 from the
moderated t-test, as retrieved from the "MArrayLM" object, are:
* t=0h: 1419
* t=2h: 9428
* t=6h: 15013
* t=10h: 13641
* t=48h: 21713
* t=72h: 18027

The numbers of significant probes I get using the moderated
F-statistic (nestedF) with p < 0.05 after MHC are:
* t=0h: 515
* t=2h: 6278
* t=6h: 11460
* t=10h: 10560
* t=48h: 17250
* t=72h: 14311
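
To put these counts next to the roughly 20% figure raised in the second question below, here is a minimal sketch of the arithmetic. It assumes that "45k features" means exactly 45,000 probes, which is a rounding and will shift the percentages slightly if control or replicate features are excluded.

```python
# Counts copied from the nestedF list above (p < 0.05 after MHC).
significant = {
    "0h": 515,
    "2h": 6278,
    "6h": 11460,
    "10h": 10560,
    "48h": 17250,
    "72h": 14311,
}

# Assumption: "45k features" is taken as exactly 45,000.
TOTAL_FEATURES = 45_000


def percent_significant(counts, total):
    """Convert per-time-point counts into percentages of all features."""
    return {t: 100.0 * n / total for t, n in counts.items()}


percentages = percent_significant(significant, TOTAL_FEATURES)
for t, pct in percentages.items():
    flag = "above" if pct > 20.0 else "below"
    print(f"t={t}: {pct:.1f}% ({flag} 20%)")
```

Under that assumption, the later time points come out well above 20% of all features, which is what prompts the normalization question.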

Now I've got the following questions:

* Is the accumulation of signals at such low average intensities problematic
for the normalization process (besides the fact that it may introduce higher
variability into the measurements)?

* I have read in a reply by G.K. Smyth ([BioC] limma Normalization
question) that loess normalization can become problematic when around
20% of genes are differentially expressed. So in this case, does loess
normalization still work correctly, given such large numbers of
differentially expressed genes? If not, what kind of normalization would be
more appropriate for this kind of data?

Thanks in advance!

Best Regards,
Serge Eifes



Serge Eifes
Laboratoire de Biologie Moleculaire et Cellulaire du Cancer (LBMCC)
Hopital Kirchberg 
9, rue Edward Steichen
L-2540 LUXEMBOURG


