[BioC] RE : Limma: Normalization with large numbers of differentially expressed genes
Serge Eifes
serge.eifes at lbmcc.lu
Wed Oct 10 14:19:59 CEST 2007
Dear Jose,
Thanks a lot for your answer!
It seems to me that the parameters used for the loess normalization work
fine in this situation. Here are examples of the MA plots before
and after normalization:
http://www.lbmcc.lu/microarray_pics/MA_plots_13.png
http://www.lbmcc.lu/microarray_pics/MA_plots_15.png
The two slides shown here are for the timepoint where the most significantly
regulated genes were detected.
What we intend to do now is to perform Real-Time PCR validation on a larger
scale, testing 40 to 50 genes across the whole intensity range. This might
help us to see whether there were larger problems during loess
normalization, and to take that knowledge into account in the normalization.
The biology of the molecule we use in this experiment, in relation to
the cell line used, is not well described in the literature. But for other
biological systems treated with this drug, I found that the reported
positively and negatively regulated genes agree well with our results.
The spike-in fold changes in our experiment also showed no
abnormal behavior compared to the expected values.
Best Regards,
Serge
Serge Eifes
Laboratoire de Biologie Moleculaire et Cellulaire du Cancer (LBMCC)
Hopital Kirchberg
9, rue Edward Steichen
L-2540 LUXEMBOURG
Phone:+ 352 2468-4046 Fax : + 352 2468-4060
-----Original Message-----
From: J.delasHeras at ed.ac.uk [mailto:J.delasHeras at ed.ac.uk]
Sent: Wednesday, October 10, 2007 11:58 AM
To: Serge Eifes
Subject: Re: [BioC] Limma: Normalization with large numbers of differentially
expressed genes
Quoting Serge Eifes <serge.eifes at lbmcc.lu>:
>
> Dear all,
>
> We have performed a time-series experiment (2h, 6h, 10h, 48h, 72h) on
> dual-channel arrays where we want to compare gene expression between
> treated and time-matched untreated cells.
>
> This experiment was done using Agilent 4112F human whole genome
> microarrays (with 45k features). Statistical analysis is performed using
> LIMMA 2.10.7 on R 2.5.1.
> Background correction was performed using normexp with an offset of 100.
> Loess normalization was done using a span of 0.4 and 12 iterations.
>
> Now I have encountered the following problems during data analysis:
>
> 1) The microarrays for the whole experiment were scanned at quite low
> intensities. This means that about 22k features on average per array have
> an A-value located between 7 and 8.
>
> 2) There also seem to be quite large numbers of differentially
> expressed probes when considering the raw per-probe p-values from the
> moderated t-test for the different time-points and the p-values for the
> moderated F-statistic after multiple hypothesis correction (MHC; FDR, BH).
>
> Numbers of significant probes with raw per-probe p-value < 0.05 from
> moderated t-test as retrieved from the "MArrayLM" object are shown here:
> * t=0h: 1419
> * t=2h: 9428
> * t=6h: 15013
> * t=10h: 13641
> * t=48h: 21713
> * t=72h: 18027
>
> Here are the numbers of significant probes I get using the moderated
> F-statistic (nestedF) with p<0.05 after MHC:
> * t=0h: 515
> * t=2h: 6278
> * t=6h: 11460
> * t=10h: 10560
> * t=48h: 17250
> * t=72h: 14311
>
> Now I've got the following questions:
>
> * Is the accumulation of signals at such low average intensities
> problematic for the normalization process (besides possibly introducing
> higher variability into the measurements)?
>
> * I already read in a reply by G.K. Smyth ([BioC] limma Normalization
> question) that loess normalization might become problematic with around
> 20% of differentially expressed genes. So in this case, does loess
> normalization still work correctly, considering such large numbers of
> differentially expressed genes? If not, what kind of normalization may be
> more appropriate for this kind of data?
Hi Serge,
having a lot of spots with low intensity would only add noise but not
create much of a problem for normalisation. You used the normexp method for
background correction, which can be very good, when used with an
appropriate offset, at making the M values of low intensity spots
converge nicely towards zero, so I wouldn't worry excessively about
that.
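To illustrate why an offset pulls low-intensity M values towards zero, here is a minimal Python sketch. It only shows the effect of adding a constant to both channels before taking the log ratio; it does not implement the normexp model itself, and the intensities and the `log_ratio` helper are made up for illustration.

```python
import math

def log_ratio(red, green, offset=0.0):
    """M value (log2 ratio) after adding a constant offset to both channels."""
    return math.log2((red + offset) / (green + offset))

# Hypothetical background-corrected intensities: one dim spot and one bright
# spot, both with a two-fold raw difference between channels.
dim = (20.0, 10.0)
bright = (20000.0, 10000.0)

for offset in (0.0, 100.0):
    m_dim = log_ratio(*dim, offset=offset)
    m_bright = log_ratio(*bright, offset=offset)
    print(f"offset={offset:5.0f}  M(dim)={m_dim:.3f}  M(bright)={m_bright:.3f}")
```

With the offset, the dim spot's M shrinks strongly towards zero while the bright spot's M is almost unchanged, which is the behaviour described above.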
Regarding having a large % of differentially expressed genes... that's
more of a problem. The quote of 20% sounds like a conservative
estimate, but it does really depend on how those 20% of spots are
distributed... and you may get away with more... Loess is simply used
to fit a curve to the population, and the assumption is made that this
represents the non-changing baseline... where spots with no
differential expression should align. This of course assumes that
most of the data are evenly distributed on both sides of the curve,
more or less... and these assumptions are generally okay, and even
some deviations are tolerated. But you have to look at each experiment
and decide.
What do the MA plots look like? Looking at MA plots you can see the
distribution of M values (before normalisation, so make an MA object
using normalisation between arrays, method="none"). You can compare
those plots with MA plots after normalisation, to see the effect the
normalisation procedure has on the whole distribution.
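For reference, the M and A values on such a plot are the log ratio and the average log intensity of the two channels. A minimal Python sketch follows; the `ma_values` helper and the example intensities are hypothetical, not limma code.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green intensities (all > 0)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]        # log ratio
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]  # mean log intensity
    return m, a

# Hypothetical intensities for three spots on one array.
red = [256.0, 1000.0, 150.0]
green = [64.0, 1000.0, 300.0]
m, a = ma_values(red, green)
for mi, ai in zip(m, a):
    print(f"A={ai:.2f}  M={mi:+.2f}")
```

Plotting M against A for the same spots before and after normalisation gives the comparison suggested above.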
You might find that loess will distort the distribution in ways that
do not seem reasonable, when there are too many differentially
expressed genes. How many is too many? It depends. It depends on the
number, but also on their distribution across intensities... MA plots
are the best way to check this sort of thing.
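For a sense of scale, the raw p<0.05 counts quoted earlier in this thread can be turned into fractions of the array. This rough Python sketch assumes exactly 45,000 features (the original only says "45k"), and raw p-values at 0.05 will of course include some false positives, so treat the percentages as an upper-end indication only.

```python
# Counts of probes with raw p < 0.05, as quoted earlier in this thread.
counts = {"0h": 1419, "2h": 9428, "6h": 15013,
          "10h": 13641, "48h": 21713, "72h": 18027}

N_FEATURES = 45_000  # assumption: "45k features" taken as exactly 45,000

fractions = {t: n / N_FEATURES for t, n in counts.items()}
for t, f in fractions.items():
    side = "above" if f > 0.20 else "below"
    print(f"t={t:>3}: {f:6.1%} of features ({side} the 20% rule of thumb)")
```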
I had an experiment that resulted in a large number of genes being
activated (going from low or no expression to a decent level). The MA
plot looked something like this (combining several slides, after lmFit):
http://mcnach.com/MISC/MAplots2.png
When using loess normalisation, my activated spots contributed
excessively to the total population, especially between the ranges
A=11 to A=12.5 or so... the resulting loess curve was clearly pushed
up in that area, and the resulting normalised data were distorted,
being pushed down.
For this sort of case the best option is to have a set of known invariant
spots, or control spots whose behaviour is expected, and use those to
normalise the whole thing. But often we don't have those.
In the case above, I was able to identify reasonably easily a large
number of those genes that were being activated, and I could flag them
so that they would not be included in the normalisation. By removing a
reasonable proportion of them I was able to eliminate the distortion
and the final plots look reasonable to me. I took a lot of time to
verify genes and make sure that everything was behaving alright, so I
was happy with this method. However, it requires that you are familiar
with the biology of the experiment, and that you check and recheck
that what you're doing doesn't cause harm.
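The distortion and the flagging work-around can be sketched on simulated data. The Python below is only an illustration under stated assumptions: `loess_fit` is a simplified local linear fit with tricube weights and no robustness iterations (so not limma's implementation), the spot counts and M distributions are invented, and all activated spots are flagged, which is more than one could expect to manage in practice.

```python
import math
import random

def loess_fit(x, y, include, span=0.4):
    """Local linear fit with tricube weights. Only spots with include[i]
    True feed the fit, but a fitted value is returned for every spot."""
    fit_pts = [(xi, yi) for xi, yi, keep in zip(x, y, include) if keep]
    k = max(2, math.ceil(span * len(fit_pts)))
    fitted = []
    for x0 in x:
        near = sorted((abs(xi - x0), xi - x0, yi) for xi, yi in fit_pts)[:k]
        h = near[-1][0]  # bandwidth: distance to the k-th nearest fit point
        sw = swx = swy = swxx = swxy = 0.0
        for d, dx, yi in near:
            w = (1.0 - (d / h) ** 3) ** 3 if h > 0 else 1.0
            sw += w
            swx += w * dx
            swy += w * yi
            swxx += w * dx * dx
            swxy += w * dx * yi
        denom = sw * swxx - swx * swx
        if abs(denom) > 1e-12:
            fitted.append((swy * swxx - swx * swxy) / denom)  # intercept at x0
        else:
            fitted.append(swy / sw)  # degenerate case: weighted mean
    return fitted

def normalise(a, m, include, span=0.4):
    """Subtract the fitted curve from M for every spot."""
    curve = loess_fit(a, m, include, span)
    return [mi - ci for mi, ci in zip(m, curve)]

# Simulated array: 400 unchanged spots plus 150 activated spots that
# cluster between A=11 and A=12.5 (all numbers are made up).
rng = random.Random(1)
a_vals, m_vals, activated = [], [], []
for _ in range(400):
    a_vals.append(rng.uniform(6.0, 14.0))
    m_vals.append(rng.gauss(0.0, 0.2))
    activated.append(False)
for _ in range(150):
    a_vals.append(rng.uniform(11.0, 12.5))
    m_vals.append(rng.gauss(2.0, 0.3))
    activated.append(True)

m_all = normalise(a_vals, m_vals, [True] * len(a_vals))
m_flagged = normalise(a_vals, m_vals, [not f for f in activated])

def mean_unchanged(m_norm):
    """Mean normalised M of unchanged spots in the affected A range."""
    vals = [mi for mi, ai, act in zip(m_norm, a_vals, activated)
            if not act and 11.0 <= ai <= 12.5]
    return sum(vals) / len(vals)

print(f"fit on all spots:       {mean_unchanged(m_all):+.2f}")
print(f"fit on unflagged spots: {mean_unchanged(m_flagged):+.2f}")
```

When every spot feeds the fit, the unchanged spots between A=11 and A=12.5 end up with clearly negative M values; when the flagged spots are left out of the fit (but still normalised), they stay near zero.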
On the positive side... when I compared the results I got when using
loess directly on all spots (despite distortion) and with my more
carefully chosen ones... I found that whilst the latter was better in
general, I could still pick out pretty much the same genes either way.
Perhaps I was looking for a population that was already distinct
enough...
I'm not sure this is of any help to you right now... I guess the
bottom line is: make plots, before and after normalisation, have a
good idea of what you are expecting and see how far it is from what
you get. Loess is just fitting a curve to the distribution, according
to certain parameters... if you think you know what the curve should
look like (representing the non-changing bulk of the data), you can
often find a work-around... as long as you know what is expected in
your experiment, to some degree. Without proper control spots, one has
to be careful, and understand the experiment.
Jose
--
Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.