[BioC] Opinions on array design, normalization, and linear modeling with LIMMA
J.delasHeras at ed.ac.uk
Fri Nov 2 12:27:17 CET 2007
Quoting Kasper Daniel Hansen <khansen at stat.berkeley.edu>:
> I agree here, the scale on the y-axis is quite dramatic. Note that we
> are not necessarily saying that too many genes are DE, but that some
> of them have dramatic fold changes.
It really depends on the biology of the experiment. During
embryogenesis you get quite dramatic changes, so I don't think the range
of the M values is something to worry about... at least not without
checking the biology first. The original poster seemed to expect a lot
of variation between the time points compared.
I have seen similar MA plots when comparing, for instance, two cell
lines that are supposedly derived from the same tissue... (a totally
different problem, I know...)
> Most of the normalization techniques are derived under the assumption
> that not too many genes are DE. Facing your problem of many DE genes,
> some people would say "clearly the assumptions are not correct". I
> would say that you should use the methods that gives you the best
> inference. Sometimes people have observed that applying the
> "standard" normalization techniques actually improve their calls,
> even on datasets with many DE genes.
I don't think that's entirely correct. I don't think the assumption
is that not too many genes are DE, but that *most* genes are not DE,
or that the DE genes are evenly spread between up- and downregulation
across the range of raw intensities measured. It's a fine distinction.
Imagine an MA plot (raw data) where everything lies around the M=0
line, very tightly, with just a few genes straying up to higher |M|
values. Then imagine another MA plot where you have the same
situation, plus another few thousand spots, evenly distributed up or
down, with as extreme values as you like...
Normalisation methods like loess simply try to determine what is "not
changed": fit a regression curve and it will neatly follow along the
M=0 line... It will do so in both of the cases described above. So the
question is not simply whether there are many DE genes... if the % of
DE genes is low, of course that makes things easier, as their
contribution to a regression curve fitted to all of the spots will be
small. But you can have many DE genes and still be able to use loess
perfectly happily.
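
The two imaginary MA plots above can be played out in a small simulation. This is only a minimal sketch under stated assumptions: the data are made up, the spot counts and spreads are arbitrary, and a per-bin median of M along A stands in for a real loess fit (it is not loess, just a crude robust trend). The function names simulate_ma and binned_median_curve are hypothetical, not part of limma or any other package. A third, deliberately lopsided case is added to show when the curve does get pulled away from M=0.

```python
import random
import statistics


def simulate_ma(n_null, n_de, de_sd, up_fraction=0.5, seed=0):
    """Simulate (A, M) values for a made-up two-colour array.

    n_null spots are unchanged (M close to 0); n_de spots are DE, and
    up_fraction of them are up-regulated. All numbers are assumptions.
    """
    rng = random.Random(seed)
    a_vals, m_vals = [], []
    for _ in range(n_null):
        a_vals.append(rng.uniform(6, 14))
        m_vals.append(rng.gauss(0, 0.15))
    for _ in range(n_de):
        a_vals.append(rng.uniform(6, 14))
        sign = 1 if rng.random() < up_fraction else -1
        m_vals.append(sign * abs(rng.gauss(0, de_sd)) + rng.gauss(0, 0.15))
    return a_vals, m_vals


def binned_median_curve(a_vals, m_vals, n_bins=8, lo=6.0, hi=14.0):
    """Median of M within equal-width bins of A (a crude stand-in for loess)."""
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for a, m in zip(a_vals, m_vals):
        idx = min(int((a - lo) / width), n_bins - 1)
        bins[idx].append(m)
    return [statistics.median(b) if b else float("nan") for b in bins]


# Case 1: tight cloud around M=0 with only a few DE genes.
few_curve = binned_median_curve(*simulate_ma(5000, 50, de_sd=2.0))

# Case 2: same cloud plus a few thousand DE spots, evenly up and down.
balanced_curve = binned_median_curve(*simulate_ma(5000, 3000, de_sd=3.0))

# Case 3: same number of DE spots, but 90% of them up-regulated.
skewed_curve = binned_median_curve(
    *simulate_ma(5000, 3000, de_sd=3.0, up_fraction=0.9)
)

for name, curve in [("few DE", few_curve),
                    ("many DE, balanced", balanced_curve),
                    ("many DE, skewed", skewed_curve)]:
    print(name, [round(v, 3) for v in curve])
```

In the first two cases the fitted trend should sit close to M=0 across the whole intensity range, while in the lopsided case it drifts upwards, which is the situation where subtracting the curve would distort the real changes.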
You really have to look at the data, and have an idea of the biology
of the experiment to know what you are expecting (whether the bulk of
the data is really not DE).
This is why it's so hard to recommend any way to normalise data just
by looking at a plot... I'd say that in most experiments, a loess
regression curve is good enough as a normalisation aid, and that's why
people often use it with good results even when not all the assumptions
are perfectly met, especially that of not having many DE genes.
The only sure way to normalise any set of data is to have a good set
of control spots whose behaviour is known a priori. But one can often
do without it and get reasonable results. Most of us do :)
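
To make the control-spot idea concrete, here is a minimal sketch, again on made-up data. It assumes the simplest possible case: a single global dye bias (no intensity dependence) and a set of control spots known not to change, so the bias is estimated as the median M of the controls only. The function normalise_with_controls and all the numbers are hypothetical; a real analysis would normally fit an intensity-dependent curve through the controls instead.

```python
import random
import statistics


def normalise_with_controls(m_vals, is_control):
    """Subtract the median M of the control spots from every spot."""
    control_m = [m for m, c in zip(m_vals, is_control) if c]
    if not control_m:
        raise ValueError("no control spots supplied")
    offset = statistics.median(control_m)
    return [m - offset for m in m_vals], offset


rng = random.Random(1)
dye_bias = 0.4  # assumed global bias added to every spot

# 200 control spots known a priori not to change.
controls = [dye_bias + rng.gauss(0, 0.1) for _ in range(200)]
# 2000 genes that are nearly all up-regulated (an extreme, lopsided case).
genes = [dye_bias + abs(rng.gauss(0, 2)) + rng.gauss(0, 0.1)
         for _ in range(2000)]

m_all = controls + genes
flags = [True] * len(controls) + [False] * len(genes)

normalised, control_offset = normalise_with_controls(m_all, flags)
naive_offset = statistics.median(m_all)  # what centring on all spots would use

print("offset from controls: ", round(control_offset, 3))
print("offset from all spots:", round(naive_offset, 3))
```

With data this lopsided, the offset estimated from the controls should land near the assumed bias, whereas the median over all spots is dragged up by the real changes.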
> I think most of us need more time with the data in order to really
> give you any recommendations. You should seek out a local expert.
Good suggestion, and don't forget to explain the biology behind the
experiment (i.e. the behaviour you expect, if known).
Jose
--
Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.