[BioC] Quantile normalization vs. data distributions

Tue Mar 16 22:05:36 MET 2004

The problem with p-values is that they measure the "surprise factor", not
the size of the effect.  Suppose that you are testing a cholesterol-busting
drug, and it really has the effect of lowering mean cholesterol (over your
population) by .001.  Does anyone care? (Cholesterol values generally range
from about 100-400.)  But if your sample size is big enough, you have power
to detect infinitesimally small differences.
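A small numerical illustration of the cholesterol example, as a minimal sketch assuming a two-sample z-test with a known common standard deviation. The standard deviation of 40 is a hypothetical value chosen only because it fits the 100-400 range mentioned above, and the observed difference is taken to equal the true .001 shift. The effect never changes; only the per-group sample size does, and the p-value still falls toward zero.

```python
from statistics import NormalDist


def z_test_p_value(diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test, assuming a known common
    standard deviation `sd` and an observed mean difference `diff`."""
    se = sd * (2.0 / n_per_group) ** 0.5  # standard error of the difference
    z = abs(diff) / se
    return 2.0 * (1.0 - NormalDist().cdf(z))


effect = 0.001  # the clinically irrelevant drop in mean cholesterol
sd = 40.0       # hypothetical between-patient standard deviation
for n in (10**3, 10**6, 10**9, 10**10, 10**11):
    print(f"n = {n:>15,d}  p = {z_test_p_value(effect, sd, n):.3g}")
```

The p-value answers "how surprising is this difference if there were none?", so it keeps shrinking with n even though the .001 shift is exactly as unimportant at every sample size.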

For the purpose of normalization, we probably want the probe distributions 
to be similar.  If they are already identical, we do not need to 
normalize.  So, with a sufficiently large sample, all we will learn is that 
the probe distributions are not identical - but not how far apart they are.
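One way to keep "are they different?" apart from "how far apart are they?" is to report the Kolmogorov-Smirnov statistic D itself (the largest vertical gap between the two empirical CDFs) next to its p-value. The sketch below rests on assumptions: the data are simulated normals, the second sample is shifted by a hypothetical 0.1 standard deviation, and the p-value uses the usual asymptotic Kolmogorov approximation with a commonly used finite-sample correction. D settles near the same value as n grows, while the p-value collapses toward zero.

```python
import math
import random


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the empirical CDFs of x and y."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(x[i], y[j])
        while i < n and x[i] <= v:  # step both ECDFs past every copy of v
            i += 1
        while j < m and y[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_p_value(d, n, m):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series converges slowly here and Q(lam) is ~1
        return 1.0
    q = sum(2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, 101))
    return min(max(q, 0.0), 1.0)


rng = random.Random(42)
shift = 0.1  # hypothetical: second sample shifted by 0.1 standard deviation
results = {}
for n in (50, 500, 5000, 50000):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(shift, 1.0) for _ in range(n)]
    d = ks_statistic(x, y)
    results[n] = (d, ks_p_value(d, n, n))
    print(f"n = {n:>6}  D = {d:.3f}  p = {results[n][1]:.2g}")
```

Whether a given D is small enough to ignore is still a judgment about the application, not something the test decides.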

--Naomi

At 10:58 AM 3/16/2004, Arne.Muller at aventis.com wrote:
>Hello,
>
>I've two questions regarding the suggestions from Naomi.
>
>1. I've had a look at some density plots (*after* RMA background correction +
>quantile normalisation across all chips of my experiment). The tails of the
>plots look very similar, whereas at high density some plots differ in shape
>or value. When/how would you consider the two distributions to be equal?
>
>2. As a non-statistician I'm a bit confused that statistical tests will nearly
>always find a significant difference between distributions when the samples
>are large (I remember someone mentioned this to me, without explanation,
>about 2 years ago in a posting to the R-list). Is there a way to "normalize"
>the test results (e.g. the p-values) by the size of the sample?
>
>I guess such a significant difference as reported by a test is a *real*
>difference (otherwise all statistical tests would be worthless ...). Can one
>assume that, even if the two distributions are statistically different, one
>can treat them as equal, judged by visual investigation of a density plot
>or histogram?
>
>What is a large sample? If a test finds a difference between two
>distributions, how do I know it's not just because of the sample size? Is
>there something like a "maximum sample size test" (similar to determining the
>power of a test)?
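"Large" only has meaning relative to the effect you want to detect, and the standard power calculation makes that concrete. For a shift delta in a normally distributed quantity with standard deviation sigma, a two-sided two-sample z-test at level alpha reaches power 1 - beta once each group has roughly n = 2 (z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2 observations. This is a minimal sketch assuming normal data and a known sigma; the delta and sigma values below are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size at which a two-sided two-sample z-test
    detects a mean shift `delta` with the requested power."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return math.ceil(2.0 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)


print(n_per_group(delta=10.0, sigma=40.0))   # a shift someone might care about
print(n_per_group(delta=0.001, sigma=40.0))  # a negligible shift
```

Read in reverse, any sample much larger than the n computed for the smallest effect you care about will also flag effects you do not care about.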
>
>Thanks again for your comments,
>
>         Kind regards,
>
>         Arne
>
>--
>Arne Muller, Ph.D.
>Toxicogenomics, Aventis Pharma
>arne dot muller domain=aventis com
>
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch
> > [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of
> > Naomi Altman
> > Sent: 15 March 2004 16:05
> > To: Stan Smiley; Bioconductor Mailing list
> > Subject: Re: [BioC] Quantile normalization vs. data distributions
> >
> >
> > This is a very good question that I have also been puzzling over. It
> > seems useless to try tests of equality of the distribution such as
> > Kolmogorov-Smirnov; with the huge sample size you would almost
> > certainly get a significant result.
> >
> > Currently, I am using the following graphical method:
> >
> > 1. I compute a kernel density estimate of the combined data of all
> > probes on all the arrays.
> > 2. I compute a kernel density estimate of the data for each array.
> > 3. I plot both smooths on the same plot, and decide if they are the
> > same.
> >
> > Looking at what I wrote above, I think it would be better in steps 1
> > and 2 to background correct and center each array before combining.
> > It might also be better to reduce the data to standardized scores
> > before combining, unless you think that the overall scaling is due to
> > your "treatment effect".
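The graphical check can be sketched in code. This is a minimal sketch under assumptions: each array is a plain list of already background-corrected intensities, each array is converted to standardized scores before pooling, and a Gaussian kernel with Silverman's rule-of-thumb bandwidth is used. The method described is visual; in place of a plot, the sketch prints a hypothetical numerical summary, the integrated absolute difference (L1 gap) between each array's density and the pooled density. The simulated data, including the deliberately bimodal fourth array, are invented for illustration, and any cutoff on the gap would remain a judgment call.

```python
import math
import random
from statistics import mean, stdev


def standardize(values):
    """Convert one array's intensities to standardized scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def kde(values, grid):
    """Gaussian kernel density estimate on `grid`, Silverman bandwidth."""
    n = len(values)
    h = 1.06 * stdev(values) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values)
            for g in grid]


def l1_gaps(arrays, lo=-5.0, hi=5.0, step=0.05):
    """Integrated |f_array - f_pooled| for each array (hypothetical summary)."""
    grid = [lo + i * step for i in range(int(round((hi - lo) / step)) + 1)]
    z = [standardize(a) for a in arrays]               # step 0: standardize
    pooled = kde([v for a in z for v in a], grid)      # step 1: pooled density
    return [sum(abs(p - q) for p, q in zip(kde(a, grid), pooled)) * step
            for a in z]                                # step 2: per-array density


rng = random.Random(0)
arrays = [[rng.gauss(8.0, 1.5) for _ in range(500)] for _ in range(3)]
arrays.append([rng.gauss(6.0 if rng.random() < 0.5 else 10.0, 0.5)
               for _ in range(500)])  # a chip with a different shape
gaps = l1_gaps(arrays)
for k, gap in enumerate(gaps, start=1):
    print(f"array {k}: L1 gap to pooled density = {gap:.3f}")
```

Because each array is standardized first, the summary reacts to differences in shape rather than in location or scale.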
> >
> > It seems like half of what I do is ad hoc, so I always welcome any
> > criticisms or suggestions.
> >
> > --Naomi Altman
> >
> > At 06:07 PM 3/11/2004, Stan Smiley wrote:
> > >Greetings,
> > >
> > >I have been trying to find a quantitative measure to tell when the
> > >data distributions between chips are 'seriously' different enough
> > >from each other to violate the assumptions behind quantile
> > >normalization. I've been through the archives and seen some
> > >discussion of this matter, but didn't come away with a quantitative
> > >measure I could apply to my data sets to assure me that it would be
> > >OK to use quantile normalization.
> > >
> > >
> > >"Quantile normalization uses a single standard for all chips, however
> > >it assumes that no serious change in distribution occurs"
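To show what "a single standard for all chips" means, here is a minimal sketch of quantile normalization in plain Python. It assumes a small in-memory matrix with one list per chip, equal lengths, and no missing values; ties are broken by position, which is a simplification, and production implementations may treat ties differently. The reference distribution is the mean of the k-th smallest value across chips, and every chip ends up with exactly that set of values. That is why the method presumes the chips' underlying distributions are similar to begin with.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (one per chip).

    Each value is replaced by the mean, across chips, of the values that
    share its rank; ties are broken by position (a simplification).
    """
    n_rows = len(columns[0])
    if any(len(c) != n_rows for c in columns):
        raise ValueError("all chips must have the same number of probes")
    sorted_cols = [sorted(c) for c in columns]
    # The "single standard": mean of the k-th smallest value over all chips.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n_rows)]
    result = []
    for col in columns:
        order = sorted(range(n_rows), key=lambda i: col[i])  # indices by rank
        out = [0.0] * n_rows
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result


chips = [[5.0, 2.0, 3.0, 4.0],
         [4.0, 1.0, 4.5, 2.0],
         [3.0, 4.0, 6.0, 8.0]]
for chip in quantile_normalize(chips):
    print([round(v, 3) for v in chip])
```

Within each chip the ordering of probes is preserved; only the values are replaced by the shared reference quantiles.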
> > >
> > >Could someone please point me in the right direction on this?
> > >
> > >Thanks.
> > >
> > >Stan Smiley
> > >stan.smiley at genetics.utah.edu
> > >
> > >_______________________________________________
> > >Bioconductor mailing list
> > >Bioconductor at stat.math.ethz.ch
> > >https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> >
> > Naomi S. Altman                                814-865-3791 (voice)
> > Associate Professor
> > Bioinformatics Consulting Center
> > Dept. of Statistics                              814-863-7114 (fax)
> > Penn State University                         814-865-1348 (Statistics)
> > University Park, PA 16802-2111
> >

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111