[BioC] Quantile normalization vs. data distributions
Tue Mar 16 16:58:08 MET 2004
Hello,
I have two questions about Naomi's suggestions.
1. I've had a look at some density plots (*after* RMA background correction + quantile
normalisation across all chips of my experiment). The tails of the plots look
very similar, whereas in the high-density region some plots differ in shape or height.
When/how would you consider the two distributions to be equal?
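One way to put a number on "equal" next to the density plots is to compare the arrays' quantiles directly. The sketch below is only an illustration and was not proposed in this thread. It reports the largest absolute gap between matching percentiles of two arrays. The helper names `empirical_quantile` and `max_quantile_gap` are hypothetical. Using the 1st to 99th percentiles is an assumption, and so is any tolerance you compare the result against. It also assumes both arrays are on the same scale (e.g. log2 intensities).

```python
import math

def empirical_quantile(sorted_x, p):
    """Linear-interpolation quantile of an already sorted list, 0 <= p <= 1."""
    pos = p * (len(sorted_x) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    frac = pos - lo
    return sorted_x[lo] + frac * (sorted_x[hi] - sorted_x[lo])

def max_quantile_gap(x, y, probs=None):
    """Largest absolute difference between matching quantiles of x and y.

    Assumption: the 1st..99th percentiles are compared by default.
    """
    if probs is None:
        probs = [i / 100 for i in range(1, 100)]
    xs, ys = sorted(x), sorted(y)
    return max(abs(empirical_quantile(xs, p) - empirical_quantile(ys, p))
               for p in probs)

# Hypothetical log2 intensities; b is a shifted up by 0.5
a = [6.1, 7.4, 8.0, 9.3, 10.2]
b = [6.6, 7.9, 8.5, 9.8, 10.7]
print(max_quantile_gap(a, b))  # close to 0.5
```

Unlike a p-value, this gap is measured in intensity units and does not shrink or grow just because the arrays contain more probes. Deciding how large a gap is "too large" remains a judgement call.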
2. As a non-statistician I'm a bit confused that a statistical test will nearly
always find a significant difference between distributions when the samples
are large (someone mentioned this to me, without explanation, about 2 years
ago in a posting to the R-list). Is there a way to "normalize" the test results
(e.g. the p-values) by the size of the sample?
I guess a significant difference reported by a test is a *real* difference
(otherwise all statistical tests would be worthless ...). Can one assume that,
even if the two distributions are statistically different, one can treat them
as equal when judged by visual inspection of a density plot or histogram?
What is a large sample? If a test finds a difference between two
distributions, how do I know it's not just because of the sample size? Is
there something like a "maximum sample size test" (similar to determining the
power of a test)?
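The sample-size effect can be made concrete with the two-sample Kolmogorov-Smirnov test. Its statistic D is the largest vertical gap between the two empirical CDFs. D estimates a fixed property of the two distributions. However, the large-sample p-value depends on D * sqrt(n*m/(n+m)), so the same D becomes "more significant" as the arrays grow.

Below is a minimal stdlib-only sketch. It assumes the standard asymptotic Kolmogorov distribution with no small-sample correction. The helper names `ks_statistic` and `ks_asymptotic_pvalue` are hypothetical. Reporting D itself as an effect size is one possible reading of "normalizing by sample size"; it is not something proposed in this thread.

```python
import math

def ks_statistic(x, y):
    """Two-sample KS statistic D = max |F_x(t) - F_y(t)| over all t."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        # Step both empirical CDFs past every value equal to t (handles ties)
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_asymptotic_pvalue(d, n, m):
    """Large-sample p-value Q(lam) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # Q(lam) is within rounding of 1 here, and the series converges slowly
        return 1.0
    total, k = 0.0, 1
    while True:
        term = math.exp(-2.0 * k * k * lam * lam)
        total += term if k % 2 == 1 else -term
        if term < 1e-16:
            break
        k += 1
    return min(1.0, max(0.0, 2.0 * total))

# The same D = 0.05 with growing (hypothetical) numbers of probes per array
for size in (100, 1_000, 10_000, 100_000):
    print(size, ks_asymptotic_pvalue(0.05, size, size))
```

With 100 values per array a D of 0.05 is nowhere near significant. With tens of thousands of values the same D gives a vanishingly small p-value. The difference is real, but its size, as measured by D, has not changed.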
Thanks again for your comments,
Kind regards,
Arne
--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com
>
> This is a very good question that I have also been puzzling over. It seems
> useless to try tests of equality of the distribution such as
> Kolmogorov-Smirnov: due to the huge sample size you would almost certainly
> get a significant result.
> Currently, I am using the following graphical method:
>
> 1. I compute a kernel density estimate of the combined data of all probes
>    on all the arrays.
> 2. I compute a kernel density estimate of the data for each array.
> 3. I plot both smooths on the same plot, and decide if they are the same.
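The graphical check above can be given a numeric companion. The sketch below is in Python for illustration only; the thread does not say which software was used. It evaluates a Gaussian kernel density estimate of the pooled data and of each array on a common grid. One shared bandwidth is used so that differences are not just bandwidth effects. It then reports the largest absolute gap between each array's density and the pooled density.

Several choices here are assumptions:
- the bandwidth rule (Silverman's normal-reference rule, 1.06 * sd * n^(-1/5));
- the grid size;
- the hypothetical names `kde_on_grid` and `density_gaps`.

The double loop is also far too slow for full-size arrays without subsampling.

```python
import math
import statistics

def kde_on_grid(data, grid, bandwidth):
    """Gaussian kernel density estimate of `data` evaluated at each grid point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in data)
            for g in grid]

def density_gaps(arrays, n_grid=200):
    """Max |per-array density - pooled density| for each array (shared bandwidth)."""
    pooled = [v for a in arrays for v in a]
    lo, hi = min(pooled), max(pooled)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    # Assumption: normal-reference bandwidth computed once from the pooled data
    bw = 1.06 * statistics.stdev(pooled) * len(pooled) ** (-1 / 5)
    pooled_density = kde_on_grid(pooled, grid, bw)
    gaps = []
    for a in arrays:
        dens = kde_on_grid(a, grid, bw)
        gaps.append(max(abs(p - q) for p, q in zip(dens, pooled_density)))
    return grid, pooled_density, gaps
```

The per-array densities and the pooled density could still be plotted together as in step 3. The gap values simply make it easier to rank which arrays depart most from the pooled shape.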
>
> Looking at what I wrote above, I think it would be better in steps 1 and 2
> to background correct and center each array before combining. It might also
> be better to reduce the data to standardized scores before combining, unless
>
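Centering or standardizing each array before pooling could look like the sketch below. Background correction is not shown; in RMA it is a separate, model-based step. Two choices are assumptions: using the median for centering, and using the mean and standard deviation for the z-scores. The helper names `center`, `standardize` and `pool` are hypothetical.

```python
import statistics

def center(values):
    """Shift an array so its median is 0 (location only)."""
    med = statistics.median(values)
    return [v - med for v in values]

def standardize(values):
    """Convert an array to z-scores: zero mean, unit standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def pool(arrays, transform):
    """Apply the per-array transform, then combine all values into one list."""
    return [v for a in arrays for v in transform(a)]

# Two hypothetical arrays that differ only in location and scale
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
print(standardize(a) == standardize(b))  # True
```

Centering removes only location differences before the pooled density is formed. Standardizing also removes scale differences, so any remaining mismatch in the density plots reflects differences in shape.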
> It seems like half of what I do is ad hoc, so I always welcome any
> criticisms or suggestions.
>
> --Naomi Altman
>
> At 06:07 PM 3/11/2004, Stan Smiley wrote:
> >Greetings,
> >
> >I have been trying to find a quantitative measure to tell when the data
> >distributions between chips are 'seriously' different enough from each other
> >to violate the assumptions behind quantile normalization. I've been through
> >the archives and seen some discussion of this matter, but didn't come away
> >with a quantitative measure I could apply to my data sets to assure me that
> >it would be OK to use quantile normalization.
> >
> >"Quantile normalization uses a single standard for all chips, however it
> >assumes that no serious change in distribution occurs"
> >
> >Could someone please point me in the right direction on this?
> >
> >Thanks.
> >
> >Stan Smiley
> >stan.smiley at genetics.utah.edu
> >
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111
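For reference, the quoted sentence refers to quantile normalization. Its basic rank-mean form can be sketched in three steps:
1. Sort each chip.
2. Average the k-th smallest values across chips to form the single reference distribution.
3. Give each probe the reference value for its rank.

This is a simplified sketch. It assumes equal-length arrays, breaks ties by position, and uses the hypothetical name `quantile_normalize`. Production implementations, such as the one used inside RMA, may handle ties and missing values differently.

```python
def quantile_normalize(arrays):
    """Rank-mean quantile normalization of equal-length arrays (one list per chip)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    sorted_chips = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across all chips
    reference = [sum(chip[k] for chip in sorted_chips) / len(arrays) for k in range(n)]
    result = []
    for a in arrays:
        # Probe indices in ascending order of intensity (ties broken by position)
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        result.append(out)
    return result

print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

The example shows why the assumption matters: afterwards every chip has exactly the same set of values, so any genuine difference in distribution between chips is removed along with the technical one.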
