[BioC] Quantile normalization vs. data distributions
paul.boutros at utoronto.ca
Mon Mar 15 20:27:24 MET 2004
We've been testing something similar. We:
a) center each array around 0 and scale to 1 SD
b) compute kernel-densities for each array
c) perform all pairwise comparisons between arrays, using the area common to
both curves as a similarity metric
d) manually verify the most extreme outliers (e.g. the pairs of arrays with
the smallest common area)
This seems to work okay for us. As you say, any direct distributional test
with large arrays always finds significant differences in our hands.
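A minimal R sketch of steps (a)-(d), assuming the arrays are the columns of a
matrix 'x' of log intensities (the object names, grid size and loop structure
below are illustrative, not our production code):

## x: matrix of log intensities, one column per array (hypothetical data)
## (a) center each array at 0 and scale to 1 SD
xs <- scale(x)

## (b) kernel density estimate for each array on a common grid
grid <- seq(min(xs), max(xs), length.out = 512)
dens <- apply(xs, 2, function(v)
  density(v, from = min(grid), to = max(grid), n = length(grid))$y)

## (c) pairwise similarity: area common to both curves
## (integral of the pointwise minimum, approximated by a Riemann sum)
overlap <- function(d1, d2) sum(pmin(d1, d2)) * diff(grid)[1]
n <- ncol(xs)
sim <- matrix(1, n, n)
for (i in 1:(n - 1)) for (j in (i + 1):n)
  sim[i, j] <- sim[j, i] <- overlap(dens[, i], dens[, j])

## (d) flag the most extreme pairs (smallest common area) for manual checking
which(sim == min(sim), arr.ind = TRUE)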
Paul
Date: Mon, 15 Mar 2004 10:04:57 -0500
From: Naomi Altman <naomi at stat.psu.edu>
Subject: Re: [BioC] Quantile normalization vs. data distributions
To: "Stan Smiley" <swsmiley at genetics.utah.edu>, "Bioconductor Mailing
list" <bioconductor at stat.math.ethz.ch>
Message-ID: <6.0.0.22.2.20040314225049.01d7ffb8 at stat.psu.edu>
Content-Type: text/plain; charset="us-ascii"; format=flowed
This is a very good question that I have also been puzzling over. It seems
useless to try tests of equality of distributions such as Kolmogorov-Smirnov:
due to the huge sample size, you would almost certainly get a significant
result.
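For example (a made-up illustration, not real chip data), with tens of
thousands of probes even a practically irrelevant shift comes out highly
significant:

set.seed(1)
x <- rnorm(50000)               # ~50,000 probe values on one array
y <- rnorm(50000, mean = 0.05)  # second array, shifted by a negligible amount
ks.test(x, y)                   # p-value essentially always far below 0.05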
Currently, I am using the following graphical method:
1. I compute a kernel density estimate of the combined data of all probes
on all the arrays.
2. I compute a kernel density estimate of the data for each array.
3. I plot both smooths on the same plot, and decide if they are the same.
Looking at what I wrote above, I think it would be better in steps 1 and 2
to background correct and center each array before combining. It might also
be better to reduce the data to standardized scores before combining, unless
you think that the overall scaling is due to your "treatment effect".
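A rough R sketch of the overlay plot, incorporating the background correction
and centering suggestion (assuming 'x' is already a background-corrected
matrix of log intensities, one column per array; names are illustrative):

## center each array; use scale = TRUE instead for standardized scores
xc <- scale(x, center = TRUE, scale = FALSE)

## 1. density of the combined data from all probes on all arrays
d.all <- density(as.vector(xc))

## 2.-3. overlay each array's density on the pooled density and compare by eye
plot(d.all, lwd = 2, main = "Per-array densities vs. pooled density")
for (j in seq_len(ncol(xc))) lines(density(xc[, j]), col = "grey")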
It seems like half of what I do is ad hoc, so I always welcome any
criticisms or suggestions.
--Naomi Altman
At 06:07 PM 3/11/2004, Stan Smiley wrote:
>Greetings,
>
>I have been trying to find a quantitative measure to tell when the data
>distributions between chips are 'seriously' different enough from each other
>to violate the assumptions behind quantile normalization. I've been through
>the archives and seen some discussion of this matter, but didn't come away
>with a quantitative measure I could apply to my data sets to assure me that
>it would be OK to use quantile normalization.
>
>
>"Quantile normalization uses a single standard for all chips, however it
>assumes that no serious change in distribution occurs"
>
>Could someone please point me in the right direction on this?
>
>Thanks.
>
>Stan Smiley
>stan.smiley at genetics.utah.edu
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111