[BioC] Quantile normalization vs. data distributions

Ben Bolstad bolstad at stat.berkeley.edu
Mon Mar 15 20:12:54 MET 2004


At least to me, this is a question of what assumptions I need to make
to carry out a normalization (not necessarily restricted to quantile
normalization).
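For reference, here is a minimal sketch of basic quantile normalization in Python/NumPy. It assumes a probes-by-arrays matrix of intensities. It is not the Bioconductor implementation, and it breaks ties simply by position. The sketch makes the assumption concrete: afterwards every array has exactly the same distribution (the mean of the sorted arrays). Any genuine difference in distribution between arrays is therefore removed along with the technical variation.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Basic quantile normalization of a probes x arrays matrix.

    Every column (array) is replaced by a common reference distribution,
    the row-wise mean of the sorted columns, placed back in that column's
    original rank order. Ties are broken by argsort position (a simplification).
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                 # row indices that sort each column
    reference = np.sort(x, axis=0).mean(axis=1)   # common target distribution
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # the k-th smallest value in column j receives the k-th reference value
        out[order[:, j], j] = reference
    return out
```

Normalizing within treatment group would amount to calling the same function separately on each group's columns. For example, `quantile_normalize(x[:, group == "A"])`, where `group` is a hypothetical array of treatment labels.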

In particular: Can I expect at least one of the following to be true for
my data set?
a) Only a few genes (relative to the total number on the array) are
changing
b) About the same number of genes are increasing in expression as are
decreasing in expression between any two treatments.

If this is not the case then you may have problems with any
normalization.
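One crude way to look at a) and b) for a pair of treatments is to count how many genes move by more than some cutoff in each direction. The sketch below is only that. It assumes log2-scale values for the same genes under two treatments, and its two-fold cutoff is an arbitrary, hypothetical choice. The result also depends on whatever normalization was applied beforehand, so it is a rough screen, not a test.

```python
import numpy as np


def change_summary(a, b, threshold: float = 1.0):
    """Summarize log2 differences between two treatments.

    a, b: log2 expression values for the same genes (single arrays or group means).
    threshold: hypothetical cutoff on |b - a|; 1.0 corresponds to a two-fold change.
    Returns (fraction of genes changing, number up, number down).
    """
    m = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    up = int(np.sum(m > threshold))      # genes increasing beyond the cutoff
    down = int(np.sum(m < -threshold))   # genes decreasing beyond the cutoff
    return (up + down) / m.size, up, down
```

A small fraction changing addresses a). Roughly equal `up` and `down` counts address b).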

Naomi has suggested a reasonable method if you want to take a more
exploratory, data-driven approach. Provided there is no confounding
variable, big differences between treatment groups in this sort of plot
might indicate that you do not want to normalize across all chips. In
that case, you might consider normalizing within each treatment group.
My guess would be that you would usually find within-group differences
(in terms of densities) larger than between-group differences.
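The density comparison Naomi describes can be reduced to one number per array. The sketch below assumes a probes-by-arrays matrix of background-corrected, log-scale intensities. It median-centers each array, and can optionally standardize it. It then computes a Gaussian kernel density estimate for each array and for the pooled data, and reports the L1 distance between the two curves. The kernel density estimate is hand-rolled with Silverman's rule-of-thumb bandwidth rather than taken from any particular library.

The grid size and subsample size are arbitrary choices. There is no calibrated cutoff, so the numbers are only for comparing arrays with each other, for example within versus between treatment groups.

```python
import numpy as np


def density_distances(x, standardize=False, n_grid=256, max_points=5000, seed=0):
    """L1 distance between each array's density and the pooled density.

    x: probes x arrays matrix (assumed background-corrected, log scale).
    standardize: also divide each centered array by its standard deviation.
    max_points: random subsample size used per density, to bound memory.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    x = x - np.median(x, axis=0)               # center each array before pooling
    if standardize:
        x = x / x.std(axis=0, ddof=1)          # standardized scores per array
    grid = np.linspace(x.min(), x.max(), n_grid)
    dx = grid[1] - grid[0]

    def kde(values):
        # Gaussian kernel density estimate evaluated on the common grid
        if values.size > max_points:
            values = rng.choice(values, max_points, replace=False)
        bw = 1.06 * values.std(ddof=1) * values.size ** (-1 / 5)  # Silverman's rule
        z = (grid[:, None] - values[None, :]) / bw
        return np.exp(-0.5 * z**2).sum(axis=1) / (values.size * bw * np.sqrt(2 * np.pi))

    pooled = kde(x.ravel())
    # Riemann-sum approximation of the integral of |f_j - f_pooled|
    return np.array([np.sum(np.abs(kde(x[:, j]) - pooled)) * dx for j in range(x.shape[1])])
```

Passing `standardize=True` corresponds to reducing the data to standardized scores before combining. That is only appropriate if overall scale is not thought to be part of the treatment effect.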

Thanks,

Ben



On Mon, 2004-03-15 at 07:04, Naomi Altman wrote:
> This is a very good question that I have also been puzzling over. It seems
> useless to try tests of equality of distribution such as Kolmogorov-Smirnov;
> because of the huge sample size, you would almost certainly get a
> significant result.
> 
> Currently, I am using the following graphical method:
> 
> 1. I compute a kernel density estimate of the combined data of all probes 
> on all the arrays.
> 2. I compute a kernel density estimate of the data for each array.
> 3. I plot both smooths on the same plot, and decide if they are the same.
> 
> Looking at what I wrote above, I think it would be better in steps 1 and 2
> to background correct and center each array before combining. It might also
> be better to reduce the data to standardized scores before combining, unless
> you think that the overall scaling is due to your "treatment effect".
> 
> It seems like half of what I do is ad hoc, so I always welcome any 
> criticisms or suggestions.
> 
> --Naomi Altman
> 
> At 06:07 PM 3/11/2004, Stan Smiley wrote:
> >Greetings,
> >
> >I have been trying to find a quantitative measure to tell when the data
> >distributions between chips are 'seriously' different enough from each
> >other to violate the assumptions behind quantile normalization. I've been
> >through the archives and seen some discussion of this matter, but didn't
> >come away with a quantitative measure I could apply to my data sets to
> >assure me that it would be OK to use quantile normalization.
> >
> >"Quantile normalization uses a single standard for all chips, however it
> >assumes that no serious change in distribution occurs"
> >
> >Could someone please point me in the right direction on this?
> >
> >Thanks.
> >
> >Stan Smiley
> >stan.smiley at genetics.utah.edu
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> 
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Bioinformatics Consulting Center
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111


