[BioC] Logged or unlogged data before using normalize.quantile

Gordon Smyth smyth at wehi.edu.au
Mon Apr 4 09:40:14 CEST 2005


Dear Marcelo,

There are at least three different issues merged together in your question, 
which is one reason why your post has prompted so many replies. The issues are:

1. Producing NAs during background correction
2. Quantile normalization on log or raw scale
3. Differential expression analysis (linear modelling) on log or raw scale

Let's consider these in turn:

1. You haven't said anything about background correction. If you are 
planning to use quantile normalization, it is absolutely essential that you 
avoid creating negative or zero intensities during the background 
correction process. (Unfortunately I don't think that this point is made 
explicitly anywhere in the limma documentation, although it has been said 
several times on the Bioconductor mailing list.) See the function 
backgroundCorrect() for some options.

2. There are no clear results on whether it is best to carry out quantile 
normalization on the raw or the log scale. The function 
normalizeBetweenArrays() in the limma package is set up to quantile 
normalize on the log scale. However, the very successful RMA algorithm for 
Affymetrix data quantile normalizes on the raw scale. I am slowly coming 
around to the idea that quantile normalization might be slightly better on 
the raw scale. So the choice between raw and log scale is up to you. Note, 
however, that if you normalize on the log scale, you absolutely must avoid 
NAs corresponding to negative intensities (see point 1). Using quantile 
normalization on data that contain NAs arising from negative intensities is 
wrong.

3. However you background correct, and however you normalize, there is 
overwhelming evidence that linear modelling analysis, such as that done by 
the limma package, is better done on the log scale. This is because the 
variances are more nearly stabilized on the log scale than on the raw scale. 
This point is separate from point 2.
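
To make the three points concrete, here is a minimal sketch in plain Python 
(standard library only). It is not limma's or RMA's code. The helper names 
background_correct(), safe_log2() and quantile_normalize(), the offset of 50 
and all the intensity values are made up for illustration, and ties are 
handled more crudely than a real implementation would handle them.

```python
import math
from statistics import mean, stdev


def background_correct(foreground, background, offset=0.0):
    """Subtract background; 'offset' is an illustrative constant added afterwards."""
    return [f - b + offset for f, b in zip(foreground, background)]


def safe_log2(x):
    """log2 that yields NaN (playing the role of R's NA) for zero or negative input."""
    return math.log2(x) if x > 0 else float("nan")


def quantile_normalize(arrays):
    """Force every array onto one common distribution.

    The common value at each rank is the mean, across arrays, of the values
    holding that rank. Assumes equal-length arrays with no missing values;
    ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    target = [mean(s[rank] for s in sorted_arrays) for rank in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = target[rank]
        result.append(normalized)
    return result


# Point 1: plain subtraction can go negative, and the log is then undefined.
fg = [120.0, 300.0, 95.0, 800.0]
bg = [100.0, 110.0, 100.0, 105.0]
plain = background_correct(fg, bg)                 # [20.0, 190.0, -5.0, 695.0]
logged = [safe_log2(x) for x in plain]             # third entry is NaN
shifted = background_correct(fg, bg, offset=50.0)  # all strictly positive

# Point 2: normalize on the raw scale and then log, or log and then normalize.
a = shifted
b = [140.0, 90.0, 1200.0, 400.0]
raw_then_log = [[math.log2(x) for x in arr] for arr in quantile_normalize([a, b])]
log_then_norm = quantile_normalize([[math.log2(x) for x in arr] for arr in (a, b)])

# Point 3: with multiplicative noise the raw-scale spread grows with the mean,
# while the log-scale spread does not.
factors = [0.8, 1.0, 1.25]
low = [100.0 * f for f in factors]
high = [10000.0 * f for f in factors]
log_low = [math.log2(x) for x in low]
log_high = [math.log2(x) for x in high]

print("plain:", plain)
print("logged:", logged)
print("raw then log:", raw_then_log)
print("log then normalize:", log_then_norm)
print("raw SDs:", stdev(low), stdev(high))
print("log SDs:", stdev(log_low), stdev(log_high))
```

Both orders in the second part preserve each array's ranks and give the two 
arrays identical sets of values. They differ only in the common value used at 
each rank: normalizing on the raw scale uses the arithmetic mean of the 
intensities, whereas normalizing on the log scale amounts to using their 
geometric mean, which is never larger. In the third part the raw-scale 
standard deviation is about 100 times larger for the brighter spots, while 
the two log-scale standard deviations agree.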

Gordon

----------- original message ------------
Marcelo Luiz de Laia mlaia at fcav.unesp.br
Fri Apr 1 20:20:17 CEST 2005
>Dear Bioconductor friends,
>
>I have a question that I have not found an answer to. If you know of a
>paper or article that explains it, please tell me.
>
>I normalize our data using the normalize.quantile function.
>
>If I first transform our intensities (single channel) to log2, I don't
>get any differentially expressed genes in limma.
>
>But if I don't transform our data, I get some genes with p-values around
>0.0001, which is great!
>
>Of course, when I transform the intensity data to log2, I get some NAs.
>
>Why is there this difference? Am I wrong to do the analysis with
>unlogged data?
>
>Thanks a lot
>
>Marcelo


