[BioC] loged data or not loged previous to use normalize.quantile
Gordon Smyth
smyth at wehi.edu.au
Mon Apr 4 09:40:14 CEST 2005
Dear Marcelo,
There are at least three different issues merged together in your question,
which is one reason why your post has prompted so many replies. The issues are
1. Producing NAs during background correction
2. Quantile normalization on log or raw scale
3. Differential expression analysis (linear modelling) on log or raw scale
Let's consider these in turn:
1. You haven't said anything about background correction. If you are
planning to use quantile normalization, it is absolutely essential that you
avoid creating negative or zero intensities during the background
correction process. (Unfortunately I don't think that this point is made
explicitly anywhere in the limma documentation, although it has been said
several times on the Bioconductor mailing list.) See the function
backgroundCorrect() for some options.
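To make point 1 concrete, here is a minimal sketch in plain Python of how simple background subtraction produces values that cannot be logged, and how flooring the result avoids that. The function names, the example intensities and the floor of 0.5 are all illustrative assumptions; this is not the behaviour of backgroundCorrect() itself.

```python
import math

def subtract_background(foreground, background, floor=0.5):
    """Subtract background, resetting any result below `floor` to `floor`.

    `floor` is a hypothetical parameter chosen for illustration.
    """
    return [max(f - b, floor) for f, b in zip(foreground, background)]

def safe_log2(values):
    """log2 that returns None where the log is undefined (standing in for NA)."""
    return [math.log2(v) if v > 0 else None for v in values]

# Made-up single-channel intensities for four spots.
fg = [120.0, 80.0, 45.0, 300.0]
bg = [100.0, 90.0, 45.0, 50.0]

naive = [f - b for f, b in zip(fg, bg)]   # plain subtraction: negative and zero values appear
floored = subtract_background(fg, bg)     # every value stays strictly positive

naive_logged = safe_log2(naive)           # contains None for the second and third spots
floored_logged = safe_log2(floored)       # no missing values
```

With plain subtraction, the second and third spots cannot be logged. With the floor applied, every spot keeps a finite log2 value and so can take part in quantile normalization.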
2. There exist no clear results on whether it is best to carry out quantile
normalization on the raw or log scale. The function
normalizeBetweenArrays() in the limma package is set up to quantile
normalize on the log-scale. However the very successful RMA algorithm for
Affymetrix data normalizes quantiles on the raw scale. I am slowly coming
around to the idea that quantile normalization might be slightly better on
the raw scale. So raw or log scale is optional. Note however, if you
normalize on the log-scale, you absolutely must avoid NAs corresponding to
negative intensities -- see point 1. Using quantile normalization on data
which contains NAs arising from negative intensities is wrong.
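As an illustration of point 2, the following plain-Python sketch implements a basic quantile normalization and applies it on both scales. It is a simplified version written for this explanation (no handling of ties or missing values, made-up data), not the code used by normalizeBetweenArrays() or RMA.

```python
import math

def quantile_normalize(columns):
    """Give every array the same empirical distribution:
    the mean of the sorted arrays, rank by rank."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    normalized = []
    for col in columns:
        # order[k] is the index of the k-th smallest value in this array.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Two made-up arrays of strictly positive intensities.
array_a = [2.0, 8.0, 32.0]
array_b = [16.0, 4.0, 64.0]

# Normalize on the raw scale, then take logs.
raw_norm = quantile_normalize([array_a, array_b])
raw_then_log = [[math.log2(v) for v in col] for col in raw_norm]

# Take logs first, then normalize on the log scale.
logged = [[math.log2(v) for v in col] for col in [array_a, array_b]]
log_then_norm = quantile_normalize(logged)
```

The two routes give similar but not identical values, because averaging order statistics on the raw scale uses an arithmetic mean while averaging on the log scale amounts to a geometric mean. Note also that the sketch assumes every array has the same number of usable values, which is exactly what breaks down if negative intensities have been turned into NAs.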
3. However you background correct, and however you normalize, there is
overwhelming evidence that linear modelling analysis, such as that done by
the limma package, is better done on the log-scale. This is because the
variances are more nearly stabilized on the log scale than on the raw scale.
This is separate from point 2.
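The variance argument in point 3 can be illustrated with a small simulation in plain Python. The noise model here (purely multiplicative, log-normal error), the intensity levels and the parameter values are assumptions made for the sketch, not something estimated from real microarray data.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def simulate(true_level, n=200, sigma=0.2):
    """Hypothetical noise model: intensity = true_level * exp(N(0, sigma))."""
    return [true_level * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]

low = simulate(100.0)      # a weakly expressed gene
high = simulate(10000.0)   # a strongly expressed gene

# On the raw scale the spread grows with the intensity level.
raw_sd_ratio = statistics.stdev(high) / statistics.stdev(low)

# On the log2 scale the spread is roughly the same at both levels.
log_sd_ratio = (statistics.stdev([math.log2(v) for v in high])
                / statistics.stdev([math.log2(v) for v in low]))
```

Under this assumed model the raw-scale standard deviation of the bright gene is far larger than that of the faint gene, while on the log2 scale the two are close to equal. Roughly constant variance across genes is what a linear model fit relies on.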
Gordon
----------- original message ------------
Marcelo Luiz de Laia mlaia at fcav.unesp.br
Fri Apr 1 20:20:17 CEST 2005
>Dear Bioconductor Friends,
>
>I have a question that I haven't found an answer to. If you know of a
>paper/article that explains it, please tell me.
>
>I normalize our data using the normalize.quantile function.
>
>If I first transform our intensities (single channel) to log2, I don't
>get any differentially expressed genes in limma.
>
>But if I don't transform our data, I get some genes with p.value around
>0.0001, which is great!
>
>Of course, when I transform the intensity data to log2, I get some NAs.
>
>Why is there this difference? Am I wrong to do an analysis with data
>that has not been logged?
>
>Thanks a lot
>
>Marcelo