[BioC] Question about quantile normalization and NA value

Steve Lianoglou lianoglou.steve at gene.com
Wed Jan 22 18:34:57 CET 2014


On Wed, Jan 22, 2014 at 3:43 AM, godahajime <godahajime at zoho.com> wrote:
> Dr Steve Lianoglou,
> Thanks for your reply.
> The sample size is too small as you mentioned.
> That matter may be left out of consideration because the actual sample size is over 2000x300.
> I read the tutorial of limma and the source code of "normalizeBetweenArrays", however, I couldn't understand how NA values were processed.
> Could you show me the process?

They are handled "very carefully" ;-)

The function that actually does the quantile normalization is
`normalizeQuantiles`.

If you *really* want to understand what is happening there, I suggest you:

(1) download the source code for limma
(2) open the limma/R/norm.R file and jump to the `normalizeQuantiles` function.
(3) reconstruct the parameters required to run the function, ie:

  (a) Create a test matrix with some (5) data points missing:
  R> A <- matrix(rnorm(50), nrow=10)
  R> A[sample(50, 5)] <- NA

  (b) Create a `ties` variable:
  R> ties <- TRUE

(4) Now step through the code
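
If pasting the function body line by line gets tedious, one possible
alternative for step (4) is R's built-in debugger. This is only a
sketch: it assumes limma is installed and that `normalizeQuantiles`
can be reached as `limma::normalizeQuantiles`; the seed is arbitrary
and only there so the test matrix is reproducible.

```r
# Reproducible version of the test matrix from step (3).
set.seed(1)                      # arbitrary seed, an assumption for repeatability
A <- matrix(rnorm(50), nrow = 10)
A[sample(50, 5)] <- NA           # knock out 5 distinct cells
ties <- TRUE

# Only enter the debugger in an interactive session with limma available.
if (interactive() && requireNamespace("limma", quietly = TRUE)) {
  debugonce(limma::normalizeQuantiles)   # break on the next call only
  limma::normalizeQuantiles(A, ties = ties)
}
```

Inside the browser, `n` runs the next line and `Q` quits; typing a
variable name prints its current value.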

As you step through the code, take a careful look at what each line
produces -- you will likely get tripped up by some of the code there,
but read the documentation (I'm sure you will have to read ?approx,
for instance)
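
To get a feel for `approx` before you meet it in the source, here is a
small standalone example. It is my own toy illustration (the variable
names are made up, nothing here is copied from limma): three sorted
values from a column with missing entries are stretched by linear
interpolation onto a grid of five evenly spaced positions.

```r
# Sorted non-missing values from a hypothetical column of length 5 with two NAs.
x_sorted <- c(1, 3, 5)

# Positions of the observed values on a 0..1 scale.
p_obs <- (0:2) / 2               # 0, 0.5, 1

# Positions a complete column of length 5 would occupy.
p_full <- (0:4) / 4              # 0, 0.25, 0.5, 0.75, 1

# Linear interpolation of the observed values onto the full grid.
stretched <- approx(p_obs, x_sorted, xout = p_full, ties = "ordered")$y
print(stretched)
# [1] 1 2 3 4 5
```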

If you really care to know how NA's are accounted for, that's how you
would go about doing it. Others are happy enough to know that they are
more or less ignored and accounted for, and that's that.
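
For the impatient, here is a rough sketch of what "ignored and
accounted for" can mean in practice. To be clear, this is a simplified
illustration written for this email and not the limma source; the
function name `quantile_normalize_na` is made up, and it assumes every
column has at least two observed values. The idea: NAs are dropped
when each column is sorted, the shorter sorted column is interpolated
up to full length, the target distribution is the row-wise mean, and
each observed value is mapped to the target by its rank among the
observed values. Missing cells stay missing.

```r
# Simplified illustration only; NOT the limma implementation.
quantile_normalize_na <- function(A) {
  n <- nrow(A)
  grid <- (0:(n - 1)) / (n - 1)            # common 0..1 positions

  # Step 1: sort each column (sort() drops NAs) and stretch it to length n.
  S <- apply(A, 2, function(col) {
    s <- sort(col)
    k <- length(s)                         # assumes k >= 2
    approx((0:(k - 1)) / (k - 1), s, xout = grid, ties = "ordered")$y
  })

  # Step 2: the target distribution is the row-wise mean of the sorted columns.
  target <- rowMeans(S)

  # Step 3: map each observed value onto the target via its rank; NAs stay NA.
  for (j in seq_len(ncol(A))) {
    ok <- !is.na(A[, j])
    r <- rank(A[ok, j])                    # ranks among observed values only
    A[ok, j] <- approx(grid, target, xout = (r - 1) / (sum(ok) - 1),
                       ties = "ordered")$y
  }
  A
}

set.seed(1)                                # arbitrary seed for repeatability
A <- matrix(rnorm(50), nrow = 10)
A[sample(50, 5)] <- NA
N <- quantile_normalize_na(A)
```

Comparing `is.na(A)` with `is.na(N)` shows the missing cells are left
alone, while the observed values in every column now share the same
range.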

It is a good exercise to do for yourself, either way, as performing
these exercises for several different "well travelled" packages is a
great way to learn how to code in R, as well as tricks-of-the-trade
related to programming/computing w/ data in general.



Steve Lianoglou
Computational Biologist
