[BioC] Question about quantile normalization and NA value

Thu Jan 23 04:16:05 CET 2014

The meaning of quantile normalization with NAs has never been agreed on 
in a refereed publication, as far as I know. I implemented the limma 
version long ago, and as far as I know it was the first implementation of 
quantile normalization to allow NAs.  Ben Bolstad implemented a somewhat 
different algorithm in the affy package.  Ben's version is now in the 
preprocessCore package as normalize.quantiles().

The result you have is correct according to limma's algorithm, which 
involves interpolating each column of non-missing values out to a 
full-length vector when computing the mean quantiles.  The reason the NA 
makes a big difference is that it changes the minimum quantile for column 
2 from 16.5 to 110, a big change.  As an alternative, you might try Ben's 
algorithm:

    library(preprocessCore)
    normalize.quantiles(y)

But replacing NAs with row medians would not in general be sufficient.
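
To make the interpolation step concrete, here is a rough base-R sketch of
the idea described above.  It is an illustration only, not limma's actual
source code: the function name quantile_norm_na_sketch is made up for this
example, and it assumes every column has at least two non-missing values.

```r
# Illustrative sketch only; not limma's source code.
# Assumes each column of A has at least two non-missing values.
quantile_norm_na_sketch <- function(A) {
  n <- nrow(A)
  p <- seq(0, 1, length.out = n)        # probability grid, one point per row
  S <- matrix(NA_real_, n, ncol(A))     # sorted columns, stretched to length n

  for (j in seq_len(ncol(A))) {
    obs <- sort(A[, j])                 # sort() drops NAs by default
    pj  <- seq(0, 1, length.out = length(obs))
    # Interpolate the sorted non-missing values out to a full-length vector
    S[, j] <- approx(pj, obs, xout = p, ties = "ordered")$y
  }

  m <- rowMeans(S)                      # mean quantiles across columns

  for (j in seq_len(ncol(A))) {
    ok <- !is.na(A[, j])
    r  <- rank(A[ok, j])                # ranks among the non-missing values
    # Read the mean quantiles back off at each value's relative position
    A[ok, j] <- approx(p, m, xout = (r - 1) / (sum(ok) - 1),
                       ties = "ordered")$y
  }
  A
}

x <- matrix(c(100,15,200,250,110,16.5,220,275,120,18,240,300),4,3)
y <- x
y[2,2] <- NA
out <- quantile_norm_na_sketch(y)
print(round(out, 5))
```

With the NA in place, column 2's sorted values 110, 220, 275 are stretched
to 110, 183.33, 238.33, 275, so the smallest mean quantile becomes
(15 + 110 + 18)/3 = 47.67 instead of 16.5.  On your y the sketch gives the
same numbers as the normalizeBetweenArrays(y) output quoted below.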

Best wishes
Gordon

> Date: Tue, 21 Jan 2014 05:03:17 -0800 (PST)
> From: H at mamba.fhcrc.org, "K [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, godahajime at zoho.com
> Subject: [BioC] Question about quantile normalization and NA value
>
>
> Dear all,
>
> I have a question about quantile normalization and NA value.
>
> I'm going to normalize the microarray data with "normalizeBetweenArrays", which is the quantile normalization function in the "limma" package.
> I normalized data containing an NA as follows:
>
>> x <- matrix(c(100,15,200,250,110,16.5,220,275,120,18,240,300),4,3)
>> colnames(x) <- paste("Chip",1:3, sep="")
>> rownames(x) <- c("RNA-A","RNA-B","RNA-C","RNA-D")
>>
>> x
>      Chip1 Chip2 Chip3
> RNA-A   100 110.0   120
> RNA-B    15  16.5    18
> RNA-C   200 220.0   240
> RNA-D   250 275.0   300
>>
>> normalizeBetweenArrays(x)
>      Chip1 Chip2 Chip3
> RNA-A 110.0 110.0 110.0
> RNA-B  16.5  16.5  16.5
> RNA-C 220.0 220.0 220.0
> RNA-D 275.0 275.0 275.0
>>
>> y <- x
>> y[2,2] <- NA
>>
>> normalizeBetweenArrays(y)
>          Chip1     Chip2     Chip3
> RNA-A 134.44444  47.66667 134.44444
> RNA-B  47.66667        NA  47.66667
> RNA-C 226.11111 180.27778 226.11111
> RNA-D 275.00000 275.00000 275.00000
>
>
> I assume the normalized y is a bit far away from normalized x. Does only one NA induce this large effect ?
> Should I normalize after replacing NA with some value, such as median(x[2,],na.rm=T) ?
> My environment is limma Version 3.16.6, R version 3.0.1.
>
> Thanks
