[BioC] Question about quantile normalization and NA value
Ben Bolstad
bmb at bmbolstad.com
Thu Jan 23 04:52:34 CET 2014
At least for the example matrix below, you'll find that the preprocessCore normalize.quantiles() function gives the same result as the limma output below. I make no claim that the two are identical in other cases, nor that its treatment of NAs is better than that of any other implementation.
Best,
Ben
On Jan 22, 2014, at 7:16 PM, Gordon K Smyth <smyth at wehi.EDU.AU> wrote:
> The meaning of quantile normalization with NAs has never been agreed on in a refereed publication, as far as I know. I implemented the limma version long ago, and to my knowledge it was the first implementation of quantile normalization to allow NAs. Ben Bolstad implemented a somewhat different algorithm in the affy package. Ben's version is now in the preprocessCore package as normalize.quantiles().
>
> The result you have is correct according to limma's algorithm, which involves interpolating each column of non-missing values out to a full-length vector when computing the mean quantiles. The reason the NA makes a big difference is that it changes the minimum quantile for column 2 from 16.5 to 110, a big change. As an alternative, you might try Ben's algorithm:
>
> library(preprocessCore)
> normalize.quantiles(y)
>
> But replacing NAs with row medians would not in general be sufficient.
>
> Best wishes
> Gordon
>
>> Date: Tue, 21 Jan 2014 05:03:17 -0800 (PST)
>> From: H at mamba.fhcrc.org, "K [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, godahajime at zoho.com
>> Subject: [BioC] Question about quantile normalization and NA value
>>
>>
>> Dear all,
>>
>> I have a question about quantile normalization and NA values.
>>
>> I'm going to normalize microarray data with "normalizeBetweenArrays", the quantile normalization function in the "limma" package.
>> I normalized data containing an NA as follows:
>>
>>> x <- matrix(c(100,15,200,250,110,16.5,220,275,120,18,240,300),4,3)
>>> colnames(x) <- paste("Chip",1:3, sep="")
>>> rownames(x) <- c("RNA-A","RNA-B","RNA-C","RNA-D")
>>>
>>> x
>>       Chip1 Chip2 Chip3
>> RNA-A   100 110.0   120
>> RNA-B    15  16.5    18
>> RNA-C   200 220.0   240
>> RNA-D   250 275.0   300
>>>
>>> normalizeBetweenArrays(x)
>>       Chip1 Chip2 Chip3
>> RNA-A 110.0 110.0 110.0
>> RNA-B  16.5  16.5  16.5
>> RNA-C 220.0 220.0 220.0
>> RNA-D 275.0 275.0 275.0
>>>
>>> y <- x
>>> y[2,2] <- NA
>>>
>>> normalizeBetweenArrays(y)
>>           Chip1     Chip2     Chip3
>> RNA-A 134.44444  47.66667 134.44444
>> RNA-B  47.66667        NA  47.66667
>> RNA-C 226.11111 180.27778 226.11111
>> RNA-D 275.00000 275.00000 275.00000
>>
>>
>> It seems to me that the normalized y is rather far from the normalized x. Can a single NA really induce such a large effect?
>> Should I normalize after replacing the NA with some value, such as median(x[2,],na.rm=T)?
>> My environment is limma Version 3.16.6, R version 3.0.1.
>>
>> Thanks
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
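
The interpolation scheme described in the quoted reply (stretch each column's sorted non-missing values out to full length, average across columns to get the mean quantiles, then map each observed value back through its rank) can be illustrated with a minimal base-R sketch. This is not limma's source code: the function name quantile_normalize_na is hypothetical, the use of approx() for both interpolation steps is an assumption about how the description translates into code, and the sketch assumes every column has at least two non-missing values.

```r
# Illustrative sketch only (hypothetical helper, not limma's implementation).
# Assumes a numeric matrix with >= 2 non-missing values in every column.
quantile_normalize_na <- function(a) {
  n <- nrow(a)
  grid <- (seq_len(n) - 1) / (n - 1)   # common probability grid: 0, ..., 1

  # Step 1: sort the non-missing values of each column and interpolate
  # them onto the full-length grid.
  s <- apply(a, 2, function(col) {
    v <- sort(col)                     # sort() drops NAs by default
    k <- length(v)
    approx((seq_len(k) - 1) / (k - 1), v, xout = grid, ties = "ordered")$y
  })

  # Step 2: mean quantiles across columns.
  m <- rowMeans(s)

  # Step 3: replace each observed value by the mean quantile at its
  # rank-based position; NAs stay NA.
  for (j in seq_len(ncol(a))) {
    ok <- !is.na(a[, j])
    k <- sum(ok)
    r <- rank(a[ok, j])
    a[ok, j] <- approx(grid, m, xout = (r - 1) / (k - 1), ties = "ordered")$y
  }
  a
}

# The example from the original question.
x <- matrix(c(100, 15, 200, 250, 110, 16.5, 220, 275, 120, 18, 240, 300), 4, 3)
colnames(x) <- paste("Chip", 1:3, sep = "")
rownames(x) <- c("RNA-A", "RNA-B", "RNA-C", "RNA-D")
y <- x
y[2, 2] <- NA

norm_x <- quantile_normalize_na(x)
norm_y <- quantile_normalize_na(y)
print(round(norm_y, 5))
```

On this example the sketch appears to reproduce the values quoted in the thread, including 47.66667 as the lowest mean quantile: with the NA in place, the smallest value contributed by Chip2 is 110 rather than 16.5, so the lowest mean quantile becomes (15 + 110 + 18) / 3.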