[BioC] Question about quantile normalization and NA value
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Jan 23 04:16:05 CET 2014
The meaning of quantile normalization with NAs has never been agreed on
in a refereed publication, as far as I know. I implemented the limma
version long ago, and as far as I know it was the first implementation of
quantile normalization to allow NAs. Ben Bolstad implemented a somewhat
different algorithm in the affy package. Ben's version is now in the
preprocessCore package as normalize.quantiles().
The result you have is correct according to limma's algorithm, which
involves interpolating each column of non-missing values out to a full-length
vector when computing the mean quantiles. The reason the NA makes a big
difference is that it changes the minimum quantile for column 2 from 16.5
to 110, a big change. As an alternative, you might try Ben's algorithm:
library(preprocessCore)
normalize.quantiles(y)
But replacing NAs with row medians would not in general be sufficient.
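To make the interpolation step concrete, here is a minimal base-R sketch of the idea described above. It is an illustration only, not the limma source: the function name quantile_norm_na_sketch is made up for this example, and it assumes every column has at least two non-missing values.

```r
# Illustrative sketch only; the function name is hypothetical and this is
# not the limma source code. Assumes >= 2 non-missing values per column.
quantile_norm_na_sketch <- function(A) {
  n <- nrow(A)
  p <- (0:(n - 1)) / (n - 1)          # probability grid for a full-length column
  S <- matrix(NA_real_, n, ncol(A))   # sorted (and, if needed, interpolated) columns
  for (j in seq_len(ncol(A))) {
    s <- sort(A[, j])                 # sort() drops NAs by default
    k <- length(s)
    # stretch the k observed order statistics out to the n-point grid
    S[, j] <- approx((0:(k - 1)) / (k - 1), s, xout = p)$y
  }
  m <- rowMeans(S)                    # mean quantiles across columns
  out <- A
  for (j in seq_len(ncol(A))) {
    ok <- !is.na(A[, j])
    k <- sum(ok)
    r <- rank(A[ok, j])               # ranks among the observed values only
    # map each observed value to the mean quantile at its relative rank
    out[ok, j] <- approx(p, m, xout = (r - 1) / (k - 1))$y
  }
  out
}

# The example matrix from the quoted message, with one NA in column 2
x <- matrix(c(100, 15, 200, 250, 110, 16.5, 220, 275, 120, 18, 240, 300), 4, 3)
y <- x
y[2, 2] <- NA
quantile_norm_na_sketch(y)
```

In column 2 the three observed values 110, 220, 275 are interpolated to 110, 183.33, 238.33, 275, so the smallest mean quantile becomes (15 + 110 + 18)/3 = 47.67 instead of 16.5. That is where the 47.66667 in the quoted output comes from.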
Best wishes
Gordon
> Date: Tue, 21 Jan 2014 05:03:17 -0800 (PST)
> From: H at mamba.fhcrc.org, "K [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, godahajime at zoho.com
> Subject: [BioC] Question about quantile normalization and NA value
>
>
> Dear all,
>
> I have a question about quantile normalization and NA value.
>
> I'm going to normalize the microarray data with "normalizeBetweenArrays", which is the quantile normalization function in the "limma" package.
> I normalized data containing an NA as follows:
>
>> x <- matrix(c(100,15,200,250,110,16.5,220,275,120,18,240,300),4,3)
>> colnames(x) <- paste("Chip",1:3, sep="")
>> rownames(x) <- c("RNA-A","RNA-B","RNA-C","RNA-D")
>>
>> x
>       Chip1 Chip2 Chip3
> RNA-A   100 110.0   120
> RNA-B    15  16.5    18
> RNA-C   200 220.0   240
> RNA-D   250 275.0   300
>>
>> normalizeBetweenArrays(x)
>       Chip1 Chip2 Chip3
> RNA-A 110.0 110.0 110.0
> RNA-B  16.5  16.5  16.5
> RNA-C 220.0 220.0 220.0
> RNA-D 275.0 275.0 275.0
>>
>> y <- x
>> y[2,2] <- NA
>>
>> normalizeBetweenArrays(y)
>           Chip1     Chip2     Chip3
> RNA-A 134.44444  47.66667 134.44444
> RNA-B  47.66667        NA  47.66667
> RNA-C 226.11111 180.27778 226.11111
> RNA-D 275.00000 275.00000 275.00000
>
>
> I assume the normalized y is rather far away from the normalized x. Can a single NA really have such a large effect?
> Should I normalize after replacing NA with some value, such as median(x[2,],na.rm=T) ?
> My environment is limma Version 3.16.6, R version 3.0.1.
>
> Thanks