[BioC] Question about quantile normalization and NA value

Steve Lianoglou lianoglou.steve at gene.com
Tue Jan 21 18:40:52 CET 2014


Hi,

On Tue, Jan 21, 2014 at 5:03 AM,  <H at mamba.fhcrc.org> wrote:
>
> Dear all,
>
> I have a question about quantile normalization and NA values.
>
> I'm going to normalize my microarray data with "normalizeBetweenArrays", which performs quantile normalization in the "limma" package.
> I normalized data containing an NA as follows:
>
>> x <- matrix(c(100,15,200,250,110,16.5,220,275,120,18,240,300),4,3)
>> colnames(x) <- paste("Chip",1:3, sep="")
>> rownames(x) <- c("RNA-A","RNA-B","RNA-C","RNA-D")
>>
>> x
>       Chip1 Chip2 Chip3
> RNA-A   100 110.0   120
> RNA-B    15  16.5    18
> RNA-C   200 220.0   240
> RNA-D   250 275.0   300
>>
>> normalizeBetweenArrays(x)
>       Chip1 Chip2 Chip3
> RNA-A 110.0 110.0 110.0
> RNA-B  16.5  16.5  16.5
> RNA-C 220.0 220.0 220.0
> RNA-D 275.0 275.0 275.0
>>
>> y <- x
>> y[2,2] <- NA
>>
>> normalizeBetweenArrays(y)
>           Chip1     Chip2     Chip3
> RNA-A 134.44444  47.66667 134.44444
> RNA-B  47.66667        NA  47.66667
> RNA-C 226.11111 180.27778 226.11111
> RNA-D 275.00000 275.00000 275.00000
>
>
> I think the normalized y is rather far from the normalized x. Can a single NA really cause such a large effect?

I suspect that this is only because you are doing the normalization
over a very small dataset. With four observations per "array", 25% of
your data on Chip2 is missing, so a change in a single data point
has a much larger effect than it would on your real arrays (which would
have thousands of observations each).
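To make the small-sample effect concrete, here is a minimal base-R sketch of quantile normalization that tolerates NAs. It assumes missing values are handled by linearly interpolating each array's observed quantiles onto a common probability grid; `quantile_normalize_na` is a hypothetical helper written for illustration, not limma's source code, although on the toy matrix above it reproduces the `normalizeBetweenArrays(y)` output. It also shows the mechanism: Chip2's three remaining values get stretched over the whole quantile range, so its smallest observed value (110) is mapped to the lowest reference quantile. On simulated arrays with 10,000 probes, the typical relative change caused by the same single NA is negligible.

```r
# Hypothetical helper (illustration only, not limma's source): quantile
# normalization that tolerates NAs by interpolating each column's observed
# quantiles onto a common probability grid. Needs >= 2 non-NA values per column.
quantile_normalize_na <- function(A) {
  n <- nrow(A)
  grid <- (0:(n - 1)) / (n - 1)              # common probability grid
  # Sorted observed values of each column, interpolated onto the grid
  S <- apply(A, 2, function(col) {
    obs <- sort(col)                          # sort() drops NAs
    k <- length(obs)
    approx((0:(k - 1)) / (k - 1), obs, grid)$y
  })
  target <- rowMeans(S)                       # reference distribution
  for (j in seq_len(ncol(A))) {
    ok <- !is.na(A[, j])
    r <- rank(A[ok, j])                       # ranks among observed values only
    # Map each observed value to the reference at its own quantile position
    A[ok, j] <- approx(grid, target, (r - 1) / (sum(ok) - 1))$y
  }
  A                                           # NAs stay NA
}

# Toy data from the post: one NA in a 4 x 3 matrix
x <- matrix(c(100, 15, 200, 250, 110, 16.5, 220, 275, 120, 18, 240, 300), 4, 3,
            dimnames = list(c("RNA-A", "RNA-B", "RNA-C", "RNA-D"),
                            paste0("Chip", 1:3)))
y <- x
y[2, 2] <- NA
round(quantile_normalize_na(y), 5)   # same values as normalizeBetweenArrays(y) in the post

# Simulated arrays with 10,000 probes each: the same single NA
set.seed(1)
big <- matrix(rlnorm(30000, meanlog = 6), ncol = 3)
big_na <- big
big_na[2, 2] <- NA
ref <- quantile_normalize_na(big)
rel <- abs(quantile_normalize_na(big_na) - ref) / ref   # relative change per value
summary(as.vector(rel))
```

The largest relative changes in the simulation sit in the extreme tails, where neighbouring order statistics are far apart; values in the bulk of the distribution barely move.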

Of course, if 25% of the values on one of your real arrays are NA, you
might consider failing that array anyway ;-)
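If you want to apply that rule of thumb programmatically, a small base-R sketch is to compute the fraction of missing values on each array and flag arrays above some cutoff. `na_fraction` and the 0.1 cutoff are hypothetical choices for illustration, not a limma or Bioconductor recommendation.

```r
# Hypothetical QC helper: fraction of NA values on each array (column)
na_fraction <- function(A) colMeans(is.na(A))

# Toy matrix from the post, with the NA on Chip2
y <- matrix(c(100, 15, 200, 250, 110, NA, 220, 275, 120, 18, 240, 300), 4, 3,
            dimnames = list(c("RNA-A", "RNA-B", "RNA-C", "RNA-D"),
                            paste0("Chip", 1:3)))
frac <- na_fraction(y)
print(frac)                              # Chip1 0, Chip2 0.25, Chip3 0
flagged <- names(frac)[frac > 0.1]       # 0.1 is an arbitrary, illustrative cutoff
print(flagged)                           # "Chip2"
```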

> Should I normalize after replacing the NA with some value, such as median(x[2,],na.rm=T)?

I'd think not. If you are analyzing a commercial array, just stick with
the prescribed steps you find in the many tutorials available
(in limma or other Bioconductor tutorials). If you have a custom array,
more care will be needed.

-steve

-- 
Steve Lianoglou
Computational Biologist
Genentech



More information about the Bioconductor mailing list