[BioC] Question about quantile normalization and NA value
Steve Lianoglou
lianoglou.steve at gene.com
Tue Jan 21 18:40:52 CET 2014
Hi,
On Tue, Jan 21, 2014 at 5:03 AM, <H at mamba.fhcrc.org> wrote:
>
> Dear all,
>
> I have a question about quantile normalization and NA values.
>
> I'm going to normalize microarray data with "normalizeBetweenArrays", the quantile normalization function in the "limma" package.
> I normalized data containing an NA as follows:
>
>> x <- matrix(c(100,15,200,250,110,16.5,220,275,120,18,240,300),4,3)
>> colnames(x) <- paste("Chip",1:3, sep="")
>> rownames(x) <- c("RNA-A","RNA-B","RNA-C","RNA-D")
>>
>> x
>       Chip1 Chip2 Chip3
> RNA-A   100 110.0   120
> RNA-B    15  16.5    18
> RNA-C   200 220.0   240
> RNA-D   250 275.0   300
>>
>> normalizeBetweenArrays(x)
>       Chip1 Chip2 Chip3
> RNA-A 110.0 110.0 110.0
> RNA-B  16.5  16.5  16.5
> RNA-C 220.0 220.0 220.0
> RNA-D 275.0 275.0 275.0
>>
>> y <- x
>> y[2,2] <- NA
>>
>> normalizeBetweenArrays(y)
>           Chip1     Chip2     Chip3
> RNA-A 134.44444  47.66667 134.44444
> RNA-B  47.66667        NA  47.66667
> RNA-C 226.11111 180.27778 226.11111
> RNA-D 275.00000 275.00000 275.00000
>
>
> I assume the normalized y is quite far from the normalized x. Does a single NA really induce such a large effect?
I suspect that this is only because you are doing the normalization
over a very small dataset. With four observations per "array", 25% of
your data on Chip2 is missing ... so a change in a single data point
has a much larger effect than it would on your real array (which would
have thousands of observations per array).
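
To make the size argument concrete, here is a minimal base-R sketch of
quantile normalization that handles NAs by interpolating each column's
sorted values onto a common grid. This is an assumption about the general
approach, not limma's actual code: `qnorm_sketch` is a hypothetical helper,
and the simulated intensities (uniform between 15 and 300, scaled per chip
like the toy data) are made up for illustration. On the toy matrices it
should give the same values as the output quoted above.

```r
# qnorm_sketch is a hypothetical helper, NOT a limma function.
# Assumes a numeric matrix with at least two non-NA values per column.
qnorm_sketch <- function(A) {
  n <- nrow(A)
  grid <- (0:(n - 1)) / (n - 1)            # common probability grid
  # Sorted values of each column, interpolated to n points
  # (sort() drops NAs, so short columns are stretched onto the grid)
  S <- apply(A, 2, function(col) {
    s <- sort(col)
    k <- length(s)
    approx((0:(k - 1)) / (k - 1), s, xout = grid, ties = "ordered")$y
  })
  target <- rowMeans(S)                    # shared target distribution
  # Map each observed value to the target through its within-column rank
  for (j in seq_len(ncol(A))) {
    ok <- !is.na(A[, j])
    r <- rank(A[ok, j])
    A[ok, j] <- approx(grid, target, xout = (r - 1) / (sum(ok) - 1),
                       ties = "ordered")$y
  }
  A
}

# Toy data from the question: one NA out of four values on Chip2
x <- matrix(c(100, 15, 200, 250, 110, 16.5, 220, 275, 120, 18, 240, 300), 4, 3)
y <- x
y[2, 2] <- NA
shift_toy <- max(abs(qnorm_sketch(y) - qnorm_sketch(x)), na.rm = TRUE)

# Simulated "real-sized" data (assumed distribution): one NA out of 10000
set.seed(1)
n <- 10000
big <- cbind(Chip1 = runif(n, 15, 300),
             Chip2 = 1.1 * runif(n, 15, 300),
             Chip3 = 1.2 * runif(n, 15, 300))
big_na <- big
big_na[2, 2] <- NA
shift_big <- max(abs(qnorm_sketch(big_na) - qnorm_sketch(big)), na.rm = TRUE)

# Largest change in any normalized value caused by the single NA
print(c(toy = shift_toy, big = shift_big))
```

The largest shift caused by the single NA should be far smaller for the
simulated array than for the four-row toy example.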
Of course, if 25% of the values on one of your real arrays were NA, you
might consider failing that array anyway ;-)
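
A quick way to check this is to compute the fraction of missing values per
array. The sketch below uses the toy matrix from the question; the cutoff
`max_na_frac` is a hypothetical value chosen only for illustration, not a
recommended threshold.

```r
# Toy matrix from the question, with the NA already in place
y <- matrix(c(100, 15, 200, 250, 110, NA, 220, 275, 120, 18, 240, 300), 4, 3,
            dimnames = list(c("RNA-A", "RNA-B", "RNA-C", "RNA-D"),
                            paste0("Chip", 1:3)))

# Fraction of NA values in each column (array)
na_frac <- colMeans(is.na(y))

# Hypothetical cutoff: flag arrays with more than 5% missing values
max_na_frac <- 0.05
flagged <- names(na_frac)[na_frac > max_na_frac]

print(na_frac)
print(flagged)
```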
> Should I normalize after replacing NA with some value, such as median(x[2,],na.rm=T) ?
I'd think not. If you are analyzing a commercial array, just stick with
the prescribed steps you find in some of the many tutorials available
(in limma or other bioc tutorials). If you have a custom array, more
care will be needed.
-steve
--
Steve Lianoglou
Computational Biologist
Genentech