# [BioC] Question about quantile normalization

K J hamuteru_kimiteru at wind.ocn.ne.jp
Tue Aug 2 15:41:26 CEST 2011

```Dear Dr. Laurent Gautier,

> Quantile normalization is usually performed on untransformed data
> ("raw-scale").

I read some journal articles on RMA and RMA++, and confirmed that raw-scale
values are usually used for quantile normalization, as you say.

> Missing values are just ignored and left as such (missing values).

I'm afraid I don't quite understand your explanation of how N/A values are
treated. Judging from the code of "normalizeBetweenArrays", it seems that each
N/A is first replaced with the column median, and then, after ranking,
averaging and re-ordering to the original positions, it is set back to N/A.
Is this correct?

I describe an example below:

step 1: Calculating median in each column

> ngenes <- 5
> narrays <- 2

> x <- matrix(c(1:10),ngenes,narrays)
> x[2,1] <- x[3,1] <- NA
> x[4,2] <- x[5,2] <- NA

> x
     [,1] [,2]
[1,]    1    6
[2,]   NA    7
[3,]   NA    8
[4,]    4   NA
[5,]    5   NA

> (xm <- apply(x,2,median,na.rm=T))

[1] 4 7

step 2: Replacing N/A in each column with median

> x[2,1] <- x[3,1] <- xm[1]
> x[4,2] <- x[5,2] <- xm[2]
> x

     [,1] [,2]
[1,]    1    6
[2,]    4    7
[3,]    4    8
[4,]    4    7
[5,]    5    7

step 3: Sorting values in each column in descending order

     [,1] [,2]
[1,]    5    8
[2,]    4    7
[3,]    4    7
[4,]    4    7
[5,]    1    6

step 4: Averaging values in each rank

     [,1] [,2] Average
[1,]    5    8     6.5
[2,]    4    7     5.5
[3,]    4    7     5.5
[4,]    4    7     5.5
[5,]    1    6     3.5

step 5: Replacing the values in each column with the average

       [,1]   [,2]  Average
[1,]    6.5    6.5      6.5
[2,]    5.5    5.5      5.5
[3,]    5.5    5.5      5.5
[4,]    5.5    5.5      5.5
[5,]    3.5    3.5      3.5

step 6: Re-sorting the values in each column at original positions

       [,1]   [,2]
[1,]    3.5    3.5
[2,]    5.5    5.5
[3,]    5.5    6.5
[4,]    5.5    5.5
[5,]    6.5    5.5

step 7: Replacing the values with N/A at original positions

       [,1]   [,2]
[1,]    3.5    3.5
[2,]    NA     5.5
[3,]    NA     6.5
[4,]    5.5    NA
[5,]    6.5    NA

This result matches the output of normalizeBetweenArrays():

> x <- matrix(c(1:10),ngenes,narrays)
> x[2,1] <- x[3,1] <- NA
> x[4,2] <- x[5,2] <- NA
> (y <- normalizeBetweenArrays(x))

       [,1]   [,2]
[1,]    3.5    3.5
[2,]    NA     5.5
[3,]    NA     6.5
[4,]    5.5    NA
[5,]    6.5    NA

J, K

--- On Mon, 2011/8/1, Laurent Gautier <laurent at cbs.dtu.dk> wrote:

On 2011-08-01 06:55, qwertyui_period at yahoo.co.jp wrote:
> Dear all,
>
> My environment is limma version 3.2.2, R version 2.10.1, and Windows XP.
> I'm going to normalize the microarray data by "normalizeBetweenArrays",
> which is the quantile normalization function in the "limma" package.
> I have read the "usersguide.pdf" on the Bioconductor website; however, I
> still have two questions.
>
> Question 1: Which is proper to use for quantile normalization: raw-scale or
> log2-scale values?
> The quantile normalization includes the step of calculating the arithmetic
> mean, so I suppose the raw-scale values should be used, though the
> microarray data is generally log2-scale values.

Quantile normalization is usually performed on untransformed data
("raw-scale").
Log2 transformation comes after (before probe summary when using RMA or
RMA-like approaches).

> Question 2: How does "normalizeBetweenArrays" deal with "N/A" in data?

Missing values are just ignored and left as such (missing values).

Hoping this helps,

L.

> Example code 1,
>
>> ngenes<- 3
>> narrays<- 2
>> x<- matrix(c(3,1,5,6,4,2),ngenes,narrays)
>       [,1] [,2]
> [1,]    3    6
> [2,]    1    4
> [3,]    5    2
>
>> (y<- normalizeBetweenArrays(x))
>       [,1] [,2]
> [1,]  3.5  5.5
> [2,]  1.5  3.5
> [3,]  5.5  1.5
>
> I understand the process of "normalizeBetweenArrays" is divided into 4
> steps as follows:
>
> step 1: Sorting values in each column in descending order
>
>       [,1] [,2]
> [1,]    5    6
> [2,]    3    4
> [3,]    1    2
>
> step 2: Averaging values in each rank
>
>       [,1] [,2]    Average
> [1,]    5    6    5.5
> [2,]    3    4    3.5
> [3,]    1    2    1.5
>
> step 3: Replacing the values in each column with the average
>
>        [,1]  [,2]  Average
> [1,]  5.5   5.5   5.5
> [2,]  3.5   3.5   3.5
> [3,]  1.5   1.5   1.5
>
> step 4: Re-sorting the values in each column at original positions
>
>       [,1] [,2]
> [1,]  3.5  5.5
> [2,]  1.5  3.5
> [3,]  5.5  1.5
>
> Then, how does "normalizeBetweenArrays" deal with "N/A" in data?
>
> Example code 2,
>
>> (x<- matrix(c(NA,1,5,6,4,2),ngenes,narrays))
>       [,1] [,2]
> [1,]   NA    6
> [2,]    1    4
> [3,]    5    2
>
>> (y<- normalizeBetweenArrays(x))
>
>       [,1] [,2]
> [1,]   NA  5.5
> [2,]  1.5  3.5
> [3,]  5.5  1.5

```
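
The seven steps described in the message can be sketched in base R as follows. This is only an illustration of the procedure as the poster understands it: the helper name `quantile_normalize_steps` is hypothetical and not part of limma, and limma's own implementation may handle missing values and ties differently, even though the results agree on the examples in this thread. The sketch assumes a numeric matrix with at least two rows and at least one non-missing value in every column.

```r
# Hypothetical helper (not part of limma): quantile-normalize the columns of
# a numeric matrix by following the seven steps described in the message.
quantile_normalize_steps <- function(x) {
  na_pos <- is.na(x)                                # remember where the NAs are

  # Steps 1-2: compute each column's median and use it to fill that column's NAs.
  col_median <- apply(x, 2, median, na.rm = TRUE)
  filled <- x
  for (j in seq_len(ncol(x))) {
    filled[na_pos[, j], j] <- col_median[j]
  }

  # Step 3: sort every column in descending order.
  sorted <- apply(filled, 2, sort, decreasing = TRUE)

  # Step 4: average across columns within each rank.
  rank_mean <- rowMeans(sorted)

  # Steps 5-6: give every value the average for its rank, at its original
  # position. Rank 1 is the largest value; ties are broken by position
  # (an assumption of this sketch).
  out <- filled
  for (j in seq_len(ncol(x))) {
    r <- rank(-filled[, j], ties.method = "first")
    out[, j] <- rank_mean[r]
  }

  # Step 7: restore the NAs at their original positions.
  out[na_pos] <- NA
  out
}

# Example from the message (two arrays, two NAs each).
x <- matrix(c(1:10), 5, 2)
x[2, 1] <- x[3, 1] <- NA
x[4, 2] <- x[5, 2] <- NA
y <- quantile_normalize_steps(x)
print(y)

# "Example code 1" from the quoted question (no missing values).
x1 <- matrix(c(3, 1, 5, 6, 4, 2), 3, 2)
y1 <- quantile_normalize_steps(x1)
print(y1)
```

On both inputs the sketch reproduces the matrices shown in the thread. Breaking ties by position is a simplification: when an observed value equals the fill-in median, the tied entries could receive different rank averages on other data.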