[BioC] summarizing probe intensities before or after normalization- 1. how to do with RMA 2. Opinions?

Mon Sep 11 15:42:45 CEST 2006

k. brand wrote:
> Dear All,
> 
> I compared two normalization approaches for an experiment using twelve 
> affy 430-2.0 chips. (histogram plot comparing both methods forwarded on 
> request).
> 
> #1. RMA
> library(affy)
> data <- ReadAffy()
> datarma <- rma(data)
> exprs2excel(datarma, file="dataRMA.csv")
> 
> Plotting histograms of the output shows arrays NOT perfectly aligning at 
> the means and spreads.
> 
> I used a custom script to effect a quantile normalization on MAS5 
> preprocessed but unnormalized data-
> 
> #2. Mas5 sans interchip normalization
> library(affy)
> data <- ReadAffy()
> datamas5sannorm <- mas5(data, normalize=FALSE)
> exprs2excel(datamas5sannorm, file="datamas5sannorm.csv")
> f.qnorm <- function(x,qinit=0.75,perc=100)  {...
> 
> The means and spreads of this normalization approach do align perfectly.
> 
> THUS- summarizing probe intensities before or after normalization does 
> appear to make a noticeable difference, as may be expected.
> 
> My questions/requests-
> 
> 1. Help to effect Bolstad normalization of the RMA preprocessed and 
> summarized data. Whilst I succeed in generating unnormalized RMA 
> preprocessed data with-
> 
> library(affy)
> data <- ReadAffy()
> datarma <- rma(data, normalize=FALSE)

Next step would be

datarma <- normalize.quantiles(exprs(datarma))
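For anyone unsure what that call does, here is a minimal base-R sketch of quantile normalization, assuming rows are probesets and columns are arrays. The function name quantile_normalize and the toy matrix are made up for illustration; this is not the normalize.quantiles() implementation, which may treat ties and missing values differently. Note also that normalize.quantiles() returns a plain matrix, so the assignment above replaces the ExpressionSet with a matrix.

```r
# Base-R sketch of quantile normalization (no NA handling).
# quantile_normalize is a hypothetical helper, not part of affy.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its array
  sorted <- apply(x, 2, sort)                         # each column sorted ascending
  target <- rowMeans(sorted)                          # reference: mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) target[r])       # map each rank to the reference value
  dimnames(out) <- dimnames(x)                        # keep probeset and sample names
  out
}

# Toy matrix standing in for exprs(datarma); values are made up
x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8),
            nrow = 4,
            dimnames = list(paste0("probe", 1:4), paste0("chip", 1:3)))
xn <- quantile_normalize(x)
apply(xn, 2, summary)  # every column now shares the same distribution
```

After this step every array has the same set of values, only assigned to different probesets, which is why the histograms line up exactly.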

Also note that 'data' is not a very good variable name, as you are 
masking an existing function (data() in the utils package). When 
creating variable names it is often enlightening to type the name first 
at an R prompt to see if you get any response.
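The same check can be done without printing the whole function. A small sketch using exists() and find(); the name celdat_hypothetical is just a made-up example of an unused name:

```r
# Does a candidate name already refer to something on the search path?
taken <- exists("data")                 # TRUE when utils is attached: data() is a function there
where <- find("data")                   # where it lives, e.g. "package:utils"
free  <- !exists("celdat_hypothetical") # TRUE if nothing by that (made-up) name exists yet
taken; where; free
```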

> 
> As a result of my limited R experience, I failed in finding a method to 
> effect Bolstad (quantile) normalization on this output.
> 
> 2. Thoughts/comments on the benefits/caveats of normalizing before or 
> after summarizing probe intensities.

Normalizing after summarization for something like rma() seems 
questionable to me. Since the expression values are based on fitting a 
model to the PM probe values, if you don't normalize first you are 
ignoring any non-biological variability which may end up biasing your 
results. Using median polish for the model fit should help protect 
against this, but I don't know that I would want to take chances.
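To make the median polish remark concrete, here is a toy sketch with made-up numbers (one probeset, five PM probes, four arrays, log2 scale). It assumes the simplest possible artifacts: a single outlying probe and a uniform shift on one array. It says nothing about intensity-dependent distortions, which are the harder case.

```r
# Toy probeset: 5 PM probes (rows) x 4 arrays (columns), log2 scale; all values are made up
probe_effect <- c(-1.0, -0.5, 0, 0.5, 1.0)  # probe affinities
array_level  <- c(8, 8, 8, 8)               # same true expression on every array
pm <- outer(probe_effect, array_level, "+")

pm[, 3] <- pm[, 3] + 1   # non-biological artifact: array 3 uniformly brighter by 1 unit
pm[2, 1] <- pm[2, 1] + 4 # one outlying probe on array 1

fit  <- medpolish(pm, trace.iter = FALSE)
expr <- fit$overall + fit$col  # per-array summary from the additive fit
expr
# 8 8 9 8: the outlying probe is absorbed into the residuals,
# but the array-wide shift is carried into the summarized value
```

So median polish protects against isolated bad probes, while anything that affects a whole array still ends up in the expression estimate.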

As an aside, how far off are the histograms? Are you sure that there is 
a reasonable difference? Eyeballing a histogram isn't the best way to 
determine if the mean and variance are different or not. A quick run 
through with some data here shows very little difference:

 > eset <- justRMA(filenames=list.celfiles()[1:10])
 > apply(exprs(eset),2,summary)
         Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6 Sample 7
Min.       4.085    4.070    4.091    4.051    4.068    4.090    4.087
1st Qu.    5.835    5.859    5.832    5.812    5.842    5.858    5.852
Median     7.079    7.069    7.048    7.061    7.070    7.077    7.080
Mean       7.225    7.227    7.224    7.227    7.229    7.225    7.232
3rd Qu.    8.352    8.324    8.351    8.363    8.361    8.330    8.347
Max.      14.550   14.440   14.420   14.400   14.490   14.430   14.260
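If you want a number rather than a picture, one option is to compute location and spread per array and then look at how much those vary between arrays. A rough sketch on simulated data standing in for exprs(eset); the matrix and the object names are made up:

```r
# Simulated stand-in for exprs(eset): 1000 probesets x 6 arrays (values are made up)
set.seed(1)
x <- matrix(rnorm(6000, mean = 7, sd = 1.5), ncol = 6,
            dimnames = list(NULL, paste("Sample", 1:6)))

# Per-array location and spread
stats_by_array <- rbind(mean   = colMeans(x),
                        sd     = apply(x, 2, sd),
                        median = apply(x, 2, median),
                        IQR    = apply(x, 2, IQR))

# Largest between-array difference for each statistic
spread <- apply(stats_by_array, 1, function(v) diff(range(v)))
round(stats_by_array, 3)
spread
```

Small values in spread relative to the overall range of the data suggest the arrays are already well aligned.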

Best,

Jim

> 
> I look forward to any thoughts, advice & suggestions from users.
> 
> thanks in advance,
> 
> Karl
> 
> 
> ===========================================
> 
>    > sessionInfo()
> Version 2.3.0 (2006-04-24)
> i386-pc-mingw32
> 
> attached base packages:
> [1] "tools"     "methods"   "stats"     "graphics"  "grDevices" "utils"
>        "datasets"  "base"
> 
> other attached packages:
>        affy   affyio  Biobase
> "1.10.0"  "1.0.0" "1.10.0"
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
