[BioC] summarizing probe intensities before or after normalization- 1. how to do with RMA 2. Opinions?
James W. MacDonald
jmacdon at med.umich.edu
Mon Sep 11 15:42:45 CEST 2006
k. brand wrote:
> Dear All,
>
> I compared two normalization approaches for an experiment using twelve
> affy 430-2.0 chips. (histogram plot comparing both methods forwarded on
> request).
>
> #1. RMA
> library(affy)
> data <- ReadAffy()
> datarma <- rma(data)
> exprs2excel(datarma, file="dataRMA.csv")
>
> Plotting histograms of the output shows arrays NOT perfectly aligning at
> the means and spreads.
>
> I used a custom script to effect a quantile normalization on MAS5
> preprocessed but unnormalized data-
>
> #2. Mas5 sans interchip normalization
> library(affy)
> data <- ReadAffy()
> datamas5sannorm <- mas5(data, normalize=FALSE)
> exprs2excel(datamas5sannorm, file="datamas5sannorm.csv")
> f.qnorm <- function(x,qinit=0.75,perc=100) {...
>
> The means and spreads of this normalization approach do align perfectly.
>
> THUS- summarizing probe intensities before or after normalization does
> appear to make a noticeable difference, as may be expected.
>
> My questions/requests-
>
> 1. Help to effect Bolstad normalization of the RMA preprocessed and
> summarized data. Whilst I succeed in generating unnormalized RMA
> preprocessed data with-
>
> library(affy)
> data <- ReadAffy()
> datarma <- rma(data, normalize=FALSE)
The next step would be

datarma <- normalize.quantiles(exprs(datarma))

(note that this leaves 'datarma' holding a plain matrix of normalized
values rather than the expression set object).

Also note that 'data' is not a very good variable name, as you are
masking an existing function. When creating variable names it is often
enlightening to type the name at an R prompt first to see if you get any
response.
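
To make the quantile step less of a black box, here is a minimal base-R sketch of what quantile normalization does to a matrix of expression values. It is an illustration only, not the affy implementation: the helper name quantile_normalize_sketch and the toy matrix are made up for this example, it assumes no missing values, and ties are broken by order of appearance, which may differ from how normalize.quantiles() treats them.

```r
# Hypothetical helper (not part of affy): quantile-normalize the columns
# of a numeric matrix so that every column ends up with the same
# distribution. Assumes at least two rows and no NA values.
quantile_normalize_sketch <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), nrow(x) > 1, !anyNA(x))
  # Sort each column, then average across columns rank by rank
  sorted <- apply(x, 2, sort)
  target <- rowMeans(sorted)
  # Replace every value by the target value at its within-column rank
  ranks <- apply(x, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy matrix standing in for exprs(datarma): 5 probesets x 3 arrays
toy <- matrix(c(5, 2, 3, 4, 1,
                4, 1, 4.5, 2, 3,
                3, 4, 6, 8, 9),
              nrow = 5,
              dimnames = list(paste0("ps", 1:5), paste0("array", 1:3)))
toy_norm <- quantile_normalize_sketch(toy)

# After normalization every column contains the same sorted values
print(apply(toy_norm, 2, sort))
```

Because every array is forced onto the same set of values, the per-array means and spreads agree exactly afterwards, which is why histograms of quantile-normalized data line up.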
>
> As a result of my limited R experience, I failed in finding a method to
> effect Bolstad (quantile) normalization on this output.
>
> 2. Thoughts/comments on the benefits/caveats of normalizing before or
> after summarizing probe intensities.
Normalizing after summarization for something like rma() seems
questionable to me. Since the expression values are based on fitting a
model to the PM probe values, if you don't normalize first you are
ignoring any non-biological variability which may end up biasing your
results. Using median polish for the model fit should help protect
against this, but I don't know that I would want to take chances.
As an aside, how far off are the histograms? Are you sure that there is
a meaningful difference? Eyeballing a histogram isn't the best way to
determine whether the mean and variance differ. A quick run
through with some data here shows very little difference:
> eset <- justRMA(filenames=list.celfiles()[1:10])
> apply(exprs(eset),2,summary)
        Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6 Sample 7
Min.       4.085    4.070    4.091    4.051    4.068    4.090    4.087
1st Qu.    5.835    5.859    5.832    5.812    5.842    5.858    5.852
Median     7.079    7.069    7.048    7.061    7.070    7.077    7.080
Mean       7.225    7.227    7.224    7.227    7.229    7.225    7.232
3rd Qu.    8.352    8.324    8.351    8.363    8.361    8.330    8.347
Max.      14.550   14.440   14.420   14.400   14.490   14.430   14.260
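
If you want numbers rather than a visual impression, something along these lines puts the per-array mean and standard deviation side by side and reports the largest gap between arrays. This is only a sketch: the matrix m is simulated random data standing in for exprs(eset), and the names array_stats and spread are made up for the example.

```r
# Simulated log2-scale matrix standing in for exprs(eset); the values
# are random and carry no biological meaning.
set.seed(1)
m <- matrix(rnorm(1000 * 4, mean = 7, sd = 2), ncol = 4,
            dimnames = list(NULL, paste("Sample", 1:4)))

# Per-array mean and standard deviation, one column per array
array_stats <- rbind(mean = colMeans(m), sd = apply(m, 2, sd))

# Largest difference between any two arrays, for each statistic
spread <- apply(array_stats, 1, function(s) diff(range(s)))

print(round(array_stats, 3))
print(round(spread, 3))
```

With real data, replacing m by the expression matrix gives a direct measure of how far apart the arrays are, which is easier to judge than overlaid histograms.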
Best,
Jim
>
> I look forward to any thoughts, advice & suggestions from users.
>
> thanks in advance,
>
> Karl
>
>
> ===========================================
>
> > sessionInfo()
> Version 2.3.0 (2006-04-24)
> i386-pc-mingw32
>
> attached base packages:
> [1] "tools" "methods" "stats" "graphics" "grDevices" "utils"
> "datasets" "base"
>
> other attached packages:
>     affy   affyio  Biobase
> "1.10.0"  "1.0.0" "1.10.0"
>
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623