[BioC] question about RMA's background adjustment...

Wed Jul 18 23:50:54 CEST 2007

Hi all,

I have a question about RMA's background adjustment. The data I'm 
working with is from a custom Affy PM-only array, and I used RMA 
before the analysis. The PI now wants an estimate of how many of the 
probes/probe sets are "present" or above background, to compare with 
some cDNA arrays that have the same samples on them. I normally use 
Affy's mas5calls algorithm, but I can't here because it's a PM-only 
array. I was looking at RMA's background adjustment step, and while I 
understand generally that it's a model-based approach using only PM 
data for each array separately, I'm unsure as to what assumptions go 
into it and how the biology of these samples may affect it. The help 
file for bg.adjust (the internal function used) says:

"Assumes PMs are a convolution of normal and exponentional. So we 
observe X+Y where X is backround and Y is signal. bg.adjust returns 
E[Y|X+Y, Y>0] as our backround corrected PM. bg.parameters provides 
adhoc estimates of the parameters of the normal and exponential 
distributions. "
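
To make the model concrete for myself, here is a minimal Python sketch of the conditional expectation described in that help text. It assumes X ~ Normal(mu, sigma^2) and Y ~ Exponential(rate alpha), independent, in which case Y given X+Y = o is a normal with mean o - mu - sigma^2*alpha and standard deviation sigma, truncated to Y > 0. The function name rma_bg_adjust and the parameter values are my own illustration; the parameters are simply passed in, and this does not reproduce the ad hoc estimates from bg.parameters.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rma_bg_adjust(o, mu, sigma, alpha):
    """E[Y | X + Y = o, Y > 0] for X ~ N(mu, sigma^2), Y ~ Exp(rate=alpha).

    Hypothetical helper: mu, sigma and alpha are assumed known here.
    """
    # Mean of the (untruncated) normal that Y follows given the observed o.
    a = o - mu - sigma * sigma * alpha
    b = sigma
    # Mean of a normal N(a, b^2) truncated to (0, infinity).
    return a + b * normal_pdf(a / b) / normal_cdf(a / b)


if __name__ == "__main__":
    # Illustrative parameter values only, not estimated from any array.
    for o in (50.0, 100.0, 200.0, 1000.0):
        print(o, rma_bg_adjust(o, mu=100.0, sigma=20.0, alpha=0.01))
```

Under these assumptions the corrected value is always positive, and for bright probes it approaches o - mu - sigma^2*alpha, so the amount removed from high intensities is driven almost entirely by the estimated mu. That is why I care how mu is estimated when few probes are truly unexpressed.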

My questions:
1. How might this model be affected if almost all of the probes are 
expressed in the sample?  The model was developed using the standard 
Affy spike-in and serial dilution data sets, but I'm pretty sure that 
those samples used did not have nearly all of the transcripts on the 
array in them (the typical percent present for Affy's whole-genome arrays 
is 40-60%). The custom array I'm using is for a particular tissue, 
and the same samples on a similar cDNA array have >99% of the spots 
above background. If I use the empty features around each PM as an 
estimate of background for the Affy array, then ~95% of PM probes have 
signals greater than the empty features. I'm worried that if all of 
the probe sets are expressed, then RMA's background correction will 
actually be subtracting off real signal instead of additive 
background. Any thoughts?
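
For reference, the empty-feature comparison can be sketched roughly as below. This is only an illustration in Python: the function name, the use of the median of the neighbouring empty features as the local background, and the toy intensities are all assumptions made for the example, not the actual data or the exact rule used.

```python
from statistics import median


def fraction_above_background(pm, neighbours):
    """Fraction of PM probes brighter than their local empty-feature background.

    pm         : list of PM intensities
    neighbours : list of lists, the empty-feature intensities around each PM
    The median of the neighbours is an assumed summary of local background.
    """
    if len(pm) != len(neighbours):
        raise ValueError("pm and neighbours must have the same length")
    if not pm:
        raise ValueError("no probes supplied")
    above = sum(1 for p, nb in zip(pm, neighbours) if p > median(nb))
    return above / len(pm)


if __name__ == "__main__":
    # Toy values for illustration only, not real array data.
    pm = [250.0, 90.0, 400.0, 130.0]
    neighbours = [
        [100.0, 110.0, 95.0],
        [100.0, 105.0, 98.0],
        [120.0, 115.0, 110.0],
        [100.0, 102.0, 99.0],
    ]
    print(fraction_above_background(pm, neighbours))  # 0.75 for these toy values
```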

2. In my analysis of the cDNA arrays, I decided not to subtract 
background at all, because almost all the spots were above the 
background estimate. Am I justified in using this rationale for the 
Affy arrays and skipping the background correction step? The custom 
Affy array was based on the clones from the cDNA arrays, plus other 
clones from the same tissue. There is good reason to believe that 
just about all of the transcripts on the Affy array are expressed in 
the samples we're analyzing.

Thanks for any help,
Jenny

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu