[BioC] question about RMA's background adjustment...
Jenny Drnevich
drnevich at uiuc.edu
Wed Jul 18 23:50:54 CEST 2007
Hi all,
I have a question about RMA's background adjustment. The data I'm
working with is from a custom Affy PM-only array, and I used RMA
before the analysis. The PI now wants an estimate of how many of the
probes/probe sets are "present" or above background, to compare with
some cDNA arrays that have the same samples on them. I normally use
Affy's mas5calls algorithm, but I can't here because it's a PM-only
array. I was looking at RMA's background adjustment step, and while I
understand generally that it's a model-based approach using only PM
data for each array separately, I'm unsure as to what assumptions go
into it and how the biology of these samples may affect it. The help
file for bg.adjust (the internal function used) says:
"Assumes PMs are a convolution of normal and exponential. So we
observe X+Y where X is background and Y is signal. bg.adjust returns
E[Y|X+Y, Y>0] as our background corrected PM. bg.parameters provides
adhoc estimates of the parameters of the normal and exponential
distributions. "
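
To make the quoted model concrete, here is a minimal sketch in Python
(not the affy code itself) of the conditional expectation it describes.
It assumes X ~ Normal(mu, sigma^2) and Y ~ Exponential(rate alpha), in
which case Y given the observed PM is a normal with mean
pm - mu - alpha*sigma^2 and sd sigma, truncated to Y > 0. The function
name and the parameter values are hypothetical; in the package the
parameters come from bg.parameters, which is not reproduced here.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF, written with erfc so the lower tail stays accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def bg_adjust_one(pm, mu, sigma, alpha):
    """E[Y | X + Y = pm] for X ~ N(mu, sigma^2), Y ~ Exp(rate=alpha).

    mu, sigma and alpha are taken as given (assumed values, not estimated
    from the data as bg.parameters would do).
    """
    a = pm - mu - alpha * sigma ** 2   # mean of the untruncated normal
    z = a / sigma
    # Mean of a normal(a, sigma^2) truncated to positive values.
    return a + sigma * norm_pdf(z) / norm_cdf(z)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    mu, sigma, alpha = 100.0, 20.0, 0.01
    for pm in (50.0, 100.0, 200.0, 1000.0):
        print(pm, bg_adjust_one(pm, mu, sigma, alpha))
```

Under these assumptions the corrected value is always positive, and
for a bright probe it is roughly pm - mu - alpha*sigma^2, so how much
is subtracted depends entirely on the estimated mu, sigma and alpha.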
My questions:
1. How might this model be affected if almost all of the probes are
expressed in the sample? The model was developed using the standard
Affy spike-in and serial dilution data sets, but I'm pretty sure that
the samples used did not have nearly all of the transcripts on the
array in them (the typical percent present for Affy's whole-genome
arrays is 40-60%). The custom array I'm using is for a particular tissue,
and the same samples on a similar cDNA array have >99% of the spots
above background. If I use the empty features around each PM as an
estimate of background for the Affy array, then ~95% of PM probes have
signals greater than the empty features. I'm worried that if all of
the probe sets are expressed, then RMA's background correction will
actually be subtracting off real signal instead of additive
background. Any thoughts?
2. In my analysis of the cDNA arrays, I decided not to subtract
background at all, because almost all the spots were above the
background estimate. Am I justified in using this rationale for the
Affy arrays and skipping the background correction step? The custom
Affy array was based on the clones from the cDNA arrays, plus other
clones from the same tissue. There is good reason to believe that
just about all of the transcripts on the Affy array are expressed in
the samples we're analyzing.
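
For reference, the empty-feature comparison in question 1 can be
sketched roughly as below (Python, for illustration only). The exact
criterion is an assumption: here a PM probe counts as above background
if its intensity exceeds the mean of its neighbouring empty features.
The function name and the input layout are hypothetical.

```python
def fraction_above_background(pm, empty_neighbours):
    """Fraction of PM probes brighter than their local empty-feature mean.

    pm               -- list of PM intensities, one per probe
    empty_neighbours -- list of lists; entry i holds the intensities of
                        the empty features around probe i (assumed layout)
    """
    above = 0
    for signal, neighbours in zip(pm, empty_neighbours):
        local_bg = sum(neighbours) / len(neighbours)  # local background estimate
        if signal > local_bg:
            above += 1
    return above / len(pm)


if __name__ == "__main__":
    # Made-up intensities, only to show the call.
    pm = [120.0, 80.0, 300.0, 95.0]
    empty = [[100.0, 90.0], [100.0, 90.0], [110.0, 100.0], [90.0, 90.0]]
    print(fraction_above_background(pm, empty))
```

A stricter rule (for example, the local mean plus some multiple of the
local SD) would give a lower fraction, so the cutoff matters when
comparing against the cDNA arrays.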
Thanks for any help,
Jenny
Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu