[BioC] Is GCRMA influenced by lexicographical order?
Markus Boenn
boenn at informatik.uni-halle.de
Mon Nov 8 10:19:35 CET 2010
Dear all,
I've made some experiment about GCRMA using bg.adjust.gcrma() contained
in the gcrma package.
I take two CEL files, A.cel and B.cel (CASE I). Both are from arrays of
the same type, for instance Mouse Gene 430 2.0 Arrays. In addition, I
rename both files in a way, which changes the lexicographical order,
say, I rename A.cel to BA.cel and I rename B.cel to AB.cel (CASE II).
Using bg.adjust.gcrma() and exprs() to obtain the expression values, for
some probe sets I get different values for A.cel (in CASE I) than for
BA.cel (in CASE II), although both files contain the same data. Of
course, for B.cel and AB.cel the same phenomenon becomes obvious.
How can this be possible? To me, it seems as if GCRMA normalizes the
data with respect to the lexicographical order, but as far as I know,
GCRMA treats each array independently.
This effect is not reproducible using call.exprs() with argument
algorithm="rma" or "mas5". But is reproducible for other arrays like
HGU95A, for example.
Please, can anybody try to explain me, if this effect is a bug or a
feature (and why)?
Best wishes,
Markus
#################################################################
sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] hgu133plus2probe_2.7.0 AnnotationDbi_1.12.0 hgu133plus2cdf_2.7.0
[4] simpleaffy_2.26.0 genefilter_1.32.0 gcrma_2.22.0
[7] affy_1.28.0 Biobase_2.10.0
loaded via a namespace (and not attached):
[1] Biostrings_2.18.0 DBI_0.2-5 IRanges_1.8.2
[4] RSQLite_0.9-2 affyio_1.18.0 annotate_1.28.0
[7] preprocessCore_1.12.0 splines_2.12.0 survival_2.35-8
[10] tools_2.12.0 xtable_1.5-6
Also tried on other machine with similar packages
sessionInfo()
R version 2.11.1 Patched (2010-09-16 r52943)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] simpleaffy_2.24.0 genefilter_1.30.0 gcrma_2.20.0 affy_1.26.1
[5] Biobase_2.8.0
loaded via a namespace (and not attached):
[1] affyio_1.16.0 annotate_1.26.1 AnnotationDbi_1.10.2
[4] Biostrings_2.16.9 DBI_0.2-5 IRanges_1.6.17
[7] preprocessCore_1.10.0 RSQLite_0.9-2 splines_2.11.1
[10] survival_2.35-8 tools_2.11.1 xtable_1.5-6
#################################################################
Code example
setwd("PATH TO CASE I DATA") #the directory containing A.cel and B.cel
CEL <- ReadAffy()
bag.1 <- bg.adjust.gcrma(CEL)
ebag.1 <- exprs(bag.1)
setwd("PATH TO CASE II DATA") #the directory containing BA.cel and AB.cel
CEL <- ReadAffy()
bag.2 <- bg.adjust.gcrma(CEL)
ebag.2 <- exprs(bag.2)
par(mfrow=c(2,1))
# differences should be zero
plot(ebag.1[,1]-ebag.2[,2])
plot(ebag.1[,2]-ebag.2[,1])
More information about the Bioconductor
mailing list