[BioC] predict vsn with reference
chrisk
chris.knight at manchester.ac.uk
Tue Oct 2 10:29:12 CEST 2007
I'm having difficulty using the 'reference' argument of vsn to put data
from a new microarray onto the scale of an existing set of arrays, when
all the arrays are normalised using a shared set of controls.
I think I'm not understanding the way offsets are handled: when a
reference is used, predicted values for the data used to create a vsn
object are different from the values stored in that vsn object. E.g. if
I have data from 2 arrays in 'a' and want to put array b onto their
scale, this is what I'm doing:
library(vsn)
set.seed(214)
vals<-runif(1000)
a<-matrix(rep(vals,2)+0.1*rnorm(2000),1000,2)
b<-vals+0.1*rnorm(1000)
aVsn<-vsn2(a)
bVsn<-vsn2(b,reference=aVsn)
The values stored in bVsn are now on the same scale as the 'a' arrays:
plot(exprs(aVsn)[,2],exprs(bVsn)); abline(0,1)
However, the predictions from bVsn for the same data b are offset from
these values:
plot(exprs(bVsn),predict(bVsn,b)); abline(0,1)
This is an issue when these comparable spots are only a reference set of
probes for a larger array:
aFull<-rbind(a,matrix(runif(20000),10000,2))
bFull<-c(b,runif(10000))
I've been calculating values for the 'a' arrays using:
aFullVal<-predict(aVsn,aFull)
but if I use the same approach for the b array I cease to be on the same
scale as the 'a' arrays:
bFullVal<-predict(bVsn,bFull)
plot(aFullVal[1:1000,1],bFullVal[1:1000,1]); abline(0,1)
I can get back to the scale by correcting for the mean difference
between the stored and the predicted values:
offset<-mean(exprs(bVsn)-predict(bVsn,b))
bFullVal2<-bFullVal+offset
plot(aFullVal[1:1000,1],bFullVal2[1:1000,1]); abline(0,1)
But I don't really understand what this offset is or where it comes from
(particularly in this toy example where the offset is much larger than
any real difference between a and b, though I guess I haven't put in
anything that actually needs variance stabilisation).
So it would be good to know i) whether correcting for whatever the
offset turns out to be is a reasonable approach (especially when b
actually comprises several arrays), and ii) whether there is any less
arbitrary way to calculate values for array b while keeping them on the
scale of the 'a' arrays (e.g. using parameter values directly).
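To make point i) concrete for the case where b comprises several arrays, the correction above can be applied per column. This is only a minimal sketch in base R, assuming the discrepancy really is a constant shift within each array; `shiftToReference` is a hypothetical helper name, not part of vsn, and the numbers below are stand-ins rather than output from a vsn fit.

```r
# Hypothetical helper (not part of vsn): shift predicted values for a set of
# arrays by the per-array mean difference observed on the reference probes.
#   fitted    - values stored in the fit for the reference probes, e.g. exprs(bVsn)
#   predicted - predictions for the same probes, e.g. predict(bVsn, b)
#   full      - predictions for all probes, e.g. predict(bVsn, bFull)
shiftToReference <- function(fitted, predicted, full) {
  fitted    <- as.matrix(fitted)
  predicted <- as.matrix(predicted)
  full      <- as.matrix(full)
  stopifnot(identical(dim(fitted), dim(predicted)),
            ncol(full) == ncol(fitted))
  d <- fitted - predicted
  offsets <- colMeans(d)          # one offset per array (column)
  # Spread of the differences: close to zero only if the discrepancy
  # is a pure shift, which is the assumption this correction relies on.
  spread <- apply(d, 2, sd)
  list(values  = sweep(full, 2, offsets, "+"),
       offsets = offsets,
       spread  = spread)
}

# Stand-in numbers only, to show the mechanics (no vsn fit involved):
set.seed(1)
fitted    <- matrix(rnorm(20), 10, 2)
predicted <- sweep(fitted, 2, c(1.5, -0.5), "-")  # constant shift per column
full      <- rbind(predicted, matrix(rnorm(10), 5, 2))
res <- shiftToReference(fitted, predicted, full)
res$offsets
res$spread
```

With the objects from this post, the call would be something like `shiftToReference(exprs(bVsn), predict(bVsn, b), predict(bVsn, bFull))$values`. If `spread` is not small relative to the offsets, the discrepancy is not a simple shift and this correction would not be appropriate.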
Any help much appreciated,
Chris
> sessionInfo()
R version 2.5.1 (2007-06-27)
i486-pc-linux-gnu
locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] "tools" "stats" "graphics" "grDevices" "utils" "datasets"
[7] "methods" "base"
other attached packages:
vsn limma affy affyio Biobase
"2.2.0" "2.10.5" "1.14.2" "1.4.1" "1.14.1"
--
------------------------------------------------------------------------
Dr Christopher Knight Manchester Interdisciplinary Biocentre
room 2.001 The University of Manchester
Tel: +44 (0)161 3065138 131 Princess Street
Fax: +44 (0)161 3064556 Manchester M1 7DN
chris.knight at manchester.ac.uk UK
www.dbkgroup.org/MCISB/people/knight/ ` · . ,,><(((°>