[BioC] Vsn transformation repeatability
Wolfgang Huber
huber at ebi.ac.uk
Wed Oct 29 13:27:04 CET 2008
Dear Paolo,
thank you for your feedback! There are three issues:
1. You say "number of ... differentially expressed genes for a
particular FDR changes with each run". By how much? The variation should
be tiny (negligible?), otherwise I would be worried about the stability
of your method to compute differential expressedness, and FDR.
2. "vsnrma" calls the "vsn2" method for "AffyBatch" objects, which by
default estimates the transformation from a random subsample of the
microarray features, see
showMethods(f="vsn2", classes="AffyBatch", includeDefs=TRUE)
Please see the code example below for how to deactivate the subsampling.
This should be the only source of (pseudo)randomness, everything else
(including the numerical optimisation) should be deterministic.
3. Of course, there is also "set.seed" (which is acceptable if you have
convinced yourself about point 1 above.)
Best wishes
Wolfgang
library("vsn")
library("affydata")
data("Dilution")
u1 = vsnrma(Dilution)
u2 = vsnrma(Dilution)
identical(exprs(u1), exprs(u2))
# [1] FALSE
u3 = vsnrma(Dilution, subsample=0)
u4 = vsnrma(Dilution, subsample=0)
identical(exprs(u3), exprs(u4))
# [1] TRUE
sessionInfo()
R version 2.9.0 Under development (unstable) (2008-10-28 r46793)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] hgu95av2cdf_2.2.0 affydata_1.11.3 vsn_3.9.2
[4] limma_2.14.7 affy_1.18.2 preprocessCore_1.2.1
[7] affyio_1.11.0 Biobase_2.0.1 lattice_0.17-15
[10] fortunes_1.3-5
loaded via a namespace (and not attached):
[1] grid_2.9.0
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
29/10/2008 10:20 Paolo Innocenti scripsit
> Hi all,
>
> We are attempting to identify significantly differentially expressed
> genes between two treatments of female Drosophila melanogaster, using
> results from Affymetrix Drosophila2 microarray chip hybridization (8
> chips, 4 per treatment). The end result is that the number of
> significantly differentially expressed genes for a particular FDR
> changes with each run, despite the input data and the program code
> remaining unchanged.
>
> We are using ”vsnrma” for pre-processing, which utilizes a search
> algorithm to find the maximum likelihood transformed values. The
> procedure produces slightly different calibrated GLOG2-transformed
> expression values on each run, despite the untransformed expression
> values and the R code being identical each time. Our first guess is that
> the search algorithm is unable to locate the likelihood peak and settles
> on a slightly different set of transformed values each time. Is this a
> possibility? If so, is there anything we can do to fix it? E.g. increase
> the number of iterations, or modify the search algorithm.
>
> Many thanks in advance.
> Paolo Innocenti
>
>
>> library(affy)
> Loading required package: Biobase
> Loading required package: tools
>
> Welcome to Bioconductor
>
> Vignettes contain introductory material. To view, type
> 'openVignette()'. To cite Bioconductor, see
> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>
>> library(vsn)
> Loading required package: lattice
> Loading required package: limma
>> data <- ReadAffy(celfile.path="./celfiles")
>> sampleNames(data) <- c("z1","z2","z3","z4","m1","m2","m3","m4")
>> eset1 <- vsnrma(data)
> 100% done.4 x 8 matrix (1 stratum). 0% done.
> Please use 'meanSdPlot' to verify the fit.
> Calculating Expression
>> eset2 <- vsnrma(data)
> 100% done.4 x 8 matrix (1 stratum). 0% done.
> Please use 'meanSdPlot' to verify the fit.
> Calculating Expression
>> exprs(eset1)[1,]
> z1 z2 z3 z4 m1 m2 m3 m4
> 12.52630 12.23441 12.54579 12.35582 12.45190 12.35984 12.33933 12.35031
>> exprs(eset2)[1,]
> z1 z2 z3 z4 m1 m2 m3 m4
> 12.52422 12.24059 12.53899 12.35949 12.44828 12.35460 12.34465 12.35277
>> head(exprs(eset1))==head(exprs(eset2))
> z1 z2 z3 z4 m1 m2 m3 m4
> 1616608_a_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622892_s_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622893_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622894_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622895_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622896_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>> sessionInfo()
> R version 2.8.0 (2008-10-20)
> i686-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
>
> attached base packages:
> [1] tools stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] drosophila2cdf_2.3.0 vsn_3.8.0 limma_2.16.2
> [4] lattice_0.17-15 affy_1.20.0 Biobase_2.2.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.10.0 grid_2.8.0 preprocessCore_1.4.0
>
>
More information about the Bioconductor
mailing list