[BioC] Vsn transformation repeatability

Wolfgang Huber huber at ebi.ac.uk
Wed Oct 29 13:27:04 CET 2008


Dear Paolo,

thank you for your feedback! There are three issues:

1. You say "number of ... differentially expressed genes for a
particular FDR changes with each run". By how much? The variation should
be tiny (negligible?), otherwise I would be worried about the stability
of your method to compute differential expressedness, and FDR.

2. "vsnrma" calls the "vsn2" method for "AffyBatch" objects, which by
default estimates the transformation from a random subsample of the
microarray features, see
   showMethods(f="vsn2", classes="AffyBatch", includeDefs=TRUE)
Please see the code example below for how to deactivate the subsampling.
This should be the only source of (pseudo)randomness, everything else
(including the numerical optimisation) should be deterministic.

3. Of course, there is also "set.seed" (which is acceptable if you have
convinced yourself about point 1 above.)

Best wishes
 Wolfgang


 library("vsn")
 library("affydata")
 data("Dilution")

 u1 = vsnrma(Dilution)
 u2 = vsnrma(Dilution)

 identical(exprs(u1), exprs(u2))
 # [1] FALSE

 u3 = vsnrma(Dilution, subsample=0)
 u4 = vsnrma(Dilution, subsample=0)
 identical(exprs(u3), exprs(u4))
 # [1] TRUE

 sessionInfo()

R version 2.9.0 Under development (unstable) (2008-10-28 r46793)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=it_IT.UTF-8;LC_NUMERIC=C;LC_TIME=it_IT.UTF-8;LC_COLLATE=it_IT.UTF-8;LC_MONETARY=C;LC_MESSAGES=it_IT.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] hgu95av2cdf_2.2.0    affydata_1.11.3      vsn_3.9.2
 [4] limma_2.14.7         affy_1.18.2          preprocessCore_1.2.1
 [7] affyio_1.11.0        Biobase_2.0.1        lattice_0.17-15
[10] fortunes_1.3-5

loaded via a namespace (and not attached):
[1] grid_2.9.0


------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber


29/10/2008 10:20 Paolo Innocenti scripsit
> Hi all,
> 
> We are attempting to identify significantly differentially expressed
> genes between two treatments of female Drosophila melanogaster, using
> results from Affymetrix Drosophila2 microarray chip hybridization (8
> chips, 4 per treatment). The end result is that the number of
> significantly differentially expressed genes for a particular FDR
> changes with each run, despite the input data and the program code
> remaining unchanged.
> 
> We are using ”vsnrma” for pre-processing, which utilizes a search
> algorithm to find the maximum likelihood transformed values. The
> procedure produces slightly different calibrated GLOG2-transformed
> expression values on each run, despite the untransformed expression
> values and the R code being identical each time. Our first guess is that
> the search algorithm is unable to locate the likelihood peak and settles
> on a slightly different set of transformed values each time. Is this a
> possibility? If so, is there anything we can do to fix it? E.g. increase
> the number of iterations, or modify the search algorithm.
> 
> Many thanks in advance.
> Paolo Innocenti
> 
> 
>> library(affy)
> Loading required package: Biobase
> Loading required package: tools
> 
> Welcome to Bioconductor
> 
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
> 
>> library(vsn)
> Loading required package: lattice
> Loading required package: limma
>> data <- ReadAffy(celfile.path="./celfiles")
>> sampleNames(data) <- c("z1","z2","z3","z4","m1","m2","m3","m4")
>> eset1 <- vsnrma(data)
> 100% done.4 x 8 matrix (1 stratum).   0% done.
> Please use 'meanSdPlot' to verify the fit.
> Calculating Expression
>> eset2 <- vsnrma(data)
> 100% done.4 x 8 matrix (1 stratum).   0% done.
> Please use 'meanSdPlot' to verify the fit.
> Calculating Expression
>> exprs(eset1)[1,]
>       z1       z2       z3       z4       m1       m2       m3       m4
> 12.52630 12.23441 12.54579 12.35582 12.45190 12.35984 12.33933 12.35031
>> exprs(eset2)[1,]
>       z1       z2       z3       z4       m1       m2       m3       m4
> 12.52422 12.24059 12.53899 12.35949 12.44828 12.35460 12.34465 12.35277
>> head(exprs(eset1))==head(exprs(eset2))
>                 z1    z2    z3    z4    m1    m2    m3    m4
> 1616608_a_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622892_s_at FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622893_at   FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622894_at   FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622895_at   FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 1622896_at   FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>> sessionInfo()
> R version 2.8.0 (2008-10-20)
> i686-pc-linux-gnu
> 
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> 
> 
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] drosophila2cdf_2.3.0 vsn_3.8.0            limma_2.16.2
> [4] lattice_0.17-15      affy_1.20.0          Biobase_2.2.0
> 
> loaded via a namespace (and not attached):
> [1] affyio_1.10.0        grid_2.8.0           preprocessCore_1.4.0
> 
>



More information about the Bioconductor mailing list