[BioC] A couple VSN related questions

Wolfgang Huber huber at ebi.ac.uk
Thu Jan 31 01:55:00 CET 2008


Dear Leo,

answers please see below.

> I am new to Bioconductor and R (having the latest version, Bioconductor 2.1
> and R 2.6.1, installed) so please bear with me if I ask something
> naive/stupid.  Anyway, the most important reason why I come to Bioconductor
> is to use the vsn package, and after trying it a bit, I have the following
> questions:
> 
> 1. It is clear from the documentation (and a previous email from Wolfgang)
> that the real computation of vsn2 is written in C.  Is it possible for me to
> access the C source code (and compile and run it myself)?  If yes, this will
> be truly great since I have to do other computational analysis outside of R
> anyway.

Yes, it is all open source. The most obvious way is to download the 
.tar.gz file from 
http://www.bioconductor.org/packages/devel/bioc/html/vsn.html
and unpack it. See the file vsn2.c. There is also public, anonymous svn 
access, which is described on the bioc website.

> 2. I am trying to follow the computation performed by VSN as documented in
> the "Introduction to VSN" vignette.  Loading the kidney dataset, I have the
> following data before and after VSN:
>> exprs(kidney)[1:10,]
>      channels
> spots  green    red
>    1  815.32 937.02
>    2  671.66 765.03
>    3  713.93 713.59
>    4  703.97 656.70
>    5  493.59 472.10
>    6  477.92 453.13
>    7  346.55 385.47
>    8  377.34 421.86
>    9  132.80 121.77
>    10 148.65 130.41
>> fit = vsn2(kidney)
> vsn: 8704 x 2 matrix (1 stratum). 100% done.
>> fit at hx[1:10,]
>      channels
> spots    green      red
>    1  9.711781 9.870624
>    2  9.453765 9.597638
>    3  9.533994 9.506128
>    4  9.515436 9.398666
>    5  9.067212 8.995503
>    6  9.028838 8.948561
>    7  8.673814 8.771463
>    8  8.762664 8.868634
>    9  7.957576 7.926483
>    10 8.016070 7.957696
> 
> Now I am trying to reproduce the result from parameters estimated by VSN
> using
>> coef(fit)[1,,]
>             [,1]      [,2]
> [1,] -0.08444849 -5.927967
> [2,] -0.06787093 -5.957545
>> offset <- coef(fit)[1,,1];
>> scale <- exp(coef(fit)[1,,2]);
>> ynorm_my = asinh(exprs(kidney)*scale + offset);
>> ynorm_my[1:10,]
>      channels
> spots     green       red
>    1  1.4820851 1.6139176
>    2  1.2851048 1.4029668
>    3  1.3588520 1.3584152
>    4  1.3272734 1.2650499
>    5  1.0353039 0.9986861
>    6  0.9954236 0.9530607
>    7  0.7626212 0.8400535
>    8  0.8148209 0.8976601
>    9  0.2661626 0.2376892
>    10 0.3115130 0.2662458
> 
> which is clearly different from the result produced by VSN.  Could anybody
> tell me what I have done wrong?  Thank you so much!

Try this:

library("vsn")
data("kidney")
fit = vsn2(kidney)
shift = coef(fit)[1,,1]
scale = exp(coef(fit)[1,,2])

nk = t(asinh(t(exprs(kidney))*scale + shift))
nk = nk/log(2) - fit at hoffset

identical(nk , exprs(fit))
[1] TRUE


This is a subtle point. It is mentioned (admittedly too briefly) in the 
vsn2 man page, section "Note on overall scale and location of the glog 
transformation":

  The data are returned on a glog scale to base 2. More precisely,
  the transformed data are subject to the transformation
  glog2(f(b)*x+a) + c, where glog2(u) = log2(u+sqrt(u*u+1)) =
  asinh(u)/log(2) is called the generalised logarithm, a and b are
  the fitted model parameters (see references), f is a parameter
  transformation [4], and the overall constant offset c is computed
  from b such that for large x the transformation approximately
  corresponds to the log2 function. The offset c is inconsequential
  for all differential expression calculations, but many users like
  to see the data in a range that they are familiar with.


So the difference to what you did is (i) to go from glog to glog_2 scale 
(the division by log(2)), and (ii) to subtract fit at hoffset. The 
computation of fit at hoffset is done towards the end of the "vsnMatrix" 
function.

Best wishes
Wolfgang



R version 2.7.0 Under development (unstable) (2008-01-29 r44233)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] vsn_3.4.5            limma_2.13.3         affy_1.17.3
[4] preprocessCore_1.1.5 affyio_1.7.9         Biobase_1.17.8
[7] fortunes_1.3-4

loaded via a namespace (and not attached):
[1] grid_2.7.0     lattice_0.17-4



More information about the Bioconductor mailing list