[BioC] A couple VSN related questions

Wolfgang Huber huber at ebi.ac.uk
Sun Feb 3 18:08:18 CET 2008


Dear Leo

> Thank you very much for your thorough and clear answer!  I have an
> additional question: is the "sigsq" slot of a vsn object the estimated
> variance (which the name and actual value seem to indicate) or the standard
> deviation (as documented on the man page)?  Also is it before or after the
> log(2) transformation (I guess it is before, but want to make sure)?  
> 
> Thanks again for taking your time to answer my questions!

thank you, you are right. It is the variance squared; the man page is 
fixed
in vsn >= 3.4.6. sigma^2 is defined in the vignette "Likelihood 
calculations
for vsn". It is computed before that affine scaling and shifting that we
talked about in the previous mail.

Best wishes
  Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber



> -----Original Message-----
> From: Wolfgang Huber [mailto:huber at ebi.ac.uk] 
> Sent: Wednesday, January 30, 2008 7:55 PM
> To: Leo J. LEE
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] A couple VSN related questions
> 
> Dear Leo,
> 
> answers please see below.
> 
>> I am new to Bioconductor and R (having the latest version, Bioconductor
> 2.1
>> and R 2.6.1, installed) so please bear with me if I ask something
>> naive/stupid.  Anyway, the most important reason why I come to
> Bioconductor
>> is to use the vsn package, and after trying it a bit, I have the following
>> questions:
>>
>> 1. It is clear from the documentation (and a previous email from Wolfgang)
>> that the real computation of vsn2 is written in C.  Is it possible for me
> to
>> access the C source code (and compile and run it myself)?  If yes, this
> will
>> be truly great since I have to do other computational analysis outside of
> R
>> anyway.
> 
> Yes, it is all open source. The most obvious way is to download the 
> .tar.gz file from 
> http://www.bioconductor.org/packages/devel/bioc/html/vsn.html
> and unpack it. See the file vsn2.c. There is also public, anonymous svn 
> access, which is described on the bioc website.
> 
>> 2. I am trying to follow the computation performed by VSN as documented in
>> the "Introduction to VSN" vignette.  Loading the kidney dataset, I have
> the
>> following data before and after VSN:
>>> exprs(kidney)[1:10,]
>>      channels
>> spots  green    red
>>    1  815.32 937.02
>>    2  671.66 765.03
>>    3  713.93 713.59
>>    4  703.97 656.70
>>    5  493.59 472.10
>>    6  477.92 453.13
>>    7  346.55 385.47
>>    8  377.34 421.86
>>    9  132.80 121.77
>>    10 148.65 130.41
>>> fit = vsn2(kidney)
>> vsn: 8704 x 2 matrix (1 stratum). 100% done.
>>> fit at hx[1:10,]
>>      channels
>> spots    green      red
>>    1  9.711781 9.870624
>>    2  9.453765 9.597638
>>    3  9.533994 9.506128
>>    4  9.515436 9.398666
>>    5  9.067212 8.995503
>>    6  9.028838 8.948561
>>    7  8.673814 8.771463
>>    8  8.762664 8.868634
>>    9  7.957576 7.926483
>>    10 8.016070 7.957696
>>
>> Now I am trying to reproduce the result from parameters estimated by VSN
>> using
>>> coef(fit)[1,,]
>>             [,1]      [,2]
>> [1,] -0.08444849 -5.927967
>> [2,] -0.06787093 -5.957545
>>> offset <- coef(fit)[1,,1];
>>> scale <- exp(coef(fit)[1,,2]);
>>> ynorm_my = asinh(exprs(kidney)*scale + offset);
>>> ynorm_my[1:10,]
>>      channels
>> spots     green       red
>>    1  1.4820851 1.6139176
>>    2  1.2851048 1.4029668
>>    3  1.3588520 1.3584152
>>    4  1.3272734 1.2650499
>>    5  1.0353039 0.9986861
>>    6  0.9954236 0.9530607
>>    7  0.7626212 0.8400535
>>    8  0.8148209 0.8976601
>>    9  0.2661626 0.2376892
>>    10 0.3115130 0.2662458
>>
>> which is clearly different from the result produced by VSN.  Could anybody
>> tell me what I have done wrong?  Thank you so much!
> 
> Try this:
> 
> library("vsn")
> data("kidney")
> fit = vsn2(kidney)
> shift = coef(fit)[1,,1]
> scale = exp(coef(fit)[1,,2])
> 
> nk = t(asinh(t(exprs(kidney))*scale + shift))
> nk = nk/log(2) - fit at hoffset
> 
> identical(nk , exprs(fit))
> [1] TRUE
> 
> 
> This is a subtle point. It is mentioned (admittedly too briefly) in the 
> vsn2 man page, section "Note on overall scale and location of the glog 
> transformation":
> 
>   The data are returned on a glog scale to base 2. More precisely,
>   the transformed data are subject to the transformation
>   glog2(f(b)*x+a) + c, where glog2(u) = log2(u+sqrt(u*u+1)) =
>   asinh(u)/log(2) is called the generalised logarithm, a and b are
>   the fitted model parameters (see references), f is a parameter
>   transformation [4], and the overall constant offset c is computed
>   from b such that for large x the transformation approximately
>   corresponds to the log2 function. The offset c is inconsequential
>   for all differential expression calculations, but many users like
>   to see the data in a range that they are familiar with.
> 
> 
> So the difference to what you did is (i) to go from glog to glog_2 scale 
> (the division by log(2)), and (ii) to subtract fit at hoffset. The 
> computation of fit at hoffset is done towards the end of the "vsnMatrix" 
> function.
> 
> Best wishes
> Wolfgang
> 
>



More information about the Bioconductor mailing list