[BioC] batch effects and VSN

Wed Sep 26 23:07:30 CEST 2007

Dear Hans-Ulrich,

if you are adventurous, you could go into the C code
and modify the code that computes "mu" (the estimate of the probe 
effect, equivalent to b_k in your mail below) and have it compute 
separate mu's for each batch.

This is in 
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/vsn/src/vsn2.c
in double loglik(int n, double *par, void *ex) in the 20 or so lines 
following the comment "2nd sweep through the data: compute r_ki". If you 
do so, I'd be interested in what comes out.

A second, more pragmatic solution, if, as I assume is the case, your 
batches are each sufficiently big, would be to call vsn separately on 
each batch and then use some other method (scaling, shifting, local 
polynomial) to adjust the transformed values between batches. For that 
you should check the meanSdPlots for each batch and verify that they are 
similar.
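
A minimal sketch of the "shifting" variant of this between-batch adjustment, under one possible reading: every probe gets its own offset per batch. It assumes the per-batch vsn output has already been collected into a probes-by-arrays table; the function name shift_batches and the choice of the median are illustrative assumptions, not part of vsn.

```python
from statistics import median

def shift_batches(h, batch):
    """Shift each probe's values so that all batches share the same median.

    h     : list of rows (one per probe), each a list of transformed
            values, one per array (hypothetical layout)
    batch : list of batch labels, one per array
    """
    adjusted = []
    for row in h:
        overall = median(row)                  # per-probe reference level
        new_row = list(row)
        for b in set(batch):
            idx = [i for i, lab in enumerate(batch) if lab == b]
            # offset that moves this batch's median onto the reference
            offset = overall - median(row[i] for i in idx)
            for i in idx:
                new_row[i] = row[i] + offset
        adjusted.append(new_row)
    return adjusted
```

Note that a per-probe shift like this also removes real differences between the classes of interest if the classes are confounded with the batches, so it is only sensible when each batch contains arrays from both classes.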

Third, you could relax your requirement for variance stabilization and 
hope that the log-transform does a good enough job. In that case, you can
replace
    y_ki = a_ki + b_i b_k c_ki
by
    log y_ki = log b_i + log b_k + log c_ki
(in some approximation) and then have b_k be batch-specific. This, I 
think, is easy to fit using "lm".
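
As a rough illustration of this log-linear fit (a sketch, not tested on real arrays): when every probe is observed on every array, and each array belongs to exactly one batch, the least-squares fit of log y_ki = alpha_i + beta_kb + residual can be computed batch by batch from row and column means, without building a design matrix. In R, something like lm(log(y) ~ array + probe:batch) on a long-format data frame should give the same fitted values. The function name and the data layout below are assumptions made for the example.

```python
from math import log

def fit_loglinear_by_batch(y, batch):
    """Residuals of the least-squares fit  log y_ki = alpha_i + beta_kb.

    y     : list of rows (one per probe) of positive intensities,
            one value per array, no missing values
    batch : list of batch labels, one per array
    The residuals estimate log c_ki, centered per probe and per array
    within each batch.
    """
    n_probes = len(y)
    resid = [[0.0] * len(batch) for _ in range(n_probes)]
    for b in sorted(set(batch)):
        cols = [i for i, lab in enumerate(batch) if lab == b]
        z = [[log(y[k][i]) for i in cols] for k in range(n_probes)]
        grand = sum(map(sum, z)) / (n_probes * len(cols))
        # batch-specific probe effect (plays the role of log b_k per batch)
        probe_mean = [sum(row) / len(cols) for row in z]
        # array effect (plays the role of log b_i)
        array_mean = [sum(z[k][j] for k in range(n_probes)) / n_probes
                      for j in range(len(cols))]
        for k in range(n_probes):
            for j, i in enumerate(cols):
                resid[k][i] = z[k][j] - probe_mean[k] - array_mean[j] + grand
    return resid
```

The additive offset a_ki is ignored here, which is the approximation mentioned above; it will matter most for low intensities.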

Best wishes
   Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber

it such a model with VSN directly without going into the C code and 
change quite a lot (since there are many more parameters)

Klein wrote:
> Hello List,
> 
> I am analyzing some arrays with strong "batch effects". The source of 
> the variation is unknown. The biologists and I found out that some of 
> the systematic variation is related to some processing steps in the 
> laboratory (ChIP-experiment).
> 
> My first general question is: how do you deal with batch effects? I did 
> not find much about it in the archive.
> 
> I proceeded as follows:
> 
> I used limma for computing oligos with differential intensities between 
> two classes. Adding a factor for batch effects is easy and reduces the 
> R^2 of the gene-wise models in my case noticeably.
> 
> 
> I am more worried about the normalization. I like VSN and used it here, 
> too. The arrays are single color oligonucleotide arrays (not 
> commercial). The VSN vignette states that VSN is not capable of 
> calibrating arrays from different batches.
> 
> Using the notation of the vignette, the vsn model is:
> 
> y_ki = a_ki + b_i b_k c_ki
> 
> y_ki is the measured intensity of gene k on array i. c_ki is the true 
> mRNA abundance. The oligo-specific factor b_k is not estimated. Instead 
> the normalized intensities are given in probe-specific units. However, 
> b_k will perhaps be different for different batches. Could one 
> substitute b_k by b_kb, which is an oligo-specific factor for oligo k in 
> batch b? b_kb has to be estimated from data.
> 
> I am not sure whether it is practical. The number of model parameters 
> increases a lot. So, I wonder if someone has tried this (or something 
> similar) before?
> 
> Any comments are welcome. Also hints to other normalization procedures 
> which may be suitable. Currently, I am using (standard) VSN. It seems to 
> work (stable variance, iteration converges), but the batch effects 
> remain. And probably, the LTS regression chooses probes for estimation, 
> which have small batch effects (and not necessarily an equal amount of 
> hybridized DNA between my two classes of interest).
> 
> Regards,
> Hans-Ulrich
>