[BioC] batch effects and VSN
Wolfgang Huber
huber at ebi.ac.uk
Wed Sep 26 23:07:30 CEST 2007
Dear Hans-Ulrich,
if you are adventurous, you could go into the C code
and modify the code that computes "mu" (the estimate of the probe
effect, equivalent to b_k in your mail below) and have it compute
separate mu's for each batch.
This is in
https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/vsn/src/vsn2.c
in double loglik(int n, double *par, void *ex) in the 20 or so lines
following the comment "2nd sweep through the data: compute r_ki". If you
do so, I'd be interested in what comes out.
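To make the idea of batch-specific mu's concrete, here is a minimal sketch in Python (not the vsn C code). It assumes that mu is the per-probe mean of the already transformed values across arrays and that r_ki is the residual from that mean; the function name `batch_residuals` and the inputs `h` and `batch` are hypothetical.

```python
from statistics import fmean

def batch_residuals(h, batch):
    """h[k][i]: transformed value of probe k on array i (hypothetical input).
    batch[i]: batch label of array i.
    Returns (mu, r): mu[k][b] is the mean of probe k over the arrays of
    batch b, and r[k][i] = h[k][i] - mu[k][batch[i]]."""
    labels = sorted(set(batch))
    mu, r = [], []
    for row in h:
        # one mean per batch instead of a single mean over all arrays
        mu_k = {b: fmean(x for x, bi in zip(row, batch) if bi == b)
                for b in labels}
        mu.append(mu_k)
        r.append([x - mu_k[bi] for x, bi in zip(row, batch)])
    return mu, r

# toy data: 2 probes, 4 arrays, 2 batches
h = [[1.0, 1.2, 3.0, 3.4],
     [2.0, 2.2, 2.1, 2.5]]
batch = ["A", "A", "B", "B"]
mu, r = batch_residuals(h, batch)
```

With a single mu per probe, the jump of the first probe between batches A and B would end up in the residuals; with per-batch mu's it is absorbed into the probe effect, at the cost of one extra parameter per probe and batch.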
A second, more pragmatic solution, if, as I assume, your batches are
each sufficiently big, would be to call vsn separately on each batch and
then use some other method (scaling, shifting, local polynomial
regression) to adjust the transformed values between batches. For this
to be sensible, you should check the meanSdPlots for each batch and
verify that they are similar.
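The "scaling, shifting" step could look roughly like the following Python sketch. It assumes the values have already been transformed by vsn separately per batch, and it uses the median and the median absolute deviation as location and scale, which is one choice among many; `align_batches` and its arguments are hypothetical names.

```python
from statistics import median

def align_batches(transformed, reference):
    """transformed: dict mapping batch label -> list of arrays, each array a
    list of values already transformed per batch (hypothetical input).
    reference: label of the batch whose location and scale are the target.
    Returns a dict of the same shape with shifted and rescaled values."""
    def loc_scale(arrays):
        flat = [x for a in arrays for x in a]
        m = median(flat)
        # median absolute deviation; must be non-zero for the rescaling
        return m, median(abs(x - m) for x in flat)

    m_ref, s_ref = loc_scale(transformed[reference])
    out = {}
    for b, arrays in transformed.items():
        m, s = loc_scale(arrays)
        out[b] = [[(x - m) * s_ref / s + m_ref for x in a] for a in arrays]
    return out

# toy data: batch B is batch A shifted and stretched
transformed = {
    "A": [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]],
    "B": [[4.0, 6.0, 8.0], [5.0, 7.0, 9.0]],
}
out = align_batches(transformed, reference="A")
```

A global shift and scale like this only makes sense if the batches contain comparable samples; a local polynomial adjustment would be needed if the difference between batches depends on intensity.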
Third, you could relax your requirement for variance stabilization and
hope that the log-transform does a good enough job. In that case, you can
replace
y_ki = a_ki + b_i b_k c_ki
by
log y_ki = log b_i + log b_k + log c_ki
(in some approximation) and then have b_k be batch-specific. This, I
think, is easy to fit using "lm".
Best wishes
Wolfgang
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
it such a model with VSN directly without going into the C code and
change quite a lot (since there are many more parameters)
Klein wrote:
> Hello List,
>
> I am analyzing some arrays with strong "batch effects". The source of
> the variation is unknown. The biologists and I found out that some of
> the systematic variation is related to some processing steps in the
> laboratory (ChIP-experiment).
>
> My first general question is: how do you deal with batch effects? I did
> not find much about it in the archive.
>
> I proceeded as follows:
>
> I used limma for computing oligos with differential intensities between
> two classes. Adding a factor for batch effects is easy and reduces the
> R^2 of the gene-wise models in my case noticeably.
>
>
> I am more worried about the normalization. I like VSN and used it here,
> too. The arrays are single color oligonucleotide arrays (not
> commercial). The VSN vignette states that VSN is not capable of
> calibrating arrays from different batches.
>
> Using the notation of the vignette, the vsn model is:
>
> y_ki = a_ki + b_i b_k c_ki
>
> y_ki is the measured intensity of gene k on array i. c_ki is the true
> mRNA abundance. The oligo-specific factor b_k is not estimated. Instead
> the normalized intensities are given in probe-specific units. However,
> b_k will perhaps be different for different batches. Could one
> substitute b_k by b_kb, which is an oligo-specific factor for oligo k in
> batch b? b_kb has to be estimated from data.
>
> I am not sure whether it is practical. The number of model parameters
> increases a lot. So, I wonder if someone has tried this (or something
> similar) before?
>
> Any comments are welcome. Also hints to other normalization procedures
> which may be suitable. Currently, I am using (standard) VSN. It seems to
> work (stable variance, iteration converges), but the batch effects
> remain. And probably, the LTS regression chooses probes for estimation,
> which have small batch effects (and not necessarily an equal amount of
> hybridized DNA between my two classes of interest).
>
> Regards,
> Hans-Ulrich
>