[BioC] batch effects and VSN
Hans-Ulrich Klein
h.klein at uni-muenster.de
Wed Sep 26 17:43:35 CEST 2007
Hello List,
I am analyzing some arrays with strong "batch effects". The source of
the variation is not fully known, but the biologists and I found that
some of the systematic variation is related to processing steps in the
laboratory (ChIP-experiment).
My first general question is: how do you deal with batch effects? I did
not find much about it in the archive.
I proceeded as follows:
I used limma to identify oligos with differential intensities between
two classes. Adding a factor for batch effects is easy and reduces the
R^2 of the gene-wise models in my case noticeably.
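
To illustrate what adding a batch factor to a gene-wise model does, here is a minimal sketch in Python with NumPy. It is plain ordinary least squares for a single oligo, not limma itself, and the design (12 arrays, two classes, three batches), the effect sizes, and all names are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced design: 12 arrays, two classes, three batches.
n_arrays = 12
cls = np.tile([0, 1], 6)            # class label per array
batch = np.repeat([0, 1, 2], 4)     # batch label per array

# Simulated normalized intensities for one oligo:
# baseline + class effect + batch shift + noise (all values invented).
batch_shift = np.array([0.0, 1.5, -1.0])
y = 8.0 + 0.5 * cls + batch_shift[batch] + rng.normal(0, 0.2, n_arrays)

def fit(X, y):
    """Ordinary least squares; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

intercept = np.ones(n_arrays)
# Model without batch: intercept + class.
X_class = np.column_stack([intercept, cls]).astype(float)
# Model with batch: add treatment-coded dummies for batches 1 and 2.
X_batch = np.column_stack([intercept, cls, batch == 1, batch == 2]).astype(float)

beta_class, rss_class = fit(X_class, y)
beta_batch, rss_batch = fit(X_batch, y)

print("RSS without batch factor:", round(rss_class, 3))
print("RSS with batch factor:   ", round(rss_batch, 3))
print("class effect estimate:   ", round(float(beta_batch[1]), 3))
```

In this toy setting the batch dummies absorb the batch shifts, so the residual sum of squares drops and the class effect is estimated from within-batch contrasts only.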
I am more worried about the normalization. I like VSN and used it here,
too. The arrays are single color oligonucleotide arrays (not
commercial). The VSN vignette states that VSN is not capable of
calibrating arrays from different batches.
Using the notation of the vignette, the vsn model is:
y_ki = a_ki + b_i b_k c_ki
y_ki is the measured intensity of gene k on array i. c_ki is the true
mRNA abundance. The oligo-specific factor b_k is not estimated. Instead
the normalized intensities are given in probe-specific units. However,
b_k will perhaps be different for different batches. Could one
substitute b_k by b_kb, which is an oligo-specific factor for oligo k in
batch b? b_kb has to be estimated from data.
I am not sure whether this is practical. The number of model parameters
increases a lot. So, I wonder if someone has tried this (or something
similar) before?
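
To put a rough number on that increase, here is a small sketch in Python. It assumes the extended model y_ki = a_ki + b_i b_kb c_ki, where b is the batch of array i. It also assumes that the standard model estimates one offset and one scale factor per array, and that one batch serves as reference (because b_k itself is absorbed into the probe-specific units), so only B - 1 factors per probe are free. The function names and the example sizes are hypothetical.

```python
def vsn_param_count(n_arrays):
    """Standard model: one offset and one scale factor per array."""
    return 2 * n_arrays

def batch_probe_param_count(n_arrays, n_probes, n_batches):
    """Extended model with a probe-by-batch factor b_kb.

    Assumption: one batch is the reference, so each probe contributes
    (n_batches - 1) free factors on top of the per-array parameters.
    """
    return 2 * n_arrays + n_probes * (n_batches - 1)

# Hypothetical experiment: 20 arrays, 40000 probes, 3 batches.
standard = vsn_param_count(20)
extended = batch_probe_param_count(20, 40000, 3)
print(standard, extended)
```

With these made-up sizes the count goes from 40 to 80040 parameters, and it now grows with the number of probes rather than only with the number of arrays.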
Any comments are welcome. Also hints to other normalization procedures
which may be suitable. Currently, I am using (standard) VSN. It seems to
work (stable variance, iteration converges), but the batch effects
remain. The LTS regression probably also selects, for estimation, probes
that have small batch effects (and not necessarily an equal amount of
hybridized DNA between my two classes of interest).
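
The concern about the LTS step can be illustrated with a toy example in Python with NumPy. This is not the algorithm implemented in vsn, only a generic least trimmed squares estimate of a single shift between two arrays. The data are simulated, and the share of batch-affected probes (30%), the trimming fraction, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_probes = 1000
true_shift = 0.3

# Hypothetical setting: the first 300 probes carry an extra batch effect.
affected = np.zeros(n_probes, dtype=bool)
affected[:300] = True

# Per-probe difference between two arrays on the transformed scale.
d = true_shift + rng.normal(0, 0.1, n_probes)
d[affected] += 2.0

def lts_shift(d, keep=0.5, n_iter=20):
    """Toy least trimmed squares estimate of a common shift.

    Repeatedly refits the mean on the `keep` fraction of probes
    with the smallest squared residuals.
    """
    h = int(keep * len(d))
    est = np.median(d)
    for _ in range(n_iter):
        idx = np.argsort((d - est) ** 2)[:h]
        est = d[idx].mean()
    return est, idx

est, kept = lts_shift(d)
print("plain mean:", round(float(d.mean()), 3))
print("LTS estimate:", round(float(est), 3))
print("batch-affected probes among those kept:", int(affected[kept].sum()))
```

In this simulation the trimmed fit ends up using almost exclusively probes without the batch effect, which is the behaviour suspected above: the estimate is stable, but the probes it relies on are chosen by their small residuals, not by biological criteria.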
Regards,
Hans-Ulrich
--
Hans-Ulrich Klein
Westfälische Wilhelms-Universität Münster
Department of Medical Informatics and Biomathematics
Domagkstr. 9, 48149 Münster