[BioC] Re: normalisation or analysis with batch effects
James W. MacDonald
jmacdon at med.umich.edu
Wed Dec 1 17:42:07 CET 2004
>-----Original Message-----
>From: bioconductor-bounces at stat.math.ethz.ch
>[mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of
>Adaikalavan
>Ramasamy
>Sent: 30 November 2004 23:51
>To: BioConductor mailing list
>Cc: Andrea Pellagatti
>Subject: [BioC] normalisation or analysis with batch effects
>
>
>Dear list,
>
>If the following question has been asked before, I do apologise in
>advance and hope someone can point to the relevant thread. Otherwise I
>would appreciate some thoughts and pointers to this problem.
>Thank you.
>
>
>Problem: My collaborator (cc-ed here) has performed hybridisation for
>11 tumour and 40 normal samples on Affymetrix HGU-133Av2 (contains ~55k
>probesets) chips. He had hybridised about half of the samples when he
>realised he needed more Affymetrix chips.
>
>The second batch of chips arrived with the instruction to add DMSO in
>the hybridisation cocktail, which he followed. The first batch did not
>have such instruction. Therefore we believe that the two
>batches are not

There is a much larger difference between these protocols than simply
adding DMSO. If I am not mistaken, the old samples were processed using
the Enzo IVT kit and the new samples using the Affy IVT kit. We have
found that these data cannot be processed together using, e.g., RMA,
because a large portion of the probesets show completely different
patterns between the two kits. In addition, the distribution of PM
probe intensities is quite different for the two protocols, so I don't
think quantile normalization is appropriate. You can check this by
fitting the RMA model with rmaPLM() in the affyPLM package and then
examining the residual plots.
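
To make the quantile normalization concern concrete, here is a minimal
base R sketch on simulated intensities. It is not the affy or affyPLM
implementation, and the names old_kit, new_kit and quantile_normalize
are hypothetical. It only shows that quantile normalization forces
every array onto one common distribution, which masks a real
difference in distribution between two protocols rather than
correcting for it.

```r
set.seed(2)
n_probes <- 1000

# Simulated log2 intensities: three arrays per hypothetical protocol,
# with differently shaped distributions (an assumption for illustration).
old_kit <- matrix(rnorm(n_probes * 3, mean = 8, sd = 1), ncol = 3)
new_kit <- matrix(rnorm(n_probes * 3, mean = 9, sd = 2), ncol = 3)
x <- cbind(old_kit, new_kit)

# Plain quantile normalization: each value is replaced by the mean,
# across arrays, of the values that share its rank.
quantile_normalize <- function(m) {
  ranks  <- apply(m, 2, rank, ties.method = "first")
  target <- rowMeans(apply(m, 2, sort))
  apply(ranks, 2, function(r) target[r])
}

xn <- quantile_normalize(x)

apply(x, 2, sd)   # spread differs by protocol before normalization
apply(xn, 2, sd)  # every array has the same spread afterwards
```

After normalization each column is a permutation of the same target
values, so any protocol-level difference in distribution is no longer
visible in the normalized data.
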

We have shied away from combining chips that were processed using the
two IVT kits, but if you have to do so, I would recommend processing
each group separately and then fitting a model with a batch effect
(your option 4).

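
A model with a batch effect can be sketched in base R as follows. The
expression matrix is simulated and merely stands in for the cbind() of
two separately preprocessed batches; the sample counts follow the
cross-table in the original post, and the variable names and effect
sizes are assumptions made for illustration. The same design matrix
could be passed to lmFit() in limma, which is not shown here.

```r
set.seed(1)

# Sample annotation following the cross-table: batch 1 has 17 normal and
# 6 tumour samples, batch 2 has 23 normal and 5 tumour samples.
batch  <- factor(rep(c("1", "2"), times = c(23, 28)))
status <- factor(c(rep(c("normal", "tumour"), times = c(17, 6)),
                   rep(c("normal", "tumour"), times = c(23, 5))),
                 levels = c("normal", "tumour"))

# Simulated log2 expression values (genes in rows, samples in columns).
n_genes <- 100
expr <- matrix(rnorm(n_genes * 51, mean = 8), nrow = n_genes,
               dimnames = list(paste0("gene", 1:n_genes), NULL))
expr[, batch == "2"] <- expr[, batch == "2"] + 1.5   # shift in batch 2
expr[1:10, status == "tumour"] <-
  expr[1:10, status == "tumour"] + 2                 # true tumour effect

# Additive design: intercept + batch effect + tumour-vs-normal effect.
design <- model.matrix(~ batch + status)

# With a matrix response, lm() fits one linear model per gene
# (samples in rows, genes in columns after transposing).
fit <- lm(t(expr) ~ batch + status)

# Tumour-vs-normal estimate for each gene, adjusted for batch.
tumour_effect <- coef(fit)["statustumour", ]
head(sort(tumour_effect, decreasing = TRUE))
```

Note that this only adjusts for an additive, per-gene shift between
batches; it does not address probesets whose behaviour differs between
the two kits in other ways.
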
Best,
Jim
>directly comparable. A posting to GeneArray mailing list had a reply
>(http://bfx.kribb.re.kr/gene-array/1255.html) supporting this view. A
>cross-table of batch and sample is given below :
>
>                        | normal  tumour  total
> batch 1 (with DMSO)    |     17       6     23
> batch 2 (without DMSO) |     23       5     28
> -----------------------|----------------------
> total                  |     40      11     51
>
>
>Therefore I have considered the following possible solutions :
>
>1) Preprocess all arrays and compare tumour vs. normal
>
>2) Preprocess the two batches separately and cbind() them. Then compare
>tumour vs. normal
>
>3) Preprocess all arrays but include a batch effect in the analysis (I
>am not sure how to do this - perhaps using LIMMA)
>
>4) Preprocess separately and proceed as 3)
>
>Here, I use RMA to preprocess the arrays. I have done 1) and 2) and the
>correlation of the two gene lists, as assessed by correlation of gene
>ranks, is only 0.35. I think 4) is a bit of overkill.
>
>Any opinions or alternative suggestions are very welcome. Thank you.
>
>Regards,
>--
>Adaikalavan Ramasamy ramasamy at cancer.org.uk
>Centre for Statistics in Medicine http://www.ihs.ox.ac.uk/csm/
>Cancer Research UK Tel : 01865 226 677
>Old Road Campus, Headington, Oxford Fax : 01865 226 962
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>
>
--
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109