[BioC] limma - Considering Batch Effect
James W. MacDonald
jmacdon at med.umich.edu
Fri Aug 10 16:31:41 CEST 2007
Hi Nathan,
Nathan S. Haigh wrote:
> James MacDonald wrote:
>> Hi Nathan,
>>
>>
>> Nathan Haigh wrote:
>>> I know this topic has arisen a few times from searching the mail
>>> archive, but I'm still not clear on what/how to do this.
>>>
>>> Here is my experimental setup:
>>> Array Param1 Param2 Rep Batch
>>> 1 H 4 1 1
>>> 2 H 4 2 2
>>> 3 H 4 3 2
>>> 4 H 20 1 1
>>> 5 H 20 2 2
>>> 6 H 20 3 2
>>> 7 H 37 1 1
>>> 8 H 37 2 2
>>> 9 H 37 3 2
>>> 10 L 4 1 1
>>> 11 L 4 2 2
>>> 12 L 4 3 2
>>> 13 L 20 1 1
>>> 14 L 20 2 2
>>> 15 L 20 3 2
>>> 16 L 37 1 1
>>> 17 L 37 2 2
>>> 18 L 37 3 2
>>>
>>> All rep 1's were processed in batch 1 and reps 2 and 3 were processes in
>>> batch 2. From using GeneSpring, and conducting a cluster analysis on
>>> experimental parameters, there is a clear batch effect. In GeneSpring, I
>>> normalised the arrays, such that gene1 from array1 was divided by the
>>> median of gene1 over arrays from batch1. Likewise, gene1 from array2 was
>>> divided by the median of gene1 across arrays in batch2. These
>>> successfully removed the batch effect. I'd like to be able to remove the
>>> batch effect doing something similar, or accounting for it in limma.
>>>
>>> In essence, I want to remove the variability in the reps caused by the
>>> batch effect.
>>>
>>> Could anyone explain how this can/should be done?
>> Sure. Just add the Batch effect to your design matrix. For instance,
>> if you want to model the differences in Param1, you would do something
>> like:
>>
>>> design <- model.matrix(~0+factor(rep(1:2,
>> each=9))+factor(rep(rep(1:2,1:2),6)))
>>> colnames(design) <- c("H","L","Batch")
>>> design
>> H L Batch
>> 1 1 0 0
>> 2 1 0 1
>> 3 1 0 1
>> 4 1 0 0
>> 5 1 0 1
>> 6 1 0 1
>> 7 1 0 0
>> 8 1 0 1
>> 9 1 0 1
>> 10 0 1 0
>> 11 0 1 1
>> 12 0 1 1
>> 13 0 1 0
>> 14 0 1 1
>> 15 0 1 1
>> 16 0 1 0
>> 17 0 1 1
>> 18 0 1 1
>> attr(,"assign")
>> [1] 1 1 2
>> attr(,"contrasts")
>> attr(,"contrasts")$`factor(rep(1:2, each = 9))`
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$`factor(rep(rep(1:2, 1:2), 6))`
>> [1] "contr.treatment"
>>
>> Then you can do the contrast between H and L.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>
> Hi Jim,
>
> Thanks very much for your help! I'm still trying to get my head around
> the creation of the design matrix - it doesn't seem too intuitive to a
> newbie. Although I'm sure all will become clear as soon as the penny drops!
You really need to read the Limma User's Guide. In addition, you might
look at a linear modeling textbook. This is a perennial subject on this
listserv, as setting up a design matrix (especially for 2-color stuff)
is non-trivial if you know nothing of linear modeling.
>
> Just to clarify something about the experimental design and hopefully
> help me understand the construction of the design matrix. Param2 is a
> temperature treatment, while Param1 is a "sample type". So it makes
> sense to do contrasts between H.20 and H.4 to get cold-shock genes in H
> samples, H.20 and H.37 to get heat-shock genes in H samples. How would I
> set up the matrix for this? Also, in the context of this experimental
> design, what does a contrast between H and L actually show?
It gets pretty complicated if you set up a design matrix with both
Param1 and Param2 and a batch effect. It would take more time and effort
than I am willing to expend to figure out the contrasts matrix.
A simpler (if slightly less powerful) approach if you just want to
compare the H samples is to set up your design matrix with just Param2
and a batch effect, and then you can simply compare things directly.
Something like
> Param2 <- factor(rep(c(4,20,37), each=3))
> batch <- factor(rep(rep(1:2,1:2),3))
> design <- model.matrix(~0+Param2+batch)
> design
Param24 Param220 Param237 batch2
1 1 0 0 0
2 1 0 0 1
3 1 0 0 1
4 0 1 0 0
5 0 1 0 1
6 0 1 0 1
7 0 0 1 0
8 0 0 1 1
9 0 0 1 1
attr(,"assign")
[1] 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$Param2
[1] "contr.treatment"
attr(,"contrasts")$batch
[1] "contr.treatment"
Then your contrasts matrix would be
> contrast <- makeContrasts(Param220-Param24, Param220-Param237,
levels=design)
> contrast
Contrasts
Levels Param220 - Param24 Param220 - Param237
Param24 -1 0
Param220 1 1
Param237 0 -1
batch2 0 0
The comparison of H and L that I mentioned in the previous email would
simply tell you the difference between the H and L samples without
regard to the Param2. In other words, you would be fitting a t-test
comparing all H samples to all L samples (adjusted for batch).
Best,
Jim
>
> Cheers
> Nathan
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
More information about the Bioconductor
mailing list