[BioC] Batch effect in limma
James W. MacDonald
jmacdon at med.umich.edu
Thu Sep 3 16:42:19 CEST 2009
Please don't take things off-list!
Yuan Hao wrote:
> Hi James,
>
> Thank you very much for your reply! I got absolutely no significantly
> differentially expressed genes at all! If by looking at the vennDiagram,
> you can only see zero in all of the three contrasts, which is totally
> against our expectation, so I am wondering whether it is the problem of
> my design and/contrast matrices. You can see from my contrast.matrix, I
> saw lines "Batch3" and "Batch2" are all '0' in there, which make me feel
> wondering whether or not the model taking consideration of the batch
> effect?
The model is defined by your design matrix, not the contrasts matrix. By
fitting the model you have defined here, what you are saying is that the
batch effect is simply a difference in the mean expression for the
samples in different batches, which can be accounted for by subtracting
the differences between batch 2 and 1 from batch 2, and the differences
between batch 3 and 1 from batch 3. Since the model has removed the
differences between batches, the contrasts matrix doesn't need to do
anything with the batch coefficients.
The assumption of different means might not be a valid assumption,
however. It might be that the different batches have different means as
well as different variances. In this case you would want to account for
both by fitting a linear mixed model, using the duplicateCorrelation
function to estimate the correlation structure. If you search the
archives of this list for something like 'batch' and
'duplicateCorrelation' you should be able to find an indication how to
do this. I know Gordon has written more than one email on the subject.
There may also be something in the limma User's Guide.
Best,
Jim
>
> I used another method "ComBat" to try to get rid of the batch effect.
> After that, I got about 2,000 DE genes for my contrast A(in total there
> are over 50,000 genes), around 500 DE genes for contrast C, but still
> nothing for contrast B, which is against our expectation. I checked the
> top 10 adj.p.value in this case for contrast B, and found all of them
> are around 0.07 and the weird thing is they are exactly the same.
>
> Cheers,
> Yuan
>
>
> On 3 Sep 2009, at 14:27, James W. MacDonald wrote:
>
>> Hi Yuan,
>>
>> Yuan Hao wrote:
>>> Dear list,
>>> I got 12 Affymetrix arrays for 4 RNA samples, 3 replicates(from 3
>>> batches individually) for each sample. I want to look at the
>>> differential expression between these samples. While after
>>> clustering, we found obvious batch effect within the data, so we
>>> decided to add the batch effect into the linear model. I set up the
>>> design matrix as followings, but we can't find any differentially
>>> expressed gene that were expected, so would someone help me to have a
>>> look whether there is any problem with my design and contrast matrix?
>>> Thank you very much in advance:
>>
>> I don't see any problems here. When you say you can't find any
>> differentially expressed gene that were expected, is that to imply
>> that you _did_ find differentially expressed genes, but not those that
>> you expected? Or do you mean that there are no significantly
>> differentially expressed genes?
>>
>> Also note that you will need to specify a contrast with topTable() if
>> you want the genes for a particular comparison.
>>
>>
>> Best,
>>
>> Jim
>>
>>
>>> Array Sample Batch
>>> 1 -/- 3
>>> 2 +/+ 3
>>> 3 -/+ 1
>>> 4 +/+ 1
>>> 5 -/+ 3
>>> 6 +/- 3
>>> 7 -/- 1
>>> 8 +/- 1
>>> 9 -/- 2
>>> 10 +/- 2
>>> 11 +/+ 2
>>> 12 -/+ 2
>>> > S <-
>>> c("-/-","+/+","-/+","+/+","-/+","+/-","-/-","+/-","-/-","+/-","+/+","-/+")
>>>
>>> > S <- factor(S, level = c("+/+","-/+","+/-","-/-"))
>>> > B <- c(3,3,1,1,3,3,1,1,2,2,2,2)
>>> > B <- factor(B, level = c(1,2,3))
>>> > design <- model.matrix(~0+S+B)
>>> >
>>> colnames(design)<-c("sample1","sample2","sample3","sample4","Batch3","Batch2")
>>> > design
>>> sample1 sample2 sample3 sample4 Batch3 Batch2
>>> 1 0 0 0 1 1 0
>>> 2 1 0 0 0 1 0
>>> 3 0 1 0 0 0 0
>>> 4 1 0 0 0 0 0
>>> 5 0 1 0 0 1 0
>>> 6 0 0 1 0 1 0
>>> 7 0 0 0 1 0 0
>>> 8 0 0 1 0 0 0
>>> 9 0 0 0 1 0 1
>>> 10 0 0 1 0 0 1
>>> 11 1 0 0 0 0 1
>>> 12 0 1 0 0 0 1
>>> attr(,"assign")
>>> [1] 1 1 1 1 2 2
>>> attr(,"contrasts")
>>> attr(,"contrasts")$TS
>>> [1] "contr.treatment"
>>> attr(,"contrasts")$BE
>>> [1] "contr.treatment"
>>> > fit<-lmFit(eset.gcrma,design)
>>> >
>>> contrast.matrix<-makeContrasts(a=sample1-sample2,b=sample1-sample3,c=sample2-sample4,levels=design)
>>> > contrast.matrix
>>> Contrasts
>>> Levels a b c
>>> sample1 1 1 0
>>> sample2 -1 0 1
>>> sample3 0 -1 0
>>> sample4 0 0 -1
>>> Batch3 0 0 0
>>> Batch2 0 0 0
>>> > fit2<-contrasts.fit(fit,contrast.matrix)
>>> > fit2<-eBayes(fit2)
>>> Cheers,
>>> Yuan
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Douglas Lab
>> University of Michigan
>> Department of Human Genetics
>> 5912 Buhl
>> 1241 E. Catherine St.
>> Ann Arbor MI 48109-5618
>> 734-615-7826
>
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
More information about the Bioconductor
mailing list