[BioC] Voom and removing batch effect

Ryan C. Thompson rct at thompsonclan.org
Tue Jan 21 20:50:35 CET 2014


Hello Heather,

The way that edgeR, voom, and limma handle batch effects in a 
differential expression test is not by removing them, but simply 
including a batch effect term in the model. This is what you are doing 
when you use "~batch + dz_cat" as your model formula. When you test for 
differential expression using this design (i.e. by using lmFit, eBayes, 
and topTable), the test will be done in a way that corrects for the 
batch effect.

If you just want to remove the batch effect for visualization or 
clustering purposes, limma provides the "removeBatchEffect" function for 
this purpose.

-Ryan Thompson

On Tue 21 Jan 2014 11:42:29 AM PST, Heather Estrella wrote:
>
> Hi,
>
> I have a miRNA-Seq dataset with clear batch effect that I'm trying to 
> correct for in order to run differential expression using voom/limma. 
> Based on the Limma User Guide, this can be done by adding batch as a 
> confounding factor to the design matrix used by voom and limma fit. 
> However, when I run an MDS plot on batch using the voom results, it's 
> still showing a clear batch effect. What am I missing? What is the 
> best way to correct for and test for batch effect after correction?
>
> In the design matrix, the "dz_cat" group to compare all the other 
> groups does not appear in the design matrix. My understanding from 
> reading other posting by Gordom Smyth is that it's because it is used 
> as the comparison group to all the other groups. I'm not sure why it 
> is all 1's though.
>
> FYI: I am able to correct for batch effect using the cpm-log2 values 
> and ComBat. However, for differential expression using voom/limma I 
> need to use raw counts for the normalization and differential comparison.
>
> Here is my code:
>
>>
>> library(edgeR)
>> library(limma)
>> y = DGEList(counts=datamat,genes=rownames(datamat))
>> y = calcNormFactors(y)
>> design <- model.matrix(~batch+dz_cat, data = samplemat)
>> png(file=paste("../results/MV_plot_voom_",name,".png",sep=""),pointsize=11,bg="white")
>> v <- voom(counts=y, design,plot=TRUE)
>> dev.off()
>> colrs = rainbow(length(unique(samplemat$batch)))
>> mycol=colrs[samplemat$batch]
>> o = which(is.na(mycol))
>> png(height=600,width=600,file=paste("../results/MDS_plot_voom_",name,".png",sep=""),pointsize=11,bg="white")
>> plotMDS(v,top=50,labels=substring(samplemat$batch,1,1),col=mycol,gene.selection="common")
>> dev.off()
>
>
>>
>> design
>
> (Intercept) batch2nd batch3rd dz_catF dz_catL dz_catO
> 1 1 0 1 0 0 0
> 3 1 1 0 0 0 0
> 5 1 1 0 0 0 0
> 7 1 1 0 0 0 0
> 8 1 0 1 0 0 0
> 9 1 0 1 0 0 0
> 11 1 1 0 0 0 0
> 12 1 0 1 0 0 0
> 14 1 1 0 0 0 0
> 16 1 1 0 0 0 0
> 17 1 0 1 0 0 0
> 18 1 0 1 0 0 0
> 20 1 1 0 0 0 0
> 21 1 0 1 0 0 0
> 23 1 1 0 0 0 0
> 24 1 0 1 0 0 0
> 26 1 1 0 0 1 0
> 27 1 0 1 0 1 0
> 28 1 0 1 0 1 0
> 29 1 0 1 0 1 0
> 31 1 1 0 0 1 0
> 33 1 1 0 0 1 0
> 35 1 1 0 0 0 1
> 36 1 0 1 0 0 1
> 38 1 1 0 0 0 1
> 39 1 0 1 0 0 1
> 41 1 1 0 0 0 1
> 42 1 0 1 0 0 1
> 44 1 1 0 0 0 1
> 45 1 0 1 0 0 1
> attr(,"assign")
> [1] 0 1 1 2 2 2
> attr(,"contrasts")
> attr(,"contrasts")$batch
> [1] "contr.treatment"
>
> attr(,"contrasts")$dz_cat
> [1] "contr.treatment"
>
>>
>> sessionInfo()
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] sva_3.8.0 mgcv_1.7-27 nlme_3.1-113 corpcor_1.6.6
> [5] edgeR_3.4.2 matrixStats_0.8.14 limma_3.18.9 gplots_2.12.1
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-6 caTools_1.16 gdata_2.13.2 grid_3.0.2
> [5] gtools_3.2.1 KernSmooth_2.23-10 lattice_0.20-24 Matrix_1.1-1.1
> [9] R.methodsS3_1.6.1
>
> Many thanks,
> Heather
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list