[BioC] sequential removeBatchEffect()?

Jenny Drnevich drnevich at illinois.edu
Wed Jun 29 21:48:42 CEST 2011


Hi Gordon and everyone,

I was wondering what is the best way to remove two different batch 
effects from a set of sample values for plotting heatmaps? I have 
data from Illumina's MouseWG6 arrays, 24 samples on 4 BeadChips. 
There is the effect of BeadChip, which I'm accounting for in the 
model using duplicateCorrelation(), and also population batch effect, 
which I have as a term in the design matrix. I'd like to remove both 
of these effects from the individual sample data before making 
heatmaps of significant genes. Should I use removeBatchEffect() 
twice, first to remove the BeadChip effect (with population in the 
design) and then remove the population effect? Here's an example of 
my analysis code and what I'm thinking of doing to remove the two 
effect. Please let me know if this is alright, or if I should do 
something different:

 > design <- model.matrix(~0+ factor(targets$Sample_Group))
 > colnames(design) <- levels(factor(targets$Sample_Group))
 > design <- cbind(design,pop=as.numeric(targets$pop==1))
 > design
    Cont.24 Cont.8 DHT.24 DHT.8 pop
1        0      1      0     0   1
2        0      1      0     0   0
3        0      1      0     0   0
4        0      1      0     0   1
5        0      1      0     0   1
6        0      1      0     0   0
7        0      0      0     1   1
8        0      0      0     1   0
9        0      0      0     1   0
10       0      0      0     1   1
11       0      0      0     1   0
12       0      0      0     1   1
13       1      0      0     0   1
14       1      0      0     0   0
15       1      0      0     0   1
16       1      0      0     0   0
17       1      0      0     0   0
18       1      0      0     0   1
19       0      0      1     0   1
20       0      0      1     0   0
21       0      0      1     0   1
22       0      0      1     0   0
23       0      0      1     0   0
24       0      0      1     0   1

 > duplicateCorrelation(BSlimma.neqc, design, 
block=as.numeric(factor(targets$Sentrix_ID)))$consensus
[1]  0.1528686

 > fit.neqc <- lmFit(BSlimma.neqc, design, 
block=as.numeric(factor(targets$Sentrix_ID)), correlation=0.1528686)

 > cont.matrix <- makeContrasts(DHT.8_v_Cont.8 = DHT.8 - Cont.8, 
DHT.24_v_Cont.24 = DHT.24 - Cont.24,
+                             Cont.24_v_Cont.8 = Cont.24 - Cont.8, 
DHT.24_v_DHT.8 = DHT.24 - DHT.8,
+                             interact = (DHT.24 - Cont.24) - (DHT.8 
- Cont.8),levels=design)

 > fit.neqc2 <- eBayes(contrasts.fit(fit.neqc,cont.matrix))

# This is what I _think_ I should do to remove both batch effects 
from the sample data for heatmaps:

 > noChip.values <- removeBatchEffect(BSlimm.neqc$E, 
batch=as.numeric(factor(targets$Sentrix_ID)), design=design)

 > noChip.noPop.values <- removeBatchEffect(noChip.values, 
batch=targets$pop, design = design[,1:4] )

Then I can use noChip.noPop.values for heatmaps?

Thanks,
Jenny

 > sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United 
States.1252

attached base packages:
[1] splines   grid      stats     graphics  grDevices 
datasets  utils     methods   base

other attached packages:
  [1] 
statmod_1.4.9         beadarray_2.2.0       limma_3.8.2 
WGCNA_1.00            Hmisc_3.8-3
  [6] 
survival_2.36-5       qvalue_1.26.0         flashClust_1.01 
dynamicTreeCut_1.21   impute_1.26.0
[11] 
made4_1.26.0          scatterplot3d_0.3-33  gplots_2.8.0 
caTools_1.11          bitops_1.0-4.1
[16] 
gdata_2.8.1           gtools_2.6.2          RColorBrewer_1.0-2 
ade4_1.4-17           affyPLM_1.28.5
[21] preprocessCore_1.14.0 
gcrma_2.24.1          affycoretools_1.24.0  KEGG.db_2.5.0 
GO.db_2.5.0
[26] 
RSQLite_0.9-4         DBI_0.2-5             AnnotationDbi_1.14.1 
affy_1.30.0           Biobase_2.12.1
[31] RWinEdt_1.8-2

loaded via a namespace (and not attached):
  [1] 
affyio_1.20.0     annaffy_1.24.0    annotate_1.30.0   biomaRt_2.8.1 
   Biostrings_2.20.1 Category_2.18.0   cluster_1.13.3
  [8] genefilter_1.34.0 
GOstats_2.18.0    graph_1.30.0      GSEABase_1.14.0   IRanges_1.10.4 
   lattice_0.19-23   RBGL_1.28.0
[15] 
RCurl_1.6-6.1     tcltk_2.13.0      tools_2.13.0      XML_3.4-0.2 
   xtable_1.5-6



More information about the Bioconductor mailing list