[BioC] sequential removeBatchEffect()?
Jenny Drnevich
drnevich at illinois.edu
Wed Jun 29 21:48:42 CEST 2011
Hi Gordon and everyone,
I was wondering what is the best way to remove two different batch
effects from a set of sample values for plotting heatmaps? I have
data from Illumina's MouseWG6 arrays, 24 samples on 4 BeadChips.
There is the effect of BeadChip, which I'm accounting for in the
model using duplicateCorrelation(), and also population batch effect,
which I have as a term in the design matrix. I'd like to remove both
of these effects from the individual sample data before making
heatmaps of significant genes. Should I use removeBatchEffect()
twice, first to remove the BeadChip effect (with population in the
design) and then remove the population effect? Here's an example of
my analysis code and what I'm thinking of doing to remove the two
effect. Please let me know if this is alright, or if I should do
something different:
> design <- model.matrix(~0+ factor(targets$Sample_Group))
> colnames(design) <- levels(factor(targets$Sample_Group))
> design <- cbind(design,pop=as.numeric(targets$pop==1))
> design
Cont.24 Cont.8 DHT.24 DHT.8 pop
1 0 1 0 0 1
2 0 1 0 0 0
3 0 1 0 0 0
4 0 1 0 0 1
5 0 1 0 0 1
6 0 1 0 0 0
7 0 0 0 1 1
8 0 0 0 1 0
9 0 0 0 1 0
10 0 0 0 1 1
11 0 0 0 1 0
12 0 0 0 1 1
13 1 0 0 0 1
14 1 0 0 0 0
15 1 0 0 0 1
16 1 0 0 0 0
17 1 0 0 0 0
18 1 0 0 0 1
19 0 0 1 0 1
20 0 0 1 0 0
21 0 0 1 0 1
22 0 0 1 0 0
23 0 0 1 0 0
24 0 0 1 0 1
> duplicateCorrelation(BSlimma.neqc, design,
block=as.numeric(factor(targets$Sentrix_ID)))$consensus
[1] 0.1528686
> fit.neqc <- lmFit(BSlimma.neqc, design,
block=as.numeric(factor(targets$Sentrix_ID)), correlation=0.1528686)
> cont.matrix <- makeContrasts(DHT.8_v_Cont.8 = DHT.8 - Cont.8,
DHT.24_v_Cont.24 = DHT.24 - Cont.24,
+ Cont.24_v_Cont.8 = Cont.24 - Cont.8,
DHT.24_v_DHT.8 = DHT.24 - DHT.8,
+ interact = (DHT.24 - Cont.24) - (DHT.8
- Cont.8),levels=design)
> fit.neqc2 <- eBayes(contrasts.fit(fit.neqc,cont.matrix))
# This is what I _think_ I should do to remove both batch effects
from the sample data for heatmaps:
> noChip.values <- removeBatchEffect(BSlimm.neqc$E,
batch=as.numeric(factor(targets$Sentrix_ID)), design=design)
> noChip.noPop.values <- removeBatchEffect(noChip.values,
batch=targets$pop, design = design[,1:4] )
Then I can use noChip.noPop.values for heatmaps?
Thanks,
Jenny
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United
States.1252
attached base packages:
[1] splines grid stats graphics grDevices
datasets utils methods base
other attached packages:
[1]
statmod_1.4.9 beadarray_2.2.0 limma_3.8.2
WGCNA_1.00 Hmisc_3.8-3
[6]
survival_2.36-5 qvalue_1.26.0 flashClust_1.01
dynamicTreeCut_1.21 impute_1.26.0
[11]
made4_1.26.0 scatterplot3d_0.3-33 gplots_2.8.0
caTools_1.11 bitops_1.0-4.1
[16]
gdata_2.8.1 gtools_2.6.2 RColorBrewer_1.0-2
ade4_1.4-17 affyPLM_1.28.5
[21] preprocessCore_1.14.0
gcrma_2.24.1 affycoretools_1.24.0 KEGG.db_2.5.0
GO.db_2.5.0
[26]
RSQLite_0.9-4 DBI_0.2-5 AnnotationDbi_1.14.1
affy_1.30.0 Biobase_2.12.1
[31] RWinEdt_1.8-2
loaded via a namespace (and not attached):
[1]
affyio_1.20.0 annaffy_1.24.0 annotate_1.30.0 biomaRt_2.8.1
Biostrings_2.20.1 Category_2.18.0 cluster_1.13.3
[8] genefilter_1.34.0
GOstats_2.18.0 graph_1.30.0 GSEABase_1.14.0 IRanges_1.10.4
lattice_0.19-23 RBGL_1.28.0
[15]
RCurl_1.6-6.1 tcltk_2.13.0 tools_2.13.0 XML_3.4-0.2
xtable_1.5-6
More information about the Bioconductor
mailing list