[BioC] sequential removeBatchEffect()?
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Jun 30 01:08:52 CEST 2011
Hi Jenny,
Perhaps I'm mis-understanding your setup, but isn't Sample_Group
confounded with BeadChip? If the samples are entered in BeadChip order,
then the first six would be BeadChip 1 (and also Cont.8), the second six
would be BeadChip 2 (and also DHT.8), etc. Or have the samples been
reordered by Sample_Group?
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth
On Wed, 29 Jun 2011, Jenny Drnevich wrote:
> Hi Gordon and everyone,
>
> I was wondering what is the best way to remove two different batch effects
> from a set of sample values for plotting heatmaps? I have data from
> Illumina's MouseWG6 arrays, 24 samples on 4 BeadChips. There is the effect of
> BeadChip, which I'm accounting for in the model using duplicateCorrelation(),
> and also population batch effect, which I have as a term in the design
> matrix. I'd like to remove both of these effects from the individual sample
> data before making heatmaps of significant genes. Should I use
> removeBatchEffect() twice, first to remove the BeadChip effect (with
> population in the design) and then remove the population effect? Here's an
> example of my analysis code and what I'm thinking of doing to remove the two
> effect. Please let me know if this is alright, or if I should do something
> different:
>
>> design <- model.matrix(~0+ factor(targets$Sample_Group))
>> colnames(design) <- levels(factor(targets$Sample_Group))
>> design <- cbind(design,pop=as.numeric(targets$pop==1))
>> design
> Cont.24 Cont.8 DHT.24 DHT.8 pop
> 1 0 1 0 0 1
> 2 0 1 0 0 0
> 3 0 1 0 0 0
> 4 0 1 0 0 1
> 5 0 1 0 0 1
> 6 0 1 0 0 0
> 7 0 0 0 1 1
> 8 0 0 0 1 0
> 9 0 0 0 1 0
> 10 0 0 0 1 1
> 11 0 0 0 1 0
> 12 0 0 0 1 1
> 13 1 0 0 0 1
> 14 1 0 0 0 0
> 15 1 0 0 0 1
> 16 1 0 0 0 0
> 17 1 0 0 0 0
> 18 1 0 0 0 1
> 19 0 0 1 0 1
> 20 0 0 1 0 0
> 21 0 0 1 0 1
> 22 0 0 1 0 0
> 23 0 0 1 0 0
> 24 0 0 1 0 1
>
>> duplicateCorrelation(BSlimma.neqc, design,
> block=as.numeric(factor(targets$Sentrix_ID)))$consensus
> [1] 0.1528686
>
>> fit.neqc <- lmFit(BSlimma.neqc, design,
> block=as.numeric(factor(targets$Sentrix_ID)), correlation=0.1528686)
>
>> cont.matrix <- makeContrasts(DHT.8_v_Cont.8 = DHT.8 - Cont.8,
> DHT.24_v_Cont.24 = DHT.24 - Cont.24,
> + Cont.24_v_Cont.8 = Cont.24 - Cont.8,
> DHT.24_v_DHT.8 = DHT.24 - DHT.8,
> + interact = (DHT.24 - Cont.24) - (DHT.8 -
> Cont.8),levels=design)
>
>> fit.neqc2 <- eBayes(contrasts.fit(fit.neqc,cont.matrix))
>
> # This is what I _think_ I should do to remove both batch effects from the
> sample data for heatmaps:
>
>> noChip.values <- removeBatchEffect(BSlimm.neqc$E,
> batch=as.numeric(factor(targets$Sentrix_ID)), design=design)
>
>> noChip.noPop.values <- removeBatchEffect(noChip.values, batch=targets$pop,
> design = design[,1:4] )
>
> Then I can use noChip.noPop.values for heatmaps?
>
> Thanks,
> Jenny
>
>> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252 LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines grid stats graphics grDevices datasets utils
> methods base
>
> other attached packages:
> [1] statmod_1.4.9 beadarray_2.2.0 limma_3.8.2 WGCNA_1.00
> Hmisc_3.8-3
> [6] survival_2.36-5 qvalue_1.26.0 flashClust_1.01
> dynamicTreeCut_1.21 impute_1.26.0
> [11] made4_1.26.0 scatterplot3d_0.3-33 gplots_2.8.0 caTools_1.11
> bitops_1.0-4.1
> [16] gdata_2.8.1 gtools_2.6.2 RColorBrewer_1.0-2
> ade4_1.4-17 affyPLM_1.28.5
> [21] preprocessCore_1.14.0 gcrma_2.24.1 affycoretools_1.24.0
> KEGG.db_2.5.0 GO.db_2.5.0
> [26] RSQLite_0.9-4 DBI_0.2-5 AnnotationDbi_1.14.1
> affy_1.30.0 Biobase_2.12.1
> [31] RWinEdt_1.8-2
>
> loaded via a namespace (and not attached):
> [1] affyio_1.20.0 annaffy_1.24.0 annotate_1.30.0 biomaRt_2.8.1
> Biostrings_2.20.1 Category_2.18.0 cluster_1.13.3
> [8] genefilter_1.34.0 GOstats_2.18.0 graph_1.30.0 GSEABase_1.14.0
> IRanges_1.10.4 lattice_0.19-23 RBGL_1.28.0
> [15] RCurl_1.6-6.1 tcltk_2.13.0 tools_2.13.0 XML_3.4-0.2
> xtable_1.5-6
>
>
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list