[BioC] sequential removeBatchEffect()?

Gordon K Smyth smyth at wehi.EDU.AU
Thu Jun 30 01:08:52 CEST 2011


Hi Jenny,

Perhaps I'm mis-understanding your setup, but isn't Sample_Group 
confounded with BeadChip?  If the samples are entered in BeadChip order, 
then the first six would be BeadChip 1 (and also Cont.8), the second six 
would be BeadChip 2 (and also DHT.8), etc.  Or have the samples been 
reordered by Sample_Group?

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth

On Wed, 29 Jun 2011, Jenny Drnevich wrote:

> Hi Gordon and everyone,
>
> I was wondering what is the best way to remove two different batch effects 
> from a set of sample values for plotting heatmaps? I have data from 
> Illumina's MouseWG6 arrays, 24 samples on 4 BeadChips. There is the effect of 
> BeadChip, which I'm accounting for in the model using duplicateCorrelation(), 
> and also population batch effect, which I have as a term in the design 
> matrix. I'd like to remove both of these effects from the individual sample 
> data before making heatmaps of significant genes. Should I use 
> removeBatchEffect() twice, first to remove the BeadChip effect (with 
> population in the design) and then remove the population effect? Here's an 
> example of my analysis code and what I'm thinking of doing to remove the two 
> effect. Please let me know if this is alright, or if I should do something 
> different:
>
>> design <- model.matrix(~0+ factor(targets$Sample_Group))
>> colnames(design) <- levels(factor(targets$Sample_Group))
>> design <- cbind(design,pop=as.numeric(targets$pop==1))
>> design
>   Cont.24 Cont.8 DHT.24 DHT.8 pop
> 1        0      1      0     0   1
> 2        0      1      0     0   0
> 3        0      1      0     0   0
> 4        0      1      0     0   1
> 5        0      1      0     0   1
> 6        0      1      0     0   0
> 7        0      0      0     1   1
> 8        0      0      0     1   0
> 9        0      0      0     1   0
> 10       0      0      0     1   1
> 11       0      0      0     1   0
> 12       0      0      0     1   1
> 13       1      0      0     0   1
> 14       1      0      0     0   0
> 15       1      0      0     0   1
> 16       1      0      0     0   0
> 17       1      0      0     0   0
> 18       1      0      0     0   1
> 19       0      0      1     0   1
> 20       0      0      1     0   0
> 21       0      0      1     0   1
> 22       0      0      1     0   0
> 23       0      0      1     0   0
> 24       0      0      1     0   1
>
>> duplicateCorrelation(BSlimma.neqc, design, 
> block=as.numeric(factor(targets$Sentrix_ID)))$consensus
> [1]  0.1528686
>
>> fit.neqc <- lmFit(BSlimma.neqc, design, 
> block=as.numeric(factor(targets$Sentrix_ID)), correlation=0.1528686)
>
>> cont.matrix <- makeContrasts(DHT.8_v_Cont.8 = DHT.8 - Cont.8, 
> DHT.24_v_Cont.24 = DHT.24 - Cont.24,
> +                             Cont.24_v_Cont.8 = Cont.24 - Cont.8, 
> DHT.24_v_DHT.8 = DHT.24 - DHT.8,
> +                             interact = (DHT.24 - Cont.24) - (DHT.8 - 
> Cont.8),levels=design)
>
>> fit.neqc2 <- eBayes(contrasts.fit(fit.neqc,cont.matrix))
>
> # This is what I _think_ I should do to remove both batch effects from the 
> sample data for heatmaps:
>
>> noChip.values <- removeBatchEffect(BSlimm.neqc$E, 
> batch=as.numeric(factor(targets$Sentrix_ID)), design=design)
>
>> noChip.noPop.values <- removeBatchEffect(noChip.values, batch=targets$pop, 
> design = design[,1:4] )
>
> Then I can use noChip.noPop.values for heatmaps?
>
> Thanks,
> Jenny
>
>> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
> States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines   grid      stats     graphics  grDevices datasets  utils 
> methods   base
>
> other attached packages:
> [1] statmod_1.4.9         beadarray_2.2.0       limma_3.8.2 WGCNA_1.00 
> Hmisc_3.8-3
> [6] survival_2.36-5       qvalue_1.26.0         flashClust_1.01 
> dynamicTreeCut_1.21   impute_1.26.0
> [11] made4_1.26.0          scatterplot3d_0.3-33  gplots_2.8.0 caTools_1.11 
> bitops_1.0-4.1
> [16] gdata_2.8.1           gtools_2.6.2          RColorBrewer_1.0-2 
> ade4_1.4-17           affyPLM_1.28.5
> [21] preprocessCore_1.14.0 gcrma_2.24.1          affycoretools_1.24.0 
> KEGG.db_2.5.0 GO.db_2.5.0
> [26] RSQLite_0.9-4         DBI_0.2-5             AnnotationDbi_1.14.1 
> affy_1.30.0           Biobase_2.12.1
> [31] RWinEdt_1.8-2
>
> loaded via a namespace (and not attached):
> [1] affyio_1.20.0     annaffy_1.24.0    annotate_1.30.0   biomaRt_2.8.1 
> Biostrings_2.20.1 Category_2.18.0   cluster_1.13.3
> [8] genefilter_1.34.0 GOstats_2.18.0    graph_1.30.0      GSEABase_1.14.0 
> IRanges_1.10.4   lattice_0.19-23   RBGL_1.28.0
> [15] RCurl_1.6-6.1     tcltk_2.13.0      tools_2.13.0      XML_3.4-0.2 
> xtable_1.5-6
>
>
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list