[BioC] sequential removeBatchEffect()?
Jenny Drnevich
drnevich at illinois.edu
Thu Jun 30 15:25:24 CEST 2011
HI Gordon,
No, Sample_Group isn't confounded with BeadChip. Our microarray unit
is very good about asking customers about the treatment groups and
randomizing them on arrays. I use GenomeStudio to output summarized
beadtype values (no bg or norm), and GenomeStudio always re-organizes
the by Sample_Group. I've learned to double-check that the order in
my targets file is the same as the order coming out of GenomeStudio!
Jenny
At 06:08 PM 6/29/2011, Gordon K Smyth wrote:
>Hi Jenny,
>
>Perhaps I'm mis-understanding your setup, but isn't Sample_Group
>confounded with BeadChip? If the samples are entered in BeadChip
>order, then the first six would be BeadChip 1 (and also Cont.8), the
>second six would be BeadChip 2 (and also DHT.8), etc. Or have the
>samples been reordered by Sample_Group?
>
>Best wishes
>Gordon
>
>---------------------------------------------
>Professor Gordon K Smyth,
>Bioinformatics Division,
>Walter and Eliza Hall Institute of Medical Research,
>1G Royal Parade, Parkville, Vic 3052, Australia.
>smyth at wehi.edu.au
>http://www.wehi.edu.au
>http://www.statsci.org/smyth
>
>On Wed, 29 Jun 2011, Jenny Drnevich wrote:
>
>>Hi Gordon and everyone,
>>
>>I was wondering what is the best way to remove two different batch
>>effects from a set of sample values for plotting heatmaps? I have
>>data from Illumina's MouseWG6 arrays, 24 samples on 4 BeadChips.
>>There is the effect of BeadChip, which I'm accounting for in the
>>model using duplicateCorrelation(), and also population batch
>>effect, which I have as a term in the design matrix. I'd like to
>>remove both of these effects from the individual sample data before
>>making heatmaps of significant genes. Should I use
>>removeBatchEffect() twice, first to remove the BeadChip effect
>>(with population in the design) and then remove the population
>>effect? Here's an example of my analysis code and what I'm thinking
>>of doing to remove the two effect. Please let me know if this is
>>alright, or if I should do something different:
>>
>>>design <- model.matrix(~0+ factor(targets$Sample_Group))
>>>colnames(design) <- levels(factor(targets$Sample_Group))
>>>design <- cbind(design,pop=as.numeric(targets$pop==1))
>>>design
>> Cont.24 Cont.8 DHT.24 DHT.8 pop
>>1 0 1 0 0 1
>>2 0 1 0 0 0
>>3 0 1 0 0 0
>>4 0 1 0 0 1
>>5 0 1 0 0 1
>>6 0 1 0 0 0
>>7 0 0 0 1 1
>>8 0 0 0 1 0
>>9 0 0 0 1 0
>>10 0 0 0 1 1
>>11 0 0 0 1 0
>>12 0 0 0 1 1
>>13 1 0 0 0 1
>>14 1 0 0 0 0
>>15 1 0 0 0 1
>>16 1 0 0 0 0
>>17 1 0 0 0 0
>>18 1 0 0 0 1
>>19 0 0 1 0 1
>>20 0 0 1 0 0
>>21 0 0 1 0 1
>>22 0 0 1 0 0
>>23 0 0 1 0 0
>>24 0 0 1 0 1
>>
>>>duplicateCorrelation(BSlimma.neqc, design,
>>block=as.numeric(factor(targets$Sentrix_ID)))$consensus
>>[1] 0.1528686
>>
>>>fit.neqc <- lmFit(BSlimma.neqc, design,
>>block=as.numeric(factor(targets$Sentrix_ID)), correlation=0.1528686)
>>
>>>cont.matrix <- makeContrasts(DHT.8_v_Cont.8 = DHT.8 - Cont.8,
>>DHT.24_v_Cont.24 = DHT.24 - Cont.24,
>>+ Cont.24_v_Cont.8 = Cont.24 - Cont.8,
>>DHT.24_v_DHT.8 = DHT.24 - DHT.8,
>>+ interact = (DHT.24 - Cont.24) -
>>(DHT.8 - Cont.8),levels=design)
>>
>>>fit.neqc2 <- eBayes(contrasts.fit(fit.neqc,cont.matrix))
>>
>># This is what I _think_ I should do to remove both batch effects
>>from the sample data for heatmaps:
>>
>>>noChip.values <- removeBatchEffect(BSlimm.neqc$E,
>>batch=as.numeric(factor(targets$Sentrix_ID)), design=design)
>>
>>>noChip.noPop.values <- removeBatchEffect(noChip.values, batch=targets$pop,
>>design = design[,1:4] )
>>
>>Then I can use noChip.noPop.values for heatmaps?
>>
>>Thanks,
>>Jenny
>>
>>>sessionInfo()
>>R version 2.13.0 (2011-04-13)
>>Platform: x86_64-pc-mingw32/x64 (64-bit)
>>
>>locale:
>>[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>States.1252 LC_MONETARY=English_United States.1252
>>[4] LC_NUMERIC=C LC_TIME=English_United States.1252
>>
>>attached base packages:
>>[1] splines grid stats graphics grDevices
>>datasets utils methods base
>>
>>other attached packages:
>>[1] statmod_1.4.9 beadarray_2.2.0 limma_3.8.2
>>WGCNA_1.00 Hmisc_3.8-3
>>[6] survival_2.36-5 qvalue_1.26.0 flashClust_1.01
>>dynamicTreeCut_1.21 impute_1.26.0
>>[11] made4_1.26.0 scatterplot3d_0.3-33 gplots_2.8.0
>>caTools_1.11 bitops_1.0-4.1
>>[16] gdata_2.8.1 gtools_2.6.2 RColorBrewer_1.0-2
>>ade4_1.4-17 affyPLM_1.28.5
>>[21] preprocessCore_1.14.0
>>gcrma_2.24.1 affycoretools_1.24.0 KEGG.db_2.5.0 GO.db_2.5.0
>>[26]
>>RSQLite_0.9-4 DBI_0.2-5 AnnotationDbi_1.14.1
>>affy_1.30.0 Biobase_2.12.1
>>[31] RWinEdt_1.8-2
>>
>>loaded via a namespace (and not attached):
>>[1]
>>affyio_1.20.0 annaffy_1.24.0 annotate_1.30.0 biomaRt_2.8.1
>>Biostrings_2.20.1 Category_2.18.0 cluster_1.13.3
>>[8] genefilter_1.34.0
>>GOstats_2.18.0 graph_1.30.0 GSEABase_1.14.0
>>IRanges_1.10.4 lattice_0.19-23 RBGL_1.28.0
>>[15]
>>RCurl_1.6-6.1 tcltk_2.13.0 tools_2.13.0 XML_3.4-0.2 xtable_1.5-6
>>
>>
>
>______________________________________________________________________
>The information in this email is confidential and inten...{{dropped:6}}
More information about the Bioconductor
mailing list