[BioC] Covariate for batch effect removal by ComBat
Johnson, William Evan
wej at bu.edu
Tue Jun 11 15:02:38 CEST 2013
Atul,
The way your design looks, it seems that your experimental conditions are confounded with batch. At this point, you will need to make some assumptions to get ComBat working correctly. What should be the difference between P4 and P8? Would P20 and P30 be that different? What about P42 and P52? Can you assume any of these to be the same?
Note, I'm happy to discuss this off the mailing list if you don't want to tell everyone your experimental conditions, but because your design is confounded, you really need to think carefully about how you apply ComBat and which assumptions you make.
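To make the confounding concrete, here is a minimal base-R sketch that cross-tabulates Stage against Batch. The data frame `pheno` is a hypothetical name; its values are transcribed by hand from the sample sheet in the quoted message below, so treat it as an illustration rather than the actual PhenoData.csv.

```r
# Hypothetical re-creation of the Stage and Batch columns from the
# sample sheet in the quoted message (17 arrays in total).
pheno <- data.frame(
  Stage = c("P4", "P4", "P4", "P30", "P30",
            "P12", "P12", "P52", "P52", "Mix", "Mix",
            "P8", "P8", "P20", "P20", "P42", "P42"),
  Batch = c(rep(1, 5), rep(2, 12))
)

# Rows are stages, columns are batches, cells are array counts.
tab <- table(Stage = pheno$Stage, Batch = pheno$Batch)
print(tab)

# Count the batches in which each stage occurs. If every stage occurs in
# exactly one batch, stage differences cannot be separated from batch
# differences using the data alone.
batches_per_stage <- rowSums(tab > 0)
confounded <- all(batches_per_stage == 1)
print(confounded)
```

If any stage had arrays in more than one batch, its row would contain more than one non-zero cell, and that stage could help anchor the batch estimate.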
Thanks!
Evan
On Jun 11, 2013, at 6:00 AM, <bioconductor-request at r-project.org> wrote:
> Message: 12
> Date: Mon, 10 Jun 2013 13:42:52 -0400
> From: Atul Kakrana <atulkakrana at outlook.com>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] Covariate for batch effect removal by ComBat
> Message-ID: <BLU0-SMTP108CD993F824AF0373B7432AD840 at phx.gbl>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi Everybody,
>
> I am analysing Illumina microarray data and seem to have batch effects
> (plots attached) in my data. For batch effect removal I am using ComBat
> from the 'sva' package. This is my sample info file:
>
> Array.name  Sample  Stage  Condition  Batch
> P4_A        1       P4     Test       1
> P4_B        2       P4     Test       1
> P4_C        3       P4     Test       1
> P30_A       4       P30    Test       1
> P30_B       5       P30    Test       1
> P12_A       6       P12    Test       2
> P12_B       7       P12    Test       2
> P52_A       8       P52    Test       2
> P52_B       9       P52    Test       2
> CON_A       10      Mix    Con        2
> CON_B       11      Mix    Con        2
> P8_A        12      P8     Test       2
> P8_B        13      P8     Test       2
> P20_A       14      P20    Test       2
> P20_B       15      P20    Test       2
> P42_A       16      P42    Test       2
> P42_B       17      P42    Test       2
>
>
> The data is from a time-series experiment, and the numbers in 'Array.name'
> correspond to the age at which the samples were harvested. No time point is
> repeated in more than one batch; for example, P4 is in batch 1 and never
> appears again. I have a few questions about the implementation of ComBat.
>
> 1. Which column should be used for the covariates? I am unsure whether to
> use 'Stage' or 'Condition'. Or should I use 'Condition' as the covariate
> and 'Stage' as a continuous variable (numCovs)?
>
> 2. Should the adjustment be parametric or non-parametric?
>
> Here is my code:
>
> library(sva)      # provides ComBat()
> library(Biobase)  # provides exprs()
>
> ## PhenoData is the same as the sample info above
> IL.pheno <- read.table('PhenoData.csv', sep = ',', header = TRUE)
> batch <- IL.pheno$Batch
> edata <- exprs(esetLumi.Reduced.AB)
> mod <- model.matrix(~as.factor(Condition), data = IL.pheno)
> combat_edata <- ComBat(dat = edata, batch = batch, mod = mod, numCovs = NULL,
>                        par.prior = TRUE, prior.plots = TRUE)
>
> ##Fitting back to expression set
> exprs(esetLumi.Reduced.AB) <- combat_edata
>
>
> I appreciate your help.
>
> Best
>
> AK
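
On question 1 in the quoted message, one rough way to see why 'Stage' is a problematic covariate here is to compare the rank of a design matrix containing batch plus each candidate covariate. This is only a base-R sketch: `pheno`, `design_stage`, and `design_cond` are hypothetical names, the values are transcribed by hand from the sample sheet above, and it illustrates the linear algebra of the design rather than what ComBat does internally.

```r
# Hypothetical re-creation of the sample sheet (17 arrays).
pheno <- data.frame(
  Stage = c("P4", "P4", "P4", "P30", "P30",
            "P12", "P12", "P52", "P52", "Mix", "Mix",
            "P8", "P8", "P20", "P20", "P42", "P42"),
  Batch = c(rep(1, 5), rep(2, 12))
)
# In the sample sheet, only the 'Mix' arrays are controls.
pheno$Condition <- ifelse(pheno$Stage == "Mix", "Con", "Test")

# Designs with batch plus one candidate covariate.
design_stage <- model.matrix(~ factor(Batch) + factor(Stage), data = pheno)
design_cond  <- model.matrix(~ factor(Batch) + factor(Condition), data = pheno)

# A design is full rank when its rank equals its number of columns.
# A rank-deficient design means some effects cannot be estimated separately.
rank_stage <- qr(design_stage)$rank
rank_cond  <- qr(design_cond)$rank

cat("Batch + Stage:     rank", rank_stage, "of", ncol(design_stage), "columns\n")
cat("Batch + Condition: rank", rank_cond, "of", ncol(design_cond), "columns\n")
```

A rank below the column count for the Stage design is the algebraic form of the confounding described in the reply: because each stage sits in a single batch, the batch column is a sum of stage columns.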