[BioC] ComBat: 2 adjustment variables & continuous adjustment variables

Magda Price magdaprice at gmail.com
Wed Mar 19 23:04:47 CET 2014


Johnson, William Evan <wej at ...> writes:

> 
> Hey Magda, 
> 
> The two-step method is still a reasonable approach. It has worked well for
> me in multiple situations. I do have a beta version of ComBat that will
> handle two batch variables at the same time. It works well in theory, but
> I have yet to test it thoroughly across multiple datasets. I'm willing to
> share the code if you want to test it on your data (let me know).
> 
> ComBat in the sva package can handle numeric covariates, but it does not
> deal with continuous batch variables. Adjusting the mean of a continuous
> batch variable would be straightforward (assuming a linear effect), but
> the variance adjustment would be very tricky.
> 
> Ultimately, since the two-step approach seems to have worked, I think your
> best option is to just move forward with those results.
> 
> Thanks!
> 
> Evan
> 
> On Feb 19, 2014, at 4:00 AM, <bioconductor-request at ...>
>  <bioconductor-request at ...> wrote:
> 
> > Message: 23
> > Date: Tue, 18 Feb 2014 16:45:12 -0800
> > From: Magda Price <magdaprice at ...>
> > To: "bioconductor at ..." <bioconductor at ...>
> > Subject: [BioC] ComBat: 2 adjustment variables & continuous adjustment
> > 	variables
> > Message-ID:
> > 	<CADkR4V=ydd1abJXFhtd+Xwq8MZMP_=urHVDtPXOTurPQjzB7Tg at ...>
> > Content-Type: text/plain
> > 
> > Hi!
> > 
> > I'm writing with a few questions about applying ComBat (sva package) to a
> > set of ~50 samples run on the Illumina Infinium HumanMethylation450
> > BeadChip array (~450,000 DNA methylation data points).
> > 
> > There is a large amount of variation in my data due to both the batch the
> > samples were run in (3 different batches) and the position they were
> > located on the chip: specifically the row (6 different rows), but not
> > the column. The chips are set up in a 6 row * 2 column format like this:
> > 
> > 
> > sample 01   sample 02
> > sample 03   sample 04
> > sample 05   sample 06
> > sample 07   sample 08
> > sample 09   sample 10
> > sample 11   sample 12
> > 
> > 
> > I read Dr. Evan Johnson's suggestions to someone else with this
> > "2-batch-effect-variable" problem in the ComBat google group (
> > https://groups.google.com/forum/#!topic/combat-user-forum/PcTxNlaUmAI). He
> > had 2 good suggestions:
> > 
> >   1. Combine the two batch variables into one, if 3-4 reps are left in
> >   each batch
> >   2. Use ComBat twice, adjusting for the first batch using the second
> >   batch as a covariate, and then adjust for the second batch.
> > 
> > I cannot go with the first suggestion because combining the 2 batch
> > variables would create 18 batch categories (3 batches * 6 rows), and I
> > would not have enough replicates per batch category.
> > 
> > So I tried the second option - applying ComBat twice. I first corrected for
> > row and then took the row-corrected data and applied ComBat again,
> > correcting for batch. It seems to have worked & the correlation of my
> > technical replicates improves. I am seeking advice on two points:
> > 
> >   1. The google group post is now a few years old; is it still thought
> >   that the step-wise correction is a valid approach?
> >   2. Row would be better treated as a continuous adjustment variable than
> >   a factor. In the version of sva that I am using (3.0.2) I believe that
> >   only factor adjustment variables are supported. I have seen mention in
> >   a few forums that there might be an update to ComBat to adjust for a
> >   numeric batch variable; is one available?
> > 
> > Thank you in advance for your help!
> > 
> > Magda Price,
> > University of British Columbia
> 

Hi Evan,

Thanks for your response & so sorry for my delay; I wasn't notified by
e-mail that you had responded.

Since I wrote the first note, a few things have changed:

1. I have discovered a third batch variable (chip, in addition to plate &
   row)!
2. Based on forum feedback, I figured I was okay to stick with all factor
   adjustment variables.

An additional question that has come up in reference to your suggestion:
> >   Use ComBat twice, adjusting for the first batch using the second
> >   batch as a covariate, and then adjust for the second batch.
I can't include the second batch variable as a covariate; I get a
singularity error because the batches are confounded with each other. For
example, all samples on chip A were run in batch 1. Do you still think it is
a valid approach if I can't use subsequent batch variables as covariates?
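
The confounding described above (every chip belonging to exactly one batch)
can be illustrated with a small sketch. This is plain Python rather than R,
and it is not ComBat's own code: the sample layout, the treatment-coded
dummy columns, and the helper names (dummy_columns, design_matrix,
matrix_rank, is_nested) are all assumptions made for illustration. It only
shows why a design matrix that contains both a batch factor and a chip
factor nested inside it loses rank, which is the usual cause of a
singularity error in a linear model fit.

```python
from fractions import Fraction

def dummy_columns(labels):
    """Treatment-coded indicator columns: one per level, dropping the first."""
    levels = sorted(set(labels))
    return [[1 if lab == lev else 0 for lab in labels] for lev in levels[1:]]

def design_matrix(*factors):
    """Intercept plus dummy columns for each factor, returned as a list of rows."""
    n = len(factors[0])
    columns = [[1] * n]
    for labels in factors:
        columns.extend(dummy_columns(labels))
    return [list(row) for row in zip(*columns)]

def matrix_rank(rows):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def is_nested(inner, outer):
    """True if every level of `inner` occurs within exactly one level of `outer`."""
    seen = {}
    for i, o in zip(inner, outer):
        if seen.setdefault(i, o) != o:
            return False
    return True

# Hypothetical layout: chips A and B were run in batch 1, chips C and D in batch 2.
batch = ["1", "1", "1", "1", "2", "2", "2", "2"]
chip = ["A", "A", "B", "B", "C", "C", "D", "D"]
# Hypothetical row positions that appear in both batches (crossed, not nested).
row = ["1", "2", "1", "2", "1", "2", "1", "2"]

X_nested = design_matrix(batch, chip)
print(len(X_nested[0]), matrix_rank(X_nested), is_nested(chip, batch))    # 5 4 True

X_crossed = design_matrix(batch, row)
print(len(X_crossed[0]), matrix_rank(X_crossed), is_nested(row, batch))   # 3 3 False
```

In the nested case the batch-2 column equals the sum of the chip C and chip D
columns, so the matrix has 5 columns but rank 4. In the crossed case (row
positions that occur in every batch) the matrix keeps full column rank, which
is consistent with row being usable as a covariate while chip is not.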

Thank you for the offer & advice!

Magda


