Hi Tiger,
Given your study design, it is completely understandable that you get very different results with different covariates and that in some cases you run into singularity issues. I can see that your difficulties stem from the complex experimental design of your study. However, please note that whether to include or exclude a covariate is really a statistical/modeling question and not really a ComBat question. At this time, I would strongly recommend that you sit down with a statistician at your institution, discuss your study design and research questions/goals with him/her, and design a linear model (including batch) that will help you accomplish those goals. Once you have this, as long as none of your covariates of interest are confounded with batch, you can run your selected linear model through ComBat to remove the batch effects as you desire (using all the non-batch variables in your linear model as 'covariates').
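One way to see in advance whether a candidate model will hit a singularity, or whether a covariate is confounded with batch, is to build the design matrix and compare its rank with its number of columns. The following is only a minimal sketch in Python with NumPy; the helper names and the sample labels are made up for illustration and are not part of ComBat.

```python
import numpy as np

def indicator_columns(labels):
    """0/1 columns for a categorical variable; the first level is the reference."""
    levels = sorted(set(labels))[1:]
    cols = [[float(lab == lev) for lev in levels] for lab in labels]
    return np.array(cols).reshape(len(labels), len(levels))

def design_rank_report(batch, covariates):
    """Return (rank, n_columns) of the design [intercept | batch | covariates]."""
    n = len(batch)
    blocks = [np.ones((n, 1)), indicator_columns(batch)]
    blocks += [indicator_columns(c) for c in covariates]
    design = np.hstack(blocks)
    return int(np.linalg.matrix_rank(design)), design.shape[1]

# Hypothetical labels for 8 samples run in 2 batches.
batch = [1, 1, 1, 1, 2, 2, 2, 2]
group = ["A", "A", "B", "B", "A", "A", "B", "B"]
stage = ["A1", "A2", "B1", "B2", "A1", "A2", "B1", "B2"]  # nested within group
status = ["x", "x", "x", "x", "y", "y", "y", "y"]          # identical to batch

print(design_rank_report(batch, [group]))         # (3, 3): full rank
print(design_rank_report(batch, [group, stage]))  # (5, 6): group is redundant given stage
print(design_rank_report(batch, [status]))        # (2, 3): status is confounded with batch
```

Whenever the rank is smaller than the number of columns, the model cannot be fit as specified: either a covariate is a finer or coarser version of another one, or it cannot be separated from batch.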
Hope this helps,
Evan
On Feb 28, 2013, at 2:36 PM, hu duan wrote:
Hi Dr. Johnson,
Thanks for replying to my previous email. I have tried different combinations of covariates; they all gave different results, and some failed due to singularity. I still do not know how to choose the right covariates. I will try to explain my question more clearly this time. Please see the attached PDF for the detailed analysis, including PCA.
The figure below shows all the possible covariates I could choose. As you can see, group, stage, and window are progressively more detailed classifications of all samples. They are dependent on one another, unlike age or treatment group, which are independent in other experiments.
Each sample differs biologically from the others; they are not technical replicates. I want to keep all the biological variance between samples and eliminate the batch effects. How should I choose covariates?
You mentioned on the forum before that "Using that ComBat can estimate what is really variance between the samples and what is due to the batches"
In the above figure, how can ComBat avoid eliminating real sample variance within a subset defined by the covariates and remove only the batch effect? Do you assume that samples in the same subset follow the same distribution?
Can ComBat give a number showing what percentage of the variance is due to batch effects and how much is left after ComBat?
Thank you very much
Best
Tiger
The bell plot can show the batch-removed result. But is there a good way to give a quantitative parameter, so that we can know how ComBat performs and convince reviewers?
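One simple number of the kind asked about here is, for each feature, the fraction of total variance explained by batch (a one-way ANOVA R^2), computed before and after adjustment. The sketch below is written in Python with NumPy on simulated data. It is not something ComBat reports itself, the function name is hypothetical, and the "after" matrix comes from plain per-batch mean centring as a stand-in for an adjusted data set.

```python
import numpy as np

def batch_variance_fraction(data, batch):
    """Per-feature share of total variance explained by batch.

    data: features x samples array; batch: one label per sample.
    """
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    grand = data.mean(axis=1, keepdims=True)
    ss_total = ((data - grand) ** 2).sum(axis=1)
    ss_batch = np.zeros(data.shape[0])
    for b in np.unique(batch):
        cols = data[:, batch == b]
        ss_batch += cols.shape[1] * (cols.mean(axis=1) - grand[:, 0]) ** 2
    return np.divide(ss_batch, ss_total,
                     out=np.zeros_like(ss_batch), where=ss_total > 0)

# Simulated example: 50 features, 12 samples, 3 batches with shifted means.
rng = np.random.default_rng(1)
batch = np.repeat([0, 1, 2], 4)
data = rng.normal(0.0, 1.0, size=(50, 12)) + 3.0 * batch

# Stand-in for adjusted data: move every batch mean onto the grand mean.
adjusted = data.copy()
grand = data.mean(axis=1, keepdims=True)
for b in np.unique(batch):
    mask = batch == b
    adjusted[:, mask] -= data[:, mask].mean(axis=1, keepdims=True) - grand

before = float(np.median(batch_variance_fraction(data, batch)))
after = float(np.median(batch_variance_fraction(adjusted, batch)))
print(f"median variance explained by batch: before={before:.2f}, after={after:.2f}")
```

This measure ignores the covariates, so with an unbalanced design part of what it attributes to batch may be biology. It says how much batch-associated variance is left, not how much biological variance was preserved.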
2013/2/21 Johnson, William Evan:
Hey Tiger, See below:
On Feb 19, 2013, at 2:16 PM, hu duan wrote:
Hi Dr. Johnson,
I have one question about multiple covariates in ComBat analysis that has bothered me for two weeks, and I really need your advice.
My microarray experimental design is listed below; please see the attachment for details.
1. I ran the experiments on 4 different dates in a randomized way, so there are 4 batches.
2. Each sample was run with 2 or 3 replicates.
3. Four parameters (individual, age, status, stage) are associated with each sample. None of them is a nuisance variable. Batch is the ONLY effect I want to eliminate.
Analysis I want to do:
I want to perform statistical tests to find significant features between groups defined by status first, and then see how they behave across different individuals, stages, and ages. I will later select features to distinguish different stages.
So I need a method that considers individual, age, status, and stage at the same time when doing the ComBat analysis, so that their biological differences are not eliminated.
My question is:
What covariates do I need to include?
You need to include them all as covariates. Make sure to include age as a numerical covariate.
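As an illustration of the point about age, here is a rough sketch in Python with NumPy of a covariate matrix in which age enters as a single numeric column while status and stage are expanded into indicator columns. The sample values and the helper name are invented for the example, and individual is left out for brevity.

```python
import numpy as np

def dummy_columns(labels):
    """0/1 indicator columns for a categorical variable, first level as reference."""
    levels = sorted(set(labels))[1:]
    cols = [[float(lab == lev) for lev in levels] for lab in labels]
    return np.array(cols).reshape(len(labels), len(levels))

# Hypothetical sample annotations (not taken from the study).
age = [34, 51, 47, 62, 29, 58]
status = ["control", "case", "case", "control", "case", "control"]
stage = ["early", "late", "mid", "early", "mid", "late"]

# Age as a numeric covariate: one column, whatever the number of distinct ages.
covariates = np.hstack([
    np.asarray(age, dtype=float).reshape(-1, 1),
    dummy_columns(status),
    dummy_columns(stage),
])
print(covariates.shape)  # (6, 4)

# Age mistakenly treated as a category: one column per distinct age but one.
age_as_factor = dummy_columns([str(a) for a in age])
print(age_as_factor.shape)  # (6, 5)
```

Treating age as a category uses up almost one column per sample, which is one way a model ends up over-parameterized or singular.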
What will be the effect of not including them?
If your experimental design is balanced across batches, you may not see much of a difference. If there is some imbalance in your design (say there is a higher proportion of treated samples in one batch than in another, or the patients in one batch are younger than in another batch), then you will be removing biological variation along with the batch variation. Adding covariates will ensure that the biological variation is untouched by the batch adjustment.
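The effect of imbalance can be seen in a small simulation. The sketch below (Python with NumPy) uses a deliberately simplified location-only adjustment fitted by least squares, not ComBat's empirical Bayes procedure, and all of the numbers are invented: one feature, two batches, a true treatment effect of 2.0 and a true batch shift of 1.0, with most treated samples in the first batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unbalanced design: batch 0 is 80% treated, batch 1 is 20% treated.
batch = np.repeat([0, 1], 40)
treat = np.concatenate([np.repeat([1, 0], [32, 8]), np.repeat([1, 0], [8, 32])])

true_treat_effect, true_batch_effect = 2.0, 1.0
y = (true_treat_effect * treat + true_batch_effect * batch
     + rng.normal(0.0, 0.1, size=80))

def group_difference(values):
    """Treated-minus-control difference in means."""
    return values[treat == 1].mean() - values[treat == 0].mean()

# (a) Batch-only adjustment: move each batch mean onto the grand mean.
adj_batch_only = y.copy()
for b in (0, 1):
    adj_batch_only[batch == b] -= y[batch == b].mean() - y.mean()

# (b) Covariate-aware adjustment: fit y ~ intercept + batch + treat,
#     then subtract only the estimated batch term.
X = np.column_stack([np.ones(80), batch, treat])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
adj_with_cov = y - coef[1] * batch

print("treatment difference, batch-only adjustment:", round(group_difference(adj_batch_only), 2))
print("treatment difference, with covariate:       ", round(group_difference(adj_with_cov), 2))
```

In this setup the batch-only adjustment shrinks the treatment difference to roughly 1.3, because part of the treatment signal is absorbed into the batch means, while the covariate-aware version stays close to 2.0.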
Can you explain the principle for choosing multiple covariates?
Not completely sure what you mean here. Again, you include any covariates of interest so that the biological signal is not removed during the batch adjustment. As far as which covariates to include: any variables that you don't want removed from the data.
Hope this answers your questions.
Thanks
Tiger
PS:
The bell plot can show the batch-removed result. But is there a good way to give a quantitative parameter, such as a percentage showing how much of the batch effect has been removed, so that we can know how ComBat performs and convince reviewers?
--
Hu Duan (Tiger)
Biological Design PhD student
Graduate Research Associate
Center for Innovation in Medicine
The Biodesign Institute, Arizona State University
---------------------------------------------------------------------------------------
"MY MIND REBELS AT STAGNATION." -- Sherlock Holmes
---------------------------------------------------------------------------------------
<20130228_ComBat Covariate_Tiger.pdf>