[R] Correcting for covariate (unbalanced design)

Renaud Gaujoux renaud at mancala.cbio.uct.ac.za
Tue Nov 11 15:51:00 CET 2008


Hi,

I've got a microarray dataset (Illumina) coming from a blood assay with 
a case-control factor of interest.
I also have several other covariates (gender, weight, etc.).

I know that the experimental design is highly unbalanced with respect to 
Gender:

             female   male
  control        12      7
  case            7     17
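
To put a number on the imbalance, something like the following should 
do. It is only a quick base-R sketch; the object names are mine and 
nothing here depends on the expression data:

```r
# Counts copied from the table above (rows: Status, columns: Gender).
# matrix() fills column by column: female = (12, 7), male = (7, 17).
design_counts <- matrix(
  c(12, 7, 7, 17), nrow = 2,
  dimnames = list(Status = c("control", "case"),
                  Gender = c("female", "male"))
)

# Fisher's exact test for association between Status and Gender
imbalance <- fisher.test(design_counts)

print(design_counts)
print(imbalance$p.value)
```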

Therefore, if there is a Gender effect, it really needs to be included 
in any subsequent analysis (differential expression with limma, 
classification). I do not want to find differences between cases and 
controls that are actually due to Gender.

Some questions around that:
- what would be the "best practice" way to find out whether Gender (or 
any other covariate) actually has an effect that needs to be dealt with 
(as I would rather not bother about it)?
What I did: ran limma with the model ~ Status + Gender and looked at the 
p-values for the Gender coefficient (?)
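
Roughly, the per-gene check looks like the sketch below. It is a base-R 
stand-in on simulated data, not my actual limma run: the expression 
matrix, the number of genes and the spiked-in Gender effect are all made 
up, and plain lm() has none of limma's variance moderation. Only the 
sample counts come from the table above.

```r
set.seed(1)

# Sample annotation with the same Status x Gender counts as my design
pheno <- data.frame(
  Status = factor(rep(c("control", "case"), times = c(19, 24)),
                  levels = c("control", "case")),
  Gender = factor(c(rep(c("female", "male"), times = c(12, 7)),
                    rep(c("female", "male"), times = c(7, 17))))
)

# Simulated expression matrix (genes in rows); purely hypothetical
n_genes <- 200
expr <- matrix(rnorm(n_genes * nrow(pheno)), nrow = n_genes)

# Spike a Gender effect into the first 20 genes
is_male <- pheno$Gender == "male"
expr[1:20, is_male] <- expr[1:20, is_male] + 2

# Fit ~ Status + Gender gene by gene, keep the p-value of the Gender term
gender_p <- apply(expr, 1, function(y) {
  fit <- lm(y ~ Status + Gender, data = pheno)
  coef(summary(fit))["Gendermale", "Pr(>|t|)"]
})

# Benjamini-Hochberg adjustment across genes
gender_fdr <- p.adjust(gender_p, method = "BH")
print(sum(gender_fdr < 0.05))
```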

- obviously some of the genes show a Gender effect and the others 
don't. In that case, is it a good idea to include Gender in the model 
for all genes? Is it right to use two different models? What about the 
multiple testing correction in that case?

- supposing we decide to take Gender into account in the analysis, do 
you know of classification methods that allow some correction for a 
covariate? I cannot correct my original data for gender without 
including the case-control status in the model, because I think that 
would remove a lot of the effect of interest (cf. unbalanced design). 
Therefore, I need to cross-validate any gender correction if I do not 
want to bias the classification result. This increases the complexity of 
the classification procedure and reduces the choice of methods, since 
not all methods give access to their internal machinery (cf. Random 
Forest: can I hook into the splitting method to use a gender-corrected 
split?).
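
To make the cross-validation point concrete, here is a rough sketch of 
what I have in mind, again on simulated data with base R only. The 
Gender coefficient is estimated jointly with Status on the training 
samples of each fold, then subtracted from both training and test 
samples using their gender alone. The helper names, the 5 folds and the 
nearest-centroid classifier are just placeholders of mine; any 
classifier could take its place.

```r
set.seed(2)
pheno <- data.frame(
  Status = factor(rep(c("control", "case"), times = c(19, 24)),
                  levels = c("control", "case")),
  Gender = factor(c(rep(c("female", "male"), times = c(12, 7)),
                    rep(c("female", "male"), times = c(7, 17))))
)
n <- nrow(pheno)

# Simulated data (genes in rows): 5 Status genes, 10 Gender genes
expr <- matrix(rnorm(50 * n), nrow = 50)
expr[1:5, pheno$Status == "case"] <- expr[1:5, pheno$Status == "case"] + 1.5
expr[6:15, pheno$Gender == "male"] <- expr[6:15, pheno$Gender == "male"] + 2

# Per-gene Gender coefficient from ~ Status + Gender (training data only)
gender_effect <- function(x, ph) {
  coef(lm(t(x) ~ Status + Gender, data = ph))["Gendermale", ]
}

# Subtract the estimated Gender effect; needs gender, never Status
remove_gender <- function(x, gender, beta) {
  x - outer(beta, as.numeric(gender == "male"))
}

folds <- sample(rep(1:5, length.out = n))
pred <- character(n)
for (k in 1:5) {
  test <- folds == k
  beta <- gender_effect(expr[, !test], pheno[!test, ])
  train_x <- remove_gender(expr[, !test], pheno$Gender[!test], beta)
  test_x <- remove_gender(expr[, test, drop = FALSE],
                          pheno$Gender[test], beta)

  # Nearest-centroid classifier on the corrected training data
  centroids <- sapply(levels(pheno$Status), function(s)
    rowMeans(train_x[, pheno$Status[!test] == s, drop = FALSE]))
  dists <- sapply(colnames(centroids), function(s)
    colSums((test_x - centroids[, s])^2))
  pred[test] <- colnames(dists)[apply(dists, 1, which.min)]
}

accuracy <- mean(pred == as.character(pheno$Status))
print(accuracy)
```

This keeps the correction inside the cross-validation loop, but it does 
not answer the Random Forest question, since the correction happens 
before the classifier rather than inside it.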

- any other suggestions for dealing with this kind of very annoying 
unbalanced design?

A lot of stuff, hey?
Thanks for your help and comments.


