[BioC] sva: how to incorporate adjusting variables
Meritxell Oliva [guest]
guest at bioconductor.org
Wed Apr 10 13:53:47 CEST 2013
Dear Jeffrey,
I am using sva to estimate potential surrogate variables in a microarray-derived expression dataset, as a preliminary step before differential gene expression analysis. The aim of my work is to study how one multi-level variable (inversion genotype, three categories: STD, HET, INV) affects the gene expression profile of a set of human individuals. However, some other variables (population, gender) have a partial effect, that is, they account for variation in the expression of only a subset of genes. I don't know how to deal with these variables. Which of the following options is the most appropriate one (if any)?
A) "Protect" them by including them in both the null and the full model
mod0 = model.matrix(~as.factor(Gender)+as.factor(Population), data=pheno)
mod = model.matrix(~as.factor(inversion_genotype)+as.factor(Gender)+as.factor(Population), data=pheno)
svobj = sva(edata,mod,mod0)
B) Include them only in the full model
mod0 = model.matrix(~1, data=pheno)
mod = model.matrix(~as.factor(inversion_genotype)+as.factor(Gender)+as.factor(Population), data=pheno)
svobj = sva(edata,mod,mod0)
C) Do not include them at all (and expect some surrogate variables to correlate strongly with these variables, if they really affect gene expression)
mod0 = model.matrix(~1, data=pheno)
mod = model.matrix(~as.factor(inversion_genotype), data=pheno)
svobj = sva(edata,mod,mod0)
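
For option C, the check I have in mind (whether any surrogate variable tracks Gender or Population) could be sketched roughly as below. This is only an illustration in base R: the small pheno data frame and the svobj list are simulated stand-ins (assumptions, not my real data), built so the snippet runs on its own, and the name sv_r2 is made up for this example. With real data, svobj would be the object returned by sva().

```r
# Hypothetical stand-ins so the snippet runs on its own; in practice use
# the real `pheno` and the `svobj` returned by sva() in option C.
set.seed(1)
pheno <- data.frame(
  Gender     = rep(c("F", "M"), each = 10),
  Population = rep(c("P1", "P2"), times = 10)
)
svobj <- list(sv = cbind(
  as.numeric(pheno$Gender == "M") + rnorm(20, sd = 0.1),  # simulated SV that tracks Gender
  rnorm(20)                                               # simulated SV of pure noise
))
svobj$n.sv <- ncol(svobj$sv)

# R^2 from regressing each surrogate variable on each known covariate:
# rows are surrogate variables, columns are covariates.
sv_r2 <- sapply(c("Gender", "Population"), function(v) {
  apply(svobj$sv, 2, function(s) {
    summary(lm(s ~ as.factor(pheno[[v]])))$r.squared
  })
})
rownames(sv_r2) <- paste0("SV", seq_len(svobj$n.sv))
print(round(sv_r2, 2))
```

A high R^2 in a given cell would suggest that the corresponding surrogate variable is capturing that covariate; what counts as "high" is a judgment call I am unsure about.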
To summarize: How should adjustment variables with a global effect be treated? How should adjustment variables with a partial effect (only in a subset of genes) be treated?
I would really appreciate any piece of advice.
Thanks a lot!
Meri
-- output of sessionInfo():
R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Sent via the guest posting facility at bioconductor.org.