[BioC] sva: how to incorporate adjusting variables

Wed Apr 10 13:53:47 CEST 2013

Dear Jeffrey,

I am using sva to estimate surrogate variables for a microarray gene expression dataset, as a preliminary step before differential gene expression analysis. The aim of my work is to study how one multi-level categorical variable (inversion genotype, with three levels: STD, HET, INV) affects the gene expression profile of a set of human individuals. However, some other variables (population, gender) have a partial effect, that is, they account for variation in the expression of only a subset of genes. I don't know how to deal with these variables. Which of the following options, if any, is the most appropriate?

A) "Protect" them by including them in both the null and the full model

mod0 = model.matrix(~as.factor(Gender)+as.factor(Population), data=pheno)
mod = model.matrix(~as.factor(inversion_genotype)+as.factor(Gender)+as.factor(Population), data=pheno)
svobj = sva(edata,mod,mod0)

B) Include them only in the full model

mod0 = model.matrix(~1, data=pheno)
mod = model.matrix(~as.factor(inversion_genotype)+as.factor(Gender)+as.factor(Population), data=pheno)
svobj = sva(edata,mod,mod0)

C) Do not include them at all (and expect some surrogate variables to be strongly correlated with these variables, if they really affect gene expression)

mod0 = model.matrix(~1, data=pheno)
mod = model.matrix(~as.factor(inversion_genotype), data=pheno)
svobj = sva(edata,mod,mod0)
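One way to check the expectation in option C is to test each estimated surrogate variable for association with the known covariates. The sketch below assumes `svobj$sv` holds the surrogate variables as columns (one row per sample) and that `pheno` has columns named `Gender` and `Population`; the helper `sv_association` is hypothetical (not part of sva), and the demonstration uses simulated toy data rather than any real dataset.

```r
# Hypothetical helper (not part of sva): for each surrogate variable (column of sv),
# fit a one-way linear model against each covariate and return the ANOVA F-test p-value.
sv_association <- function(sv, pheno, covariates) {
  sv <- as.matrix(sv)  # assumes at least one surrogate variable was estimated
  res <- matrix(NA_real_, nrow = ncol(sv), ncol = length(covariates),
                dimnames = list(paste0("SV", seq_len(ncol(sv))), covariates))
  for (k in seq_len(ncol(sv))) {
    for (v in covariates) {
      fit <- lm(sv[, k] ~ as.factor(pheno[[v]]))
      res[k, v] <- anova(fit)[["Pr(>F)"]][1]  # p-value for the covariate term
    }
  }
  res
}

# Simulated toy data only: SV1 tracks Gender, SV2 is pure noise.
set.seed(1)
pheno_toy <- data.frame(Gender = rep(c("F", "M"), each = 20),
                        Population = rep(c("A", "B", "C", "D"), times = 10))
sv_toy <- cbind(ifelse(pheno_toy$Gender == "M", 1, -1) + rnorm(40, sd = 0.1),
                rnorm(40))
p <- sv_association(sv_toy, pheno_toy, c("Gender", "Population"))
print(signif(p, 3))
```

With real results the call would look like `sv_association(svobj$sv, pheno, c("Gender", "Population"))`; small p-values would suggest that a surrogate variable is capturing that covariate.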

To summarize: how should adjustment variables with a global effect be treated, and how should adjustment variables with a partial effect (only on a subset of genes) be treated?

I would really appreciate any piece of advice.

Thanks a lot!

Meri

 -- output of sessionInfo(): 

R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

--
Sent via the guest posting facility at bioconductor.org.