[Bioc-devel] Use of confounders in downstream analysis

Tue Apr 21 18:31:17 CEST 2015

Dear all,

I am having trouble understanding how exactly to include
confounders in my downstream analysis. Below is a short
description of my analysis and the problem; I would be very happy if
some of you could help me understand how to proceed:

I normalized 450k data and then used lmFit() to find differentially  
methylated CpGs. My design matrix looks like this:  
model.matrix(~Pair+FatPercentage+EstradiolLevel). So, basically I want  
to identify CpG sites that are associated with changes in estradiol  
levels. As I want to perform a within-pair analysis of monozygotic twins,
I added pair information of the form c(1,1,2,3,2,3...). I also added
fat percentage as a confounder because it correlated significantly
with the first principal component of the data. Does this look right
to you?
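
To make the setup concrete, here is a minimal sketch of the design matrix using a simulated sample sheet (the data frame `targets` and all values in it are made up for illustration; only the column names come from my model formula). It contrasts Pair as a numeric vector with Pair as a factor, since model.matrix() treats the two differently:

```r
# Simulated sample sheet; all names and values are illustrative only.
set.seed(1)
targets <- data.frame(
  Pair           = c(1, 1, 2, 3, 2, 3, 4, 4),   # numeric twin-pair IDs
  FatPercentage  = runif(8, 20, 40),
  EstradiolLevel = runif(8, 50, 300)
)

# Numeric Pair: model.matrix() creates a single slope column for Pair,
# i.e. the pair ID is modelled as a linear trend.
design_numeric <- model.matrix(~ Pair + FatPercentage + EstradiolLevel,
                               data = targets)

# Factor Pair: one indicator column per pair (minus the reference pair),
# the usual way to block on pair in a linear model.
targets$Pair <- factor(targets$Pair)
design <- model.matrix(~ Pair + FatPercentage + EstradiolLevel,
                       data = targets)

colnames(design_numeric)
colnames(design)
```

With four pairs, the factor version has six columns (intercept, three pair indicators, fat percentage, estradiol), whereas the numeric version has only four.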

Now, after having identified significantly differentially methylated  
CpGs, we want to use the GSA package and look at correlations between  
methylation and expression data. For GSA the pairs can be specified  
directly in the function call. Does that also work with continuous
traits or only if you have two groups? Additionally, I am not really
sure how to include confounders then. Do I have to use adjusted or  
unadjusted data? If I use adjusted data, would I use the same design  
matrix as above and not include pair information in the function call?  
Would that still be a within-pair comparison then? And for the
adjustment itself, would it be something like adj.m <-
normalizedM - fit$coef[,-1] %*% t(myDesign[,-1]) or do I also have to
include the columns for pair and fat percentage in this adjustment  
somehow? If I don't have to use unadjusted data, how would I include  
information on fat percentage and the estradiol levels then?
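
To show what kind of adjustment I mean, here is a rough sketch on simulated data. It is only one possible reading of the expression above, not something I know to be correct for GSA. Assumptions: `normalizedM` is a CpGs x samples matrix of random numbers, the coefficients come from ordinary least squares via base R lm.fit() as a stand-in for fit$coef from an unweighted lmFit(), and the choice of "nuisance" columns (pair and fat percentage) is mine. Note that [,-1] in my expression drops only the intercept, so it would subtract the fitted estradiol effect as well.

```r
# Simulated data; all values are illustrative only.
set.seed(2)
targets <- data.frame(
  Pair           = factor(c(1, 1, 2, 3, 2, 3, 4, 4)),
  FatPercentage  = runif(8, 20, 40),
  EstradiolLevel = runif(8, 50, 300)
)
design <- model.matrix(~ Pair + FatPercentage + EstradiolLevel,
                       data = targets)
normalizedM <- matrix(rnorm(5 * 8), nrow = 5)   # 5 CpGs x 8 samples

# OLS coefficients, CpGs x design columns (stand-in for fit$coef).
coefs <- t(lm.fit(design, t(normalizedM))$coefficients)
colnames(coefs) <- colnames(design)

# Columns to remove: everything except the intercept and the
# variable of interest (assumption: estradiol is kept in the data).
nuisance <- setdiff(colnames(design), c("(Intercept)", "EstradiolLevel"))

# Subtract the fitted pair and fat-percentage effects from the M-values.
adj.m <- normalizedM -
  coefs[, nuisance, drop = FALSE] %*% t(design[, nuisance, drop = FALSE])

dim(adj.m)
```

Refitting the same design to adj.m gives zero coefficients for the removed columns and leaves the estradiol coefficient unchanged, which is at least a sanity check on the algebra.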

Similarly, for the correlations between methylation and expression...  
Do I just use the adjusted data sets and then compute correlations
over all individuals? Would that still account for the within-pair
changes? Or would I use delta betas for correlation analysis? In the
latter case, would I use adjusted data? Would that then be like  
adjusting for pair twice if I use the design matrix from above? Or  
would I have to change the matrix and if yes, how?
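
The two correlation options I am weighing can be sketched as follows, for a single CpG and a single gene with simulated values (the helper `pair_delta` and all numbers are hypothetical). The orientation of each within-pair difference (which twin is subtracted from which) is arbitrary here and simply follows sample order; it does influence the resulting correlation.

```r
# Simulated data for one CpG and one gene; illustrative only.
set.seed(3)
pair <- c(1, 1, 2, 3, 2, 3, 4, 4)   # twin-pair ID per sample
meth <- rnorm(8)                    # methylation values across samples
expr <- rnorm(8)                    # expression values, same samples

# Within-pair difference: second-listed twin minus first-listed twin.
# split() keeps the original sample order inside each pair.
pair_delta <- function(x, pair) {
  vapply(split(x, pair), function(v) v[2] - v[1], numeric(1))
}

d_meth <- pair_delta(meth, pair)
d_expr <- pair_delta(expr, pair)

# Option 1: correlation over all individuals, ignoring pairing.
cor_all <- cor(meth, expr)

# Option 2: correlation of within-pair differences (one value per pair).
cor_delta <- cor(d_meth, d_expr)
```

Option 2 uses one data point per pair, so with n pairs the correlation is based on n values instead of 2n.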

One last thing - say I wanted to perform differential analysis between  
two groups (not within-pair) but still have some twin pairs included  
in the analysis, would I then use duplicateCorrelation() instead of
including the pair information directly in the design matrix? Or if  
that's not the right way to go, what should I do in that case?
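
What I have in mind for this last case would look roughly like the sketch below, on simulated data. Assumptions: the `Group` and `Block` columns and the matrix `M` are made up, twins share a block ID while unrelated individuals each get their own, and limma is installed (the calls are skipped otherwise). I am not sure this is the right approach, which is exactly my question.

```r
# Simulated data; all names and values are illustrative only.
set.seed(4)
targets <- data.frame(
  Group = factor(c("A", "A", "B", "B", "A", "B", "A", "B", "A", "B")),
  # Twin pairs share an ID; singletons get their own ID.
  Block = c(1, 1, 2, 2, 3, 3, 4, 5, 6, 7)
)
M <- matrix(rnorm(50 * nrow(targets)), nrow = 50)   # 50 CpGs x 10 samples

# Pair is NOT in the design; it enters as a random (blocking) effect.
design <- model.matrix(~ Group, data = targets)

if (requireNamespace("limma", quietly = TRUE)) {
  # Estimate the consensus within-pair correlation across CpGs.
  corfit <- limma::duplicateCorrelation(M, design, block = targets$Block)
  # Fit the group model while accounting for that correlation.
  fit <- limma::lmFit(M, design, block = targets$Block,
                      correlation = corfit$consensus.correlation)
  fit <- limma::eBayes(fit)
  limma::topTable(fit, coef = "GroupB")
}
```

In this version the pairing is handled through the estimated correlation rather than through indicator columns, so pairs whose twins fall in the same group can still contribute.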

Sorry for so many questions! I would really appreciate any
kind of help or ideas on how to proceed.

Thanks a lot in advance and best regards,

Aileen