[Bioc-devel] Use of confounders in downstream analysis
Aileen Bahl
aileen.bahl at helsinki.fi
Tue Apr 21 18:31:17 CEST 2015
Dear all,
I have some problems in understanding how exactly to include
confounders in my downstream analysis. I will provide a short
description of my analysis and problem and I would be very happy if
some of you could help me understanding how exactly to go ahead with
that:
I normalized 450k data and then used lmFit() to find differentially
methylated CpGs. My design matrix looks like this:
model.matrix(~Pair+FatPercentage+EstradiolLevel). So, basically I want
to identify CpG sites that are associated with changes in estradiol
levels. As I want to perform within-pair analysis of monozygotic twins
I added pair information looking like c(1,1,2,3,2,3...). I also added
the fat percentage as a confounder as we saw significant correlations
with the first principal component of the data. Does this look right
to you?
Now, after having identified significantly differentially methylated
CpGs, we want to use the GSA package and look at correlations between
methylation and expression data. For GSA the pairs can be specified
directly in the function call. Does that also work with continuous
traits or only if you have to groups? Additionally, I am not really
sure how to include confounders then. Do I have to use adjusted or
unadjusted data? If I use adjusted data, would I use the same design
matrix as above and not include pair information in the function call?
Would that be still a within-pair comparison then? And for the
adjustment itself, would it be something like adj.m <-
normalizedM-fit$coef[,-1]%*%t(myDesign[,-1]) or do I also have to
include the columns for pair and fat percentage in this adjustment
somehow? If I don't have to use unadjusted data, how would I include
information on fat percentage and the estradiol levels then?
Similarly, for the correlations between methylation and expression...
Do I just use the adjusted data sets and then compute correlations
over all individuals? Is that then still considering the within-pair
changes? Or would I use delta betas for correlation analysis? In the
latter case, would I use adjusted data? Would that then be like
adjusting for pair twice if I use the design matrix from above? Or
would I have to change the matrix and if yes, how?
One last thing - say I wanted to perform differential analysis between
two groups (not within-pair) but still have some twin pairs included
in the analysis, would I then used duplicateCorrelation() instead of
including the pair information directly in the design matrix? Or if
that's not the right way to go, what should I do in that case?
Sorry for that many questions! However, I would really appreciate any
kind of help or ideas, to be able to understand how to go on...
Thanks a lot in advance and best regards,
Aileen
More information about the Bioc-devel
mailing list