[Bioc-devel] Use of confounders in downstream analysis

Sean Davis seandavi at gmail.com
Tue Apr 21 19:11:30 CEST 2015

Hi, Aileen.

This list isn't really the best place to ask questions like this and is
really reserved for discussion around package development.  Could you
please post to:


That way, you benefit from more eyes and everyone benefits from potential


On Tue, Apr 21, 2015 at 12:31 PM, Aileen Bahl <aileen.bahl at helsinki.fi>
wrote:

> Dear all,
> I have some problems in understanding how exactly to include confounders
> in my downstream analysis. I will provide a short description of my
> analysis and problem and I would be very happy if some of you could help me
> understand how exactly to go ahead with that:
> I normalized 450k data and then used lmFit() to find differentially
> methylated CpGs. My design matrix looks like this:
> model.matrix(~Pair+FatPercentage+EstradiolLevel). So, basically I want to
> identify CpG sites that are associated with changes in estradiol levels. As
> I want to perform within-pair analysis of monozygotic twins I added pair
> information looking like c(1,1,2,3,2,3...). I also added the fat percentage
> as a confounder as we saw significant correlations with the first principal
> component of the data. Does this look right to you?
> Now, after having identified significantly differentially methylated CpGs,
> we want to use the GSA package and look at correlations between methylation
> and expression data. For GSA the pairs can be specified directly in the
> function call. Does that also work with continuous traits or only if you
> have two groups? Additionally, I am not really sure how to include
> confounders then. Do I have to use adjusted or unadjusted data? If I use
> adjusted data, would I use the same design matrix as above and not include
> pair information in the function call? Would that be still a within-pair
> comparison then? And for the adjustment itself, would it be something like
> adj.m <- normalizedM-fit$coef[,-1]%*%t(myDesign[,-1]) or do I also have to
> include the columns for pair and fat percentage in this adjustment somehow?
> If I have to use unadjusted data, how would I include information on
> fat percentage and the estradiol levels then?
> Similarly, for the correlations between methylation and expression... Do I
> just use the adjusted data sets and then compute correlations over all
> individuals? Is that then still considering the within-pair changes? Or
> would I use delta betas for correlation analysis? In the latter case, would
> I use adjusted data? Would that then be like adjusting for pair twice if I
> use the design matrix from above? Or would I have to change the matrix and
> if yes, how?
> One last thing - say I wanted to perform differential analysis between two
> groups (not within-pair) but still have some twin pairs included in the
> analysis, would I then use duplicateCorrelation() instead of including the
> pair information directly in the design matrix? Or if that's not the right
> way to go, what should I do in that case?
> Sorry for so many questions! However, I would really appreciate any kind
> of help or ideas, to be able to understand how to go on...
> Thanks a lot in advance and best regards,
> Aileen
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
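
For readers following the design-matrix and adjustment questions in the quoted
message, here is a minimal base-R sketch. It is not taken from the thread and
is not a recommendation from either poster. The toy data and the names pair,
fat, estradiol, M, coef, nuisance and adj.M are all hypothetical, and lm.fit()
stands in for limma's lmFit() only so the example runs without Bioconductor.
Two points are illustrated. First, a numeric pair vector such as
c(1,1,2,3,2,3...) is fitted as a single slope unless it is converted to a
factor. Second, an expression of the form coef[,-1] %*% t(design[,-1])
subtracts every non-intercept term, including the estradiol effect of
interest, so the sketch removes only the pair and fat-percentage columns.

```r
set.seed(1)

# Hypothetical toy data: 4 twin pairs (8 samples), 5 CpG sites.
pair      <- c(1, 1, 2, 3, 2, 3, 4, 4)
fat       <- rnorm(8, mean = 30, sd = 5)     # nuisance covariate
estradiol <- rnorm(8, mean = 100, sd = 20)   # variable of interest
M <- matrix(rnorm(5 * 8), nrow = 5,
            dimnames = list(paste0("cpg", 1:5), paste0("s", 1:8)))

# Pair as a factor gives one indicator column per pair (minus the baseline);
# left numeric, it would contribute only a single slope column.
Pair   <- factor(pair)
design <- model.matrix(~ Pair + fat + estradiol)

# Least-squares fit for every CpG at once. lm.fit() returns coefficients as
# (coefficients x CpGs), so transpose to get CpGs x coefficients.
fit  <- lm.fit(design, t(M))
coef <- t(fit$coefficients)

# Subtract only the nuisance terms (pair and fat); the intercept and the
# estradiol effect stay in the adjusted data.
nuisance <- setdiff(colnames(design), c("(Intercept)", "estradiol"))
adj.M <- M - coef[, nuisance, drop = FALSE] %*%
  t(design[, nuisance, drop = FALSE])
```

Refitting adj.M against the same design returns the same estradiol
coefficients and nuisance coefficients of zero, up to rounding. limma's
removeBatchEffect() performs a comparable adjustment on real data. Whether
adjusted values are appropriate input for GSA or for the correlation analysis
is a separate statistical question that this sketch does not answer.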
