[Bioc-devel] Use of confounders in downstream analysis

Aileen Bahl aileen.bahl at helsinki.fi
Tue Apr 21 21:19:16 CEST 2015

Thanks and sorry,

I didn't get much response on the Bioconductor support site and
thus tried here instead. However, it is good to know where the best
place to ask would be.

Quoting Sean Davis <seandavi at gmail.com>:

> Hi, Aileen.
> This list isn't really the best place to ask questions like this and is
> really reserved for discussion around package development.  Could you
> please post to:
> https://support.bioconductor.org/
> That way, you benefit from more eyes and everyone benefits from potential
> answers.
> Thanks,
> Sean
> On Tue, Apr 21, 2015 at 12:31 PM, Aileen Bahl <aileen.bahl at helsinki.fi>
> wrote:
>> Dear all,
>> I have some problems in understanding how exactly to include confounders
>> in my downstream analysis. I will provide a short description of my
>> analysis and problem and I would be very happy if some of you could help me
>> understand how exactly to proceed:
>> I normalized 450k data and then used lmFit() to find differentially
>> methylated CpGs. My design matrix looks like this:
>> model.matrix(~Pair+FatPercentage+EstradiolLevel). So, basically I want to
>> identify CpG sites that are associated with changes in estradiol levels. As
>> I want to perform a within-pair analysis of monozygotic twins, I added pair
>> information looking like c(1,1,2,3,2,3...). I also added the fat percentage
>> as a confounder as we saw significant correlations with the first principal
>> component of the data. Does this look right to you?
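A minimal sketch of the design described above in base R, assuming a hypothetical phenotype table `pheno` (all values are invented for illustration and are not study data). One detail worth checking: if the pair labels `c(1,1,2,3,2,3...)` are stored as plain numbers, `model.matrix()` treats `Pair` as a single continuous covariate. Wrapping them in `factor()` instead gives one indicator column per pair (beyond the reference pair), which is what a within-pair comparison relies on.

```r
# Hypothetical phenotype table for 3 monozygotic twin pairs.
# All numbers are invented for illustration; they are not study data.
pheno <- data.frame(
  Pair           = factor(c(1, 1, 2, 3, 2, 3)),  # one factor level per twin pair
  FatPercentage  = c(22.1, 25.4, 30.2, 18.7, 28.9, 21.3),
  EstradiolLevel = c(110, 95, 140, 80, 150, 70)
)

# Pair as a factor: columns (Intercept), Pair2, Pair3, FatPercentage, EstradiolLevel
myDesign <- model.matrix(~ Pair + FatPercentage + EstradiolLevel, data = pheno)
print(colnames(myDesign))

# Pitfall: with numeric pair labels, model.matrix() fits a single slope for
# "pair number" instead of a separate level for each pair.
pairNumeric <- c(1, 1, 2, 3, 2, 3)
wrongDesign <- model.matrix(~ pairNumeric + FatPercentage + EstradiolLevel,
                            data = pheno)
print(colnames(wrongDesign))
```

With limma, a design like `myDesign` would then be passed to `lmFit(normalizedM, myDesign)`, followed by `eBayes()` and `topTable(..., coef = "EstradiolLevel")` to rank CpGs by the estradiol coefficient.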
>> Now, after having identified significantly differentially methylated CpGs,
>> we want to use the GSA package and look at correlations between methylation
>> and expression data. For GSA the pairs can be specified directly in the
>> function call. Does that also work with continuous traits, or only if you
>> have two groups? Additionally, I am not really sure how to include
>> confounders then. Do I have to use adjusted or unadjusted data? If I use
>> adjusted data, would I use the same design matrix as above and not include
>> pair information in the function call? Would that be still a within-pair
>> comparison then? And for the adjustment itself, would it be something like
>> adj.m <- normalizedM-fit$coef[,-1]%*%t(myDesign[,-1]) or do I also have to
>> include the columns for pair and fat percentage in this adjustment somehow?
>> If I have to use unadjusted data instead, how would I include information on
>> fat percentage and the estradiol levels then?
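A minimal sketch of the adjustment step, assuming `normalizedM` holds M-values with CpGs in rows and samples in columns (simulated here) and reusing the design from the question with invented phenotype values. Because `myDesign[,-1]` drops only the intercept, the formula `normalizedM-fit$coef[,-1]%*%t(myDesign[,-1])` would also subtract the fitted EstradiolLevel effect, i.e. the signal of interest. The sketch below subtracts only the pair and fat-percentage columns. Per-CpG coefficients come from base R's `qr.coef()` as a stand-in for `fit$coef` in the question; limma's `removeBatchEffect()` (with its `batch`, `covariates` and `design` arguments) is a packaged alternative along the same lines.

```r
# Toy inputs, simulated for illustration only: 4 CpGs (rows) x 6 samples (columns).
set.seed(1)
pheno <- data.frame(
  Pair           = factor(c(1, 1, 2, 3, 2, 3)),
  FatPercentage  = c(22.1, 25.4, 30.2, 18.7, 28.9, 21.3),
  EstradiolLevel = c(110, 95, 140, 80, 150, 70)
)
myDesign <- model.matrix(~ Pair + FatPercentage + EstradiolLevel, data = pheno)
normalizedM <- matrix(rnorm(4 * nrow(pheno)), nrow = 4)

# Per-CpG least-squares coefficients (CpGs x coefficients), same shape as fit$coef.
coefs <- t(qr.coef(qr(myDesign), t(normalizedM)))
colnames(coefs) <- colnames(myDesign)

# Remove only the nuisance terms: the pair indicators and fat percentage.
# The intercept and the EstradiolLevel effect are deliberately kept.
nuisance <- setdiff(colnames(myDesign), c("(Intercept)", "EstradiolLevel"))
adj.m <- normalizedM -
  coefs[, nuisance, drop = FALSE] %*% t(myDesign[, nuisance, drop = FALSE])
```

Refitting the same design to `adj.m` should give nuisance coefficients of (numerically) zero while leaving the EstradiolLevel coefficient unchanged, which is a quick way to check that only the intended terms were removed.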
>> Similarly, for the correlations between methylation and expression... Do I
>> just use the adjusted data sets and then compute correlations over all
>> individuals? Is that then still considering the within-pair changes? Or
>> would I use delta betas for correlation analysis? In the latter case, would
>> I use adjusted data? Would that then be like adjusting for pair twice if I
>> use the design matrix from above? Or would I have to change the matrix and
>> if yes, how?
>> One last thing - say I wanted to perform differential analysis between two
>> groups (not within-pair) but still have some twin pairs included in the
>> analysis, would I then use duplicateCorrelation() instead of including the
>> pair information directly in the design matrix? Or if that's not the right
>> way to go, what should I do in that case?
>> Sorry for so many questions! However, I would really appreciate any kind
>> of help or ideas so that I can understand how to proceed...
>> Thanks a lot in advance and best regards,
>> Aileen
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
