[BioC] Regression analysis problem

Fri Oct 7 19:26:59 CEST 2011

To follow up on Tim's point:

I have yet to see any evidence that methylation data is anything other 
than ternary. The typical interpretation for the beta values is
     less than 0.25 = fully unmethylated
     greater than 0.75 = fully methylated
     between 0.25 and 0.75 = partially methylated
Part of the evidence for this assertion is that we have several sets of 
data from samples treated with drugs that should fully methylate (or 
fully demethylate, respectively) everything. On those samples, 99% of 
the observed values are above 0.75 (or below 0.25, respectively).
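
To make the three-way reading concrete, here is a minimal Python sketch of 
the thresholds above. The function names (classify_beta, state_fractions) 
and the example values are hypothetical, chosen only for illustration; the 
0.25 and 0.75 cutoffs are the ones quoted in this message, and values 
exactly on a cutoff are treated as partially methylated.

```python
def classify_beta(beta, low=0.25, high=0.75):
    """Map a beta value in [0, 1] to one of three methylation states.

    Hypothetical helper: the default cutoffs follow the interpretation
    given in the message (below 0.25, above 0.75, or in between).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if beta < low:
        return "unmethylated"
    if beta > high:
        return "methylated"
    return "partial"


def state_fractions(betas):
    """Return the fraction of beta values falling in each state."""
    if not betas:
        raise ValueError("need at least one beta value")
    counts = {"unmethylated": 0, "partial": 0, "methylated": 0}
    for b in betas:
        counts[classify_beta(b)] += 1
    n = len(betas)
    return {state: c / n for state, c in counts.items()}


# Made-up values for illustration only, not real array data.
example = [0.10, 0.90, 0.80, 0.50]
fractions = state_fractions(example)
```

Tabulating the fractions this way on a fully methylated or fully 
demethylated control is one way to check how well the cutoffs separate 
the states.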

So I'm not convinced that t-tests have a role to play in analyzing 
genome-wide methylation data....

     Kevin

On 10/7/2011 10:39 AM, Tim Triche, Jr. wrote:
> On Fri, Oct 7, 2011 at 8:15 AM, James W. MacDonald <jmacdon at med.umich.edu> wrote:
>
>> First, by increasing the number of genes, you can more accurately
>> estimate an overall variance, which is then used in the eBayes() step to
>> 'shrink' your observed variance towards this overall variance. This is
>> one of the reasons that limma is so popular - by using information from
>> all genes, you can increase the power to detect differences in
>> individual genes.
>>
> Careful though -- this is methylation data, which tends to be strongly
> bimodal.  It's not clear that the assumption of a common variance across
> unmethylated, partially-methylated, and methylated sites is appropriate.  I
> seem to recall Gordon Smyth commenting upon this at one point -- perhaps
> he'll chime in.
>
>
>> Second, when you increase the number of pairs you are using, your power
>> to detect differences increases as well. This has nothing to do with
>> limma per se; it is just that the power of a t-test increases as you
>> increase the number of observations.
>>
> This is of course appropriate at any time that you can get more high-quality
> samples rather than fewer :-)
>
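
For readers following the quoted exchange, the variance shrinkage James 
describes can be sketched as a weighted average of each gene's own 
variance and a common prior variance, weighted by degrees of freedom. 
This is only a simplified illustration in Python: the function name 
shrink_variance is hypothetical, and the prior variance and prior degrees 
of freedom are passed in as if known, whereas limma estimates them from 
all genes. It also shows where Tim's concern enters: a single prior 
variance is applied to every site.

```python
def shrink_variance(s2, df, prior_s2, prior_df):
    """Shrink an observed variance toward a common prior variance.

    Hypothetical helper illustrating a moderated variance of the form
        (prior_df * prior_s2 + df * s2) / (prior_df + df)
    where s2 and df are the per-gene sample variance and residual
    degrees of freedom. prior_s2 and prior_df are assumed given here.
    """
    if df < 0 or prior_df < 0 or df + prior_df == 0:
        raise ValueError("degrees of freedom must be non-negative, not both zero")
    return (prior_df * prior_s2 + df * s2) / (prior_df + df)


# Illustrative numbers only: an observed variance of 4.0 on 4 df is
# pulled halfway toward a prior variance of 1.0 that also carries 4 df.
moderated = shrink_variance(s2=4.0, df=4, prior_s2=1.0, prior_df=4)
```

With equal degrees of freedom on both sides the result is the plain 
average of the two variances; a larger prior_df pulls the estimate 
harder toward the common value.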