[BioC] Regression analysis problem
Kevin R. Coombes
kevin.r.coombes at gmail.com
Fri Oct 7 19:26:59 CEST 2011
To follow up on Tim's point:
I have yet to see any evidence that methylation data is anything other
than ternary. The typical interpretation for the beta values is
less than 0.25 = fully unmethylated
greater than 0.75 = fully methylated
between 0.25 and 0.75 = partially methylated
Part of the evidence for this assertion is that we have several sets of
data from samples treated with drugs that should fully methylate (or
fully demethylate, respectively) everything. On those samples, 99% of
the observed values are above 0.75 (or below 0.25, respectively).
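
As a rough illustration (not code from our actual analysis), the three-way
call described above could be sketched in Python as follows. The function
names and the example beta values are hypothetical, made up for this sketch;
only the 0.25 and 0.75 cutoffs come from the interpretation given above.

```python
def classify_beta(beta, low=0.25, high=0.75):
    """Map one beta value in [0, 1] to one of three methylation states.

    The default cutoffs follow the interpretation in the text; the
    function itself is a hypothetical helper, not part of any package.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta values are expected to lie in [0, 1]")
    if beta < low:
        return "unmethylated"
    if beta > high:
        return "methylated"
    return "partial"


def state_fractions(betas, low=0.25, high=0.75):
    """Return the fraction of beta values falling into each state."""
    if len(betas) == 0:
        raise ValueError("need at least one beta value")
    counts = {"unmethylated": 0, "partial": 0, "methylated": 0}
    for b in betas:
        counts[classify_beta(b, low, high)] += 1
    return {state: n / len(betas) for state, n in counts.items()}


# Made-up beta values, for illustration only (not real data).
example_betas = [0.05, 0.10, 0.50, 0.80, 0.95, 0.90, 0.20, 0.85]
fractions = state_fractions(example_betas)
print(fractions)
```

On a sample treated to fully methylate everything, a check like this would
be expected to put nearly all sites in the "methylated" class.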
So I'm not convinced that t-tests have a role to play in analyzing
genome-wide methylation data....
Kevin
On 10/7/2011 10:39 AM, Tim Triche, Jr. wrote:
> On Fri, Oct 7, 2011 at 8:15 AM, James W. MacDonald <jmacdon at med.umich.edu> wrote:
>
>> First, by increasing the number of genes, you can more accurately
>> estimate an overall variance, which is then used in the eBayes() step to
>> 'shrink' your observed variance towards this overall variance. This is
>> one of the reasons that limma is so popular - by using information from
>> all genes, you can increase the power to detect differences in
>> individual genes.
>>
> Careful though -- this is methylation data, which tends to be strongly
> bimodal. It's not clear that the assumption of a common variance across
> unmethylated, partially-methylated, and methylated sites is appropriate. I
> seem to recall Gordon Smyth commenting upon this at one point -- perhaps
> he'll chime in.
>
>
>> Second, when you increase the number of pairs you are using, your power
>> to detect differences increases as well. This has nothing to do with
>> limma per se; it is just that the power of a t-test increases as you
>> increase the number of observations.
>>
> This is of course appropriate at any time that you can get more high-quality
> samples rather than fewer :-)
>