I understand all the statistical reasons for converting methylation
"beta values" to the logit scale, and am frequently tempted to do
this myself.
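Here is a minimal R sketch of the conversion under discussion, using made-up M and U intensities and ignoring any offset a particular pipeline may add to the denominator. It checks that logit(M/(M+U)) is the same quantity as log(M/U), and that log2(M/U) is that quantity rescaled by 1/log(2). qlogis() and plogis() are base R's logit and inverse-logit functions.

```r
# Hypothetical methylated (M) and unmethylated (U) signal intensities;
# the numbers are made up for illustration only.
M <- c(200, 1500, 4000)
U <- c(3800, 1500, 250)

beta <- M / (M + U)          # proportion methylated ("beta value")
logit_beta <- qlogis(beta)   # natural-log logit: log(beta / (1 - beta))
m_value <- log2(M / U)       # log2 ratio, i.e. the logit on a base-2 scale

# logit(M / (M + U)) is the same quantity as log(M / U) ...
stopifnot(isTRUE(all.equal(logit_beta, log(M / U))))
# ... and log2(M / U) differs from it only by the constant factor 1 / log(2)
stopifnot(isTRUE(all.equal(m_value, logit_beta / log(2))))

# plogis() (the inverse logit) recovers the beta values
stopifnot(isTRUE(all.equal(plogis(logit_beta), beta)))
print(round(cbind(beta, logit_beta, m_value), 3))
```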

But I think that, in the context of methylation, this advice should come
with a warning: small changes in levels near 0 and 1 become large changes
on the logit scale, so they may have a lot of leverage on the final
results. For example, we have done analyses on
some of the TCGA data where we find "statistically significant 
differences in methylation between normal and tumor" where the mean beta 
values are 0.03 and 0.08.  I find it hard to believe that this level of 
change in methylation has any kind of biological meaning.  In fact, I'm 
not even convinced that we can accurately measure this amount of change 
using the technology that TCGA is using (although I might well believe 
that such a change could result from batch effects, whether in the assay 
or in the data processing).
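To see the leverage concretely, here is a short R sketch that applies the logit to the two TCGA group means quoted above, alongside a hypothetical mid-range pair with the same 0.05 difference in beta. Working with group means rather than per-sample values is a simplification for illustration.

```r
# Same 0.05 change in beta, near the boundary and in mid-range.
low <- c(normal = 0.03, tumor = 0.08)  # group means quoted above
mid <- c(normal = 0.45, tumor = 0.50)  # hypothetical mid-range pair, for contrast

# Difference between the two groups on the natural-log logit scale
logit_diff <- function(b) unname(diff(qlogis(b)))

print(logit_diff(low))  # about 1.03
print(logit_diff(mid))  # about 0.20

# Local slope of the logit, 1 / (p * (1 - p)): how strongly a small change
# in beta is stretched at each level
slope <- function(p) 1 / (p * (1 - p))
print(round(slope(c(0.03, 0.5, 0.97)), 1))
```

The same 0.05 difference in beta is roughly five times larger on the logit scale near 0 than near 0.5, because the slope of the logit, 1/(p(1-p)), grows without bound as p approaches 0 or 1.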

I don't have any magic solution to fix this issue; it is intrinsic to
the shape of the logistic curve. One might want to explore shrinking the
beta values toward 0.5 (i.e., away from 0 and 1), but I can't offer any 
concrete advice on how well this might work in practice.
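One possible way to try the shrinkage idea is sketched below in R. The linear shrinkage rule and the parameter `lambda` are hypothetical choices for illustration, not a method from this thread or from any package.

```r
# Hypothetical shrinkage: pull each beta value a fixed fraction `lambda`
# of the way toward 0.5 before taking the logit. `lambda` is an assumed
# tuning parameter, not an established recommendation.
shrink_beta <- function(beta, lambda = 0.1) {
  stopifnot(lambda >= 0, lambda < 1)
  (1 - lambda) * beta + lambda * 0.5
}

low <- c(0.03, 0.08)  # near-boundary pair from the TCGA example
mid <- c(0.45, 0.50)  # hypothetical mid-range pair

# Compare logit-scale differences with and without shrinkage
for (lambda in c(0, 0.1, 0.2)) {
  d_low <- diff(qlogis(shrink_beta(low, lambda)))
  d_mid <- diff(qlogis(shrink_beta(mid, lambda)))
  cat(sprintf("lambda = %.1f: logit diff low = %.3f, mid = %.3f\n",
              lambda, d_low, d_mid))
}
```

With lambda > 0 the logit stays finite even for beta values of exactly 0 or 1. However, the shrinkage also compresses mid-range differences somewhat, so lambda would need to be tuned and validated on real data.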

Best,
     Kevin

On 8/17/2012 12:36 PM, Tim Triche, Jr. wrote:
> The reason to switch from a proportion (%, beta-value, whichever; anything
> measuring M / (M+U) where M and U are surrogates for methylated and
> unmethylated cytosines) to a fold-change (logit(proportion.methylated) or
> log2(M/U)) is that the latter is far more amenable to linear models, and
> roughly parallels the expected behavior in terms of expression changes on a
> log2 or log-fold-change scale.
>
> Furthermore, the range of logit(M/(M+U)), i.e. log(M/U), is -Infinity to
> +Infinity, which is appropriate when you are modeling something as having
> Gaussian error. Something with a range of 0 to 1 is neither homoskedastic
> (which is to say, such a 0-1 measurement will have a variance that depends
> on the mean)
> nor unbounded (this turns out to be an issue when computing maximum
> likelihood estimates, for example, as values close to the boundary will
> cause problems).
>
> In any event, logit(% methylation) is equivalent to log(M/U) which is where
> I veered off course this morning.  My brain seems to have been a bit slow.
>
>
> On Fri, Aug 17, 2012 at 9:26 AM, zeynep özkeserli<
> zeynep.ozkeserli@gmail.com>  wrote:
>
>> Dear Tim,
>>
>> Thank you for your answer. But as I understand it, if this answer comes
>> from undoing the logit function (I thought that was what you were doing),
>> we should use the inverse logit function, which is exp(x)/(1+exp(x)).
>>
>> And in my case it gives:
>>
>>> exp(-0.30427)/(1+exp(-0.30427))
>> [1] 0.424514
>>
>> OK, this seems reasonable, and the way you put it into words makes sense.
>> But if we could use this value as a methylation measure, why would the
>> creators make things more complicated by converting it to a logit value?
>> So, again, as I understand it, I need to learn how to interpret the diff
>> value.
>>
>> Thank you again,
>>
>> Best :)
>>
>> Zeynep
>>
>> On Fri, Aug 17, 2012 at 6:29 PM, Tim Triche, Jr. <tim.triche@gmail.com> wrote:
>>
>>> Perhaps "on average this region has an
>>>
>>> R>  1 - exp(-0.347)
>>> [1] 0.2931947
>>>
>>> approximately 29.3% relative decrease in cytosine methylation after
>>> treatment?"
>>>
>>>
>>>
>>> On Fri, Aug 17, 2012 at 1:56 AM, zeynep özkeserli<
>>> zeynep.ozkeserli@gmail.com>  wrote:
>>>
>>>> Dear All, Dear Dr. Aryee and Dr. Carvalho,
>>>>
>>>> I have a question about interpreting the results of the dmrFinder function.
>>>>
>>>> We have performed a CHARM analysis on data we obtained from NimbleGen
>>>> Promoter MeDIP arrays. The data were obtained from each patient before and
>>>> after treatment, and after performing the CHARM analysis we found some
>>>> differentially methylated regions (DMRs).
>>>>
>>>> Because the before- and after-treatment samples come from the same
>>>> patient, they are treated as paired samples.
>>>>
>>>> My question is about interpretation of the results:
>>>>
>>>> After running this:
>>>>
>>>> dmr1_2 <- dmrFinder(rawData, p = p, groups = grp, compare = c("to", "ts"),
>>>>                     cutoff = 0.995, paired = TRUE, pairs = pairs)
>>>>
>>>> to: before treatment
>>>> ts: after treatment
>>>>
>>>> - For example, I found a DMR like this (I have summarized the result for
>>>> my question):
>>>>
>>>> chr 8, diff= -0.30427 and maxdiff=0.47935
>>>>
>>>> The diff value is calculated as follows: average l (logit(percentage
>>>> methylation) if l=NULL) difference within the DMR if paired=TRUE.
>>>>
>>>> Is it correct to say: "The region has 0.30427 times the risk of being
>>>> methylated in samples taken after treatment compared to samples taken
>>>> before treatment"?
>>>>
>>>> I know that the word "risk" does not seem meaningful when talking about
>>>> something like this, but I cannot find a better way to say it accurately.
>>>> Is it possible to express it as a "0.30427 fold difference in
>>>> methylation"? And am I interpreting the "-" sign correctly?
>>>>
>>>> Thank you for your help in advance,
>>>>
>>>> Best Regards,
>>>>
>>>> Zeynep
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor@r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>>
>>> --
>>> *A model is a lie that helps you see the truth.*
>>> Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
>>>
>>>
>
>
>
