# [BioC] ChIPpeakAnno venn diagram statistics

Ester Feldmesser ester.feldmesser at weizmann.ac.il
Wed Dec 8 09:14:56 CET 2010

```
Thank you very much for your help.

I have several thoughts regarding the overlap between peaks in ChIP-seq
analyses:

1. Could I calculate the p-value also in the following way for my example?

phyper(2577-1, 3912, totalTest-3912, 26009, lower.tail =FALSE)

Since the results are not symmetric, and as I understand it the two
experiments carry equal weight, I am not sure which is the right way to
apply the test.

> phyper(2577-1, 3912, 30000-3912, 26009, lower.tail =FALSE)
[1] 1
> phyper(2577-1, 30000-26009,26009, 3912, lower.tail =FALSE)
[1] 0

2. Regarding totalTest, I agree that taking only the peaks we see in the
two experiments is probably an underestimate. On the other hand, counting
the number of DNA motifs for that factor in the genome may give too high
a number, because some of the motifs are probably not functional and
appear in the genome by chance. I admit that it is easier to criticize
than to find a solution, and I have not found one I am happy with.

Any ideas or comments will be highly appreciated.

Esti

On 12/7/2010 11:42 PM, Zhu, Lihua (Julie) wrote:
> I want to take this opportunity to thank Noah for sharing his insights and
> experience using the ChIPpeakAnno package.
> Ester, here is how the p-value is calculated for overlapping using your
> given example, phyper(2577-1, 3912, totalTest-3912, 26009, lower.tail =
> FALSE).
>
>> Hello Noah,
>>
>> I read the archives, but still there are some points that are not clear
>> to me.
>>
>> 1. How is the hypergeometric test implemented? In other words, if we use
>> the phyper R function, what would be p, m and k in the example given
>> below?
>>
>> 2. Does anybody have any additional ideas on how to calculate totalTest
>> when comparing the peaks of two different transcription factors?
>>
>> 3. Is there any other statistical test to calculate the significance of
>> the overlap between peaks?
>>
>> Thanks,
>>
>> Esti
>>
>>
>> On 12/6/2010 9:16 PM, Noah Dowell wrote:
>>
>>> Hello Ester,
>>>
>>> Did you search the archives?  I commented on your question extensively and
>>> Julie has also offered helpful insight and those messages are in the
>>> archives.
>>>
>>> Best,
>>>
>>> Noah
>>>
>>>
>>> On Dec 6, 2010, at 4:09 AM, Ester Feldmesser wrote:
>>>
>>>
>>>
>>>> Hello,
>>>>
>>>> I would like to understand how the hypergeometric test is applied in the
>>>> makeVennDiagram function, specifically what the total, the sample and the
>>>> success groups are.
>>>>
>>>> Let's say we have two peak bed files with 3912 and 26009 peaks respectively
>>>> and an overlap of 2577 peaks, how in this case should the test be applied?
>>>>
>>>> Thank you,
>>>>
>>>> Ester Feldmesser
>>>>
>>>>
>>>>
>>>
>>>
>>
```
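A note on the asymmetry question in the latest message: the hypergeometric distribution is symmetric in the two peak sets, so swapping their roles should not change the p-value as long as the arguments keep their meaning (`m` = size of the marked set, `n` = totalTest minus that size, `k` = size of the drawn set). As far as one can tell from the session, the second call passes `30000-26009` as `m` and `26009` as `n`, which marks the sites *outside* the second peak set and therefore asks a different question; that would explain the 0. The sketch below is only an illustration using the numbers from the thread. `totalTest = 30000` and the other totals in the loop are assumed values, not recommendations, and this is not necessarily how `makeVennDiagram` computes its p-value internally.

```r
# Numbers from the thread. totalTest = 30000 is only the illustrative value
# used in the session above, not a recommended genome-wide total.
overlap   <- 2577
n1        <- 3912    # peaks in the first set
n2        <- 26009   # peaks in the second set
totalTest <- 30000

# Upper tail P(X >= overlap): set 1 as the "marked" sites, set 2 as the draw.
p_a <- phyper(overlap - 1, n1, totalTest - n1, n2, lower.tail = FALSE)

# Roles swapped: set 2 as the "marked" sites, set 1 as the draw.
p_b <- phyper(overlap - 1, n2, totalTest - n2, n1, lower.tail = FALSE)

# Both calls describe the same 2x2 table, so they should agree.
print(isTRUE(all.equal(p_a, p_b)))

# Overlap expected by chance under this totalTest.
expected <- n1 * n2 / totalTest
print(expected)

# totalTest at which the observed overlap equals the chance expectation.
break_even <- n1 * n2 / overlap
print(break_even)

# How the upper-tail p-value moves with the assumed totalTest
# (the totals below are hypothetical, chosen only to show the trend).
for (tt in c(30000L, 40000L, 50000L, 100000L)) {
  p <- phyper(overlap - 1, n1, tt - n1, n2, lower.tail = FALSE)
  cat(sprintf("totalTest = %6d  expected = %7.1f  p = %.3g\n",
              tt, n1 * n2 / tt, p))
}
```

With `totalTest = 30000` the chance expectation (about 3392 overlapping peaks) is already larger than the observed 2577, which is why the upper-tail p-value comes out as 1. The observed overlap only starts to look like enrichment once totalTest is above roughly 39,500, so the choice of totalTest discussed in point 2 drives the conclusion.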