Hello Esti,

Julie covered some points but I'll quickly chime in too.

> 1. How is the hypergeometric test implemented, in other words if we use the phyper R function, what would be p, m and k in the example given below.

Typing the function name (makeVennDiagram) without parentheses at the R prompt prints its source, which shows you exactly how the hypergeometric test is implemented.
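
To make the pieces concrete, here is a rough sketch of a one-sided hypergeometric overlap test. It is not taken from the makeVennDiagram source, so check it against what the function prints. I assume the usual urn setup: total_test is the pool of possible binding sites, n1 is the size of the first peak set (the "successes"), n2 is the size of the second peak set (the "sample"), and overlap is the number of shared peaks. In R the same upper tail would be phyper(overlap - 1, n1, total_test - n1, n2, lower.tail = FALSE). The function and variable names are mine, and total_test = 100000 is only a placeholder because totalTest has to be estimated.

```python
from math import exp, lgamma, log


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def overlap_log_pvalue(overlap, n1, n2, total_test):
    """Natural log of P(X >= overlap) for a hypergeometric X.

    total_test: assumed size of the pool of possible binding sites
    n1: peaks in the first set (successes in the pool)
    n2: peaks in the second set (sample drawn from the pool)
    """
    log_denominator = log_choose(total_test, n2)
    lowest = max(overlap, n1 + n2 - total_test, 0)  # smallest possible overlap
    highest = min(n1, n2)                           # largest possible overlap
    terms = [
        log_choose(n1, i) + log_choose(total_test - n1, n2 - i) - log_denominator
        for i in range(lowest, highest + 1)
    ]
    if not terms:
        return float("-inf")  # the requested overlap is impossible
    largest = max(terms)
    # log-sum-exp keeps very small p-values from underflowing to zero
    return largest + log(sum(exp(t - largest) for t in terms))


# Numbers from this thread; total_test is a placeholder assumption.
log_p = overlap_log_pvalue(overlap=2577, n1=3912, n2=26009, total_test=100000)
print("log10 p-value:", log_p / log(10))
```

The result depends strongly on total_test, which is why the estimate of totalTest matters so much. I work in log space because with peak sets this large the p-value can be far smaller than the smallest representable double.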

> 2. Has somebody any additional idea how to calculate the totalTest when comparing between the two different transcription factor peaks?

I gave an extensive answer (see the archives) covering both a sequence-dependent factor (a transcription factor) and a sequence-independent factor (a histone), and how to estimate a range for totalTest. I think one way to estimate the upper limit for transcription factor binding is to count the number of DNA motifs for that factor in the genome. What are your thoughts? For two different factors I would count how often the two motifs co-occur and how often they occur separately. This requires you to assume a distance that defines "co-occurrence", for example within 0.5 kb, 1 kb, or 5 kb. The possibilities are endless, but describe your methods (and assumptions) clearly and the community may offer more insight.
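
The motif-counting idea can be sketched roughly as follows. This is only an illustration under my own assumptions: motif positions are given as single base-pair coordinates on one chromosome, "co-occurrence" means a site of the second motif lies within max_dist bp of a site of the first, and the positions below are made up. The function name is hypothetical.

```python
from bisect import bisect_left


def count_cooccurring(sites_a, sites_b, max_dist):
    """Count sites in sites_a that have a site from sites_b within max_dist bp."""
    sorted_b = sorted(sites_b)
    count = 0
    for pos in sites_a:
        # index of the first B site at or to the right of the window start
        i = bisect_left(sorted_b, pos - max_dist)
        if i < len(sorted_b) and sorted_b[i] <= pos + max_dist:
            count += 1
    return count


# Made-up motif positions on one chromosome, for illustration only.
motif_a = [100, 5000, 12000, 40000]
motif_b = [600, 11800, 90000]

for max_dist in (500, 1000, 5000):
    together = count_cooccurring(motif_a, motif_b, max_dist)
    alone = len(motif_a) - together
    print(max_dist, "bp:", together, "co-occurring,", alone, "distinct")
```

Running it over several window sizes shows how much the co-occurrence count, and therefore any totalTest derived from it, moves with the distance you assume.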

> 3. Is there any other statistical test to calculate significance between overlapping peaks?

Like what? Some people like to draw scatter plots and use the resulting correlation coefficients as a readout for "overlap." One word of caution, though: when making the scatter plots, use only the probes (from the array) that are called "bound" in both experiments. If you draw a scatter plot of all probes from two ChIP-chip experiments, I feel the "unbound" (null distribution) probes drive the correlation, so it does not reflect what is actually going on. Maybe others can comment on this.
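
A minimal sketch of that filtering step, assuming you already have one signal value per probe for each experiment plus a bound/unbound call per probe. The values and the function names are invented for illustration, and Pearson correlation is my choice here; another coefficient could be used the same way.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def bound_in_both(signal_1, signal_2, bound_1, bound_2):
    """Keep only the probes called bound in both experiments."""
    kept = [
        (s1, s2)
        for s1, s2, b1, b2 in zip(signal_1, signal_2, bound_1, bound_2)
        if b1 and b2
    ]
    return [s1 for s1, _ in kept], [s2 for _, s2 in kept]


# Invented probe signals and bound calls for two experiments.
signal_1 = [0.1, -0.2, 5.0, 0.3, 6.0, 7.0, -0.1]
signal_2 = [0.2, 0.1, 10.0, -0.3, 12.0, 14.0, 4.0]
bound_1 = [False, False, True, False, True, True, False]
bound_2 = [False, False, True, False, True, True, True]

xs, ys = bound_in_both(signal_1, signal_2, bound_1, bound_2)
print("probes bound in both:", len(xs))
print("correlation on bound probes:", pearson(xs, ys))
```

The scatter plot would then be drawn from xs and ys only, so the unbound probes cannot dominate the coefficient.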

Best,

Noah


On Dec 6, 2010, at 11:42 PM, Ester Feldmesser wrote:

> Hello Noah,
> 
> I read the archives, but still there are some points that are not clear to me. 
> 
> 1. How is the hypergeometric test implemented, in other words if we use the phyper R function, what would be p, m and k in the example given below.
> 
> 2. Has somebody any additional idea how to calculate the totalTest when comparing between the two different transcription factor peaks?
> 
> 3. Is there any other statistical test to calculate significance between overlapping peaks?
> 
> Thanks,
> 
> Esti
> Ester Feldmesser, Ph.D.
> Bioinformatics Unit, Department of Biological Services
> Weizmann Institute of Science
> Levine Building, Room 110
> phone: +972-8-934-2614
> email: ester.feldmesser@weizmann.ac.il
> 
> He who thinketh he leadeth and hath no one following him is only taking a walk.
> Anonymous 
> 
> 
> On 12/6/2010 9:16 PM, Noah Dowell wrote:
>> 
>> Hello Ester,
>> 
>> Did you search the archives?  I commented on your question extensively and Julie has also offered helpful insight and those messages are in the archives.
>> 
>> Best,
>> 
>> Noah
>> 
>> 
>> On Dec 6, 2010, at 4:09 AM, Ester Feldmesser wrote:
>> 
>>   
>>> Hello,
>>> 
>>> I would like to understand how the hypergeometric test is applied in the makeVennDiagram function, specifically what is the total, the sample and the success groups.
>>> 
>>> Let's say we have two peak bed files with 3912 and 26009 peaks respectively and an overlap of 2577 peaks, how in this case should the test be applied?
>>> 
>>> Thank you,
>>> 
>>> Ester Feldmesser
>>> 
>>> 
>>   
