[BioC] End of the line of GOstats: making sense of the hypergeometric test results now
James W. MacDonald
jmacdon at med.umich.edu
Wed Nov 25 14:53:35 CET 2009
Hi Massimo,
Massimo Pinto wrote:
> Greetings all,
>
> Having first searched the GMane archives, I suppose the following
> question is appropriate. After selecting my 'entrezUniverse', I have
> run a hypergeometric test, as implemented in functions provided in
> GOstats, and thus obtained a readable, hyperlinked report containing a
> list of the ontology nodes that appear to have been significantly
> implicated, along with p values, odds ratio, number of significantly
> regulated genes that fall in each listed node, etc.
>
> The report is not exactly short, and I am looking for criteria to
> proceed with the interpretation of the results. Specifically, I am
> trying to hunt for the most 'interesting' implicated ontology nodes
> and, to this end, a marker may be useful. Assuming this line of
> thinking is appropriate and focusing on the first few lines of the
> report:
>
>> GO.df.CM3.ctr1.2.3
>
>       GOBPID       Pvalue OddsRatio   ExpCount Count Size Term
> 1 GO:0040011 9.322848e-05  2.558205 11.8928490    26  145 locomotion
> 2 GO:0002376 2.337660e-04  1.887324 28.2147590    47  344 immune system process
> 3 GO:0007165 2.821193e-04  1.541496 82.4297464   110 1005 signal transduction
> 4 GO:0006954 2.840421e-04  2.892962  7.3817683    18   90 inflammatory response
> 5 GO:0051272 4.985200e-04  6.638731  1.5583733     7   19 positive regulation of cell motion
> 6 GO:0007154 5.866973e-04  1.493138 88.4992004   115 1079 cell communication
> [...]
>
> I do wonder whether the correct marker for my hunt is the p value, or
> the Odds Ratio, which would rank my list differently. Plus, the
> ontology nodes containing the largest number of genes (Size, above)
> may be of too broad scope to reveal the presence of a biological
> process that is specifically implicated in my experiment. By the same
> token, ontology nodes with too few genes may not provide convincing
> evidence of their implication.
>
> In short, what strategy would you suggest for proceeding?

The strategy depends on your original hypothesis. If the hypothesis was
that inflammation should be a factor in your experimental samples, then
you should be looking at #4 (inflammatory response).

If there wasn't a hypothesis, then I would tend to look at the more
specific terms first. Something like locomotion is so general as to be
useless. However, positive regulation of cell motion would probably be a
more tractable ontology to explore.
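
A minimal sketch of that kind of triage, assuming only the six rows quoted above and using a term's Size as a crude stand-in for how specific it is. The 10 to 100 size window, the `GoTerm` record and the `shortlist` function are illustrative assumptions, not GOstats defaults or recommendations, and the code is plain Python rather than R so that it runs without Bioconductor installed:

```python
from collections import namedtuple

# One record per report row; field names are hypothetical, chosen to mirror
# the report columns (GOBPID, Pvalue, OddsRatio, ExpCount, Count, Size, Term).
GoTerm = namedtuple("GoTerm", "goid pvalue odds_ratio exp_count count size term")

# Values copied from the first six rows of the quoted report.
ROWS = [
    GoTerm("GO:0040011", 9.322848e-05, 2.558205, 11.8928490, 26, 145, "locomotion"),
    GoTerm("GO:0002376", 2.337660e-04, 1.887324, 28.2147590, 47, 344, "immune system process"),
    GoTerm("GO:0007165", 2.821193e-04, 1.541496, 82.4297464, 110, 1005, "signal transduction"),
    GoTerm("GO:0006954", 2.840421e-04, 2.892962, 7.3817683, 18, 90, "inflammatory response"),
    GoTerm("GO:0051272", 4.985200e-04, 6.638731, 1.5583733, 7, 19,
           "positive regulation of cell motion"),
    GoTerm("GO:0007154", 5.866973e-04, 1.493138, 88.4992004, 115, 1079, "cell communication"),
]


def shortlist(rows, min_size=10, max_size=100, by="odds_ratio"):
    """Keep terms whose Size lies in [min_size, max_size], then rank them.

    by="odds_ratio" ranks from largest to smallest odds ratio;
    by="pvalue" ranks from smallest to largest p value.
    The size bounds are arbitrary illustrative cut-offs.
    """
    kept = [r for r in rows if min_size <= r.size <= max_size]
    if by == "pvalue":
        return sorted(kept, key=lambda r: r.pvalue)
    return sorted(kept, key=lambda r: r.odds_ratio, reverse=True)


if __name__ == "__main__":
    for r in shortlist(ROWS):
        print(r.goid, r.term, r.odds_ratio, r.pvalue, r.size)
```

With these cut-offs only the two narrower terms survive, and their order flips depending on whether the odds ratio or the p value is used for ranking. Size is only a rough proxy for specificity, so a shortlist like this is a starting point for reading the terms, not a substitute for it.
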
Best,
Jim
>
> Thank you very much in advance to all of you who will read this post.
>
> Yours
> Massimo
>
--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826