[BioC] topGO "under-represented" GO terms?

Adrian Alexa adrian.alexa at gmail.com
Wed Jul 15 17:23:02 CEST 2009


Hi Jean,

first of all, by default the functions in topGO test for
over-represented GO terms. One can, however, define a test statistic
that tests for under-representation. Your case is a bit strange, or
rather a bit extreme: based on your table you have very few
significant genes, and thus the results look unusual.
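
To make the over- versus under-representation distinction concrete, here is a minimal sketch of the two one-sided hypergeometric (Fisher-type) tails in plain Python. It is not topGO's implementation; the function names and the counts are hypothetical and chosen only for illustration.

```python
from math import comb

def hypergeom_pmf(k, universe, annotated, selected):
    """P(exactly k of the `selected` genes fall in a term with `annotated` genes)."""
    return (comb(annotated, k) * comb(universe - annotated, selected - k)
            / comb(universe, selected))

def over_p(k, universe, annotated, selected):
    """Upper tail P(X >= k): small when the term is over-represented."""
    upper = min(annotated, selected)
    return sum(hypergeom_pmf(i, universe, annotated, selected)
               for i in range(k, upper + 1))

def under_p(k, universe, annotated, selected):
    """Lower tail P(X <= k): small when the term is under-represented."""
    lower = max(0, selected - (universe - annotated))
    upper = min(k, annotated, selected)
    return sum(hypergeom_pmf(i, universe, annotated, selected)
               for i in range(lower, upper + 1))

# Hypothetical counts, not taken from this thread.
universe, annotated, selected = 1000, 100, 50
expected = annotated * selected / universe  # 5.0 genes expected by chance

for k in (0, 5, 12):
    print(k, over_p(k, universe, annotated, selected),
          under_p(k, universe, annotated, selected))
```

A count well above the expected value gives a small upper-tail p-value, and a count well below it gives a small lower-tail p-value; a default test for over-representation only looks at the upper tail.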

However, your confusion comes from misinterpreting the columns of
the table. The "Annotated", "Significant" and "Expected" columns show
statistics computed for each GO term based on the complete
annotations, meaning that the "true path rule" is used to annotate
the genes to higher-level terms. The "Expected" column shows an
estimate of the number of significant genes a node of size "Annotated"
would have if the significant genes were randomly selected from the
gene universe. Now, if you used the "classic" algorithm to test
for over-representation, then all GO terms with significant p-values
would have "Significant" > "Expected". However, this is not the
case when using methods like "elim" or "weight", which remove or
down-weight genes annotated to GO terms when computing the
significance. This happens because when you "remove" genes, the counts
for the specific GO term change, and so does the ratio between
"Significant" and "Expected". Hence the confusion.
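
As a rough check of how "Expected" relates to "Annotated", the sketch below assumes that Expected = Annotated x (number of significant genes / size of the gene universe), which is what the description above implies. The three rows are copied from the table in the quoted message; the function names are hypothetical. Since the thread does not state the universe size, the code only backs out the implied fraction from each row.

```python
# Rows from the quoted table: (term, Annotated, Significant, Expected)
rows = [
    ("cellular process", 8686, 42, 50.55),
    ("cell cycle", 278, 5, 1.62),
    ("cellular metabolic process", 4985, 19, 29.01),
]

def expected_count(annotated, n_significant, n_universe):
    """Expected significant genes in a term if they were drawn at random."""
    return annotated * n_significant / n_universe

def implied_fraction(annotated, expected):
    """Fraction of significant genes in the universe implied by one table row."""
    return expected / annotated

for term, annotated, significant, expected in rows:
    frac = implied_fraction(annotated, expected)
    direction = "above" if significant > expected else "below"
    print(f"{term:28s} implied fraction {frac:.5f}; "
          f"Significant is {direction} Expected")
```

All three rows imply nearly the same fraction, which is consistent with the columns being computed from the full annotation rather than from the counts that "elim" actually tests.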

The way the results are displayed in the table might seem a bit
counterintuitive, but I use the table mostly to compare the results
of different methods, and the three columns mentioned above help me
get an overview of the GO annotations. Also, it would not be trivial
to show the statistics as updated after the genes are removed or
weighted, and I think they would be even more confusing. The
resulting p-values serve that purpose.
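
A toy illustration of why the displayed counts and the tested counts can disagree: the sketch below removes the genes of a child term from its parent and recomputes the three statistics. It is not topGO's code. All gene names and numbers are hypothetical, and keeping the universe-wide fraction of significant genes fixed after the removal is an assumption of this sketch.

```python
# Toy gene sets; every name and number here is hypothetical.
universe = {f"g{i}" for i in range(1, 21)}   # 20 genes in total
significant = {"g1", "g2", "g3", "g9"}       # 4 significant genes
parent = {f"g{i}" for i in range(1, 11)}     # parent term: g1..g10
child = {"g1", "g2", "g3", "g4"}             # child term, a subset of parent

def counts(term_genes):
    """Return (Annotated, Significant, Expected) for one set of genes."""
    annotated = len(term_genes)
    n_sig = len(term_genes & significant)
    expected = annotated * len(significant) / len(universe)
    return annotated, n_sig, expected

displayed = counts(parent)        # full annotation, as shown in the table
tested = counts(parent - child)   # what remains once the child's genes are removed

print("displayed:", displayed)
print("tested:   ", tested)
```

In this toy case the parent shows more significant genes than expected on the full annotation, but fewer than expected once the child's genes are taken out, so the relation between the two columns flips even though the table would still show the first set of numbers.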

I hope things are more clear now.

Kind regards,
Adrian




On Wed, Jul 15, 2009 at 2:46 PM, jiayu wen<jiayu.jean.wen at gmail.com> wrote:
> Dear all,
>
> I am using topGO (elim) to find overrepresented GO terms (below are 3 example
> output terms). I am confused by the output columns "significant" and "expected".
> If "significant"  < "expected" (as for "cellular process" and
> "cellular metabolic process"), does it mean the GO terms are
> "under-represented", whereas "significant"  > "expected" (cell cycle)
> means those terms are overrepresented? If so, does topGO always report
> "under-represented" GO terms as well? Or do I completely misunderstand it?
>
> Biological Process:
>    GO.ID       Term                        Annotated  Significant  Expected  elim
>    GO:0009987  cellular process                 8686           42     50.55  2.1e-06
>    GO:0007049  cell cycle                        278            5      1.62  0.00031
>    GO:0044237  cellular metabolic process       4985           19     29.01  0.01706
>
> Could someone help me understand it?
> Thank you very much,
>
> Jean
>
>
> On Jul 15, 2009, at 1:22 PM, Stefanie Figura wrote:
>
>> Dear all,
>>
>> I have some problems with the analysis of the HumanHT-12 chip from
>> Illumina and hope somebody can help me.
>>
>> I have been analysing the data using the GenomeStudio software until now.
>> Because some of the bead types are underrepresented on the array,
>> Illumina implemented a so-called “imputing function”. Tech support
>> told me that it would not make a big difference whether the imputing
>> function is used or not.
>>
>> While comparing the results of both methods (no imputing vs. imputing),
>> I found that the “imputing function” leads to more than twice as many
>> differentially expressed genes (167 vs. 396).
>>
>> I was wondering whether there is an analogous function or package
>> implemented in R and Bioconductor.
>>
>> Any kind of advice is welcome.
>>
>> Thank you very much in advance!
>>
>> Kind regards,
>>
>> Stefanie
>>
>> -------------------------------------------------------------------------------
>>
>> Dipl.Chem. Stefanie Figura
>> Leibniz-Institut für Arterioskleroseforschung
>> Department Genetische Epidemiologie vaskulärer Erkrankungen
>> Domagkstrasse 3
>> 48149 Münster
>>
>> -------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor


