[BioC] GOstats hyperGTest question

ivan.borozan at utoronto.ca
Fri Jan 26 15:54:17 CET 2007


Hi Seth,
Thanks for your reply. I did in fact have duplicates in my gene universe.
Running hyperGTest now (without duplicates) gives meaningful results.
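
For anyone who hits the same symptom (Count larger than Size, negative
odds ratios), here is a minimal base R sketch of the kind of duplicate
check Seth describes. The vector names geneUniverse and selectedGenes and
the IDs in them are made up for illustration; it assumes both are
character vectors of Entrez IDs, cleaned before the params object is
built.

```r
# Hypothetical inputs: character vectors of Entrez IDs (illustrative values).
geneUniverse  <- c("3043", "3043", "1080", "7157", "7157", "672")
selectedGenes <- c("3043", "3043", "7157")

# Show which IDs occur more than once in each vector.
dupUniverse <- unique(geneUniverse[duplicated(geneUniverse)])
dupSelected <- unique(selectedGenes[duplicated(selectedGenes)])
print(dupUniverse)
print(dupSelected)

# Drop the repeats, and keep only selected genes present in the universe.
geneUniverse  <- unique(geneUniverse)
selectedGenes <- intersect(unique(selectedGenes), geneUniverse)

# Fail early if any duplicate survived.
stopifnot(!anyDuplicated(geneUniverse), !anyDuplicated(selectedGenes))
```

With duplicates removed, each Entrez ID is counted once, so the number of
selected genes at a GO term cannot exceed the number of universe genes
annotated there.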

all the best,
Ivan

Quoting Seth Falcon <sfalcon at fhcrc.org>:

> Hi Ivan,
>
> ivan.borozan at utoronto.ca writes:
>> I got following results using hyperGTest(params) with a given list of genes
>>
>>> summary(hgOver)
>>         GOBPID       Pvalue   OddsRatio    ExpCount Count Size
>> 1  GO:0030185 0.000000e+00  -73.314685  0.02692165     2    1
>> 2  GO:0006067 0.000000e+00 -110.746479  0.05384330     3    2
>> 3  GO:0006069 0.000000e+00 -110.746479  0.05384330     3    2
>
> Hmm, that is a suspect result.  One would expect Size >= Count.  In
> the current devel version of Category and GOstats, I have added code
> to verify that the selected gene list (geneIds) and the gene universe
> do not contain any duplicates.  Could you verify that your input does
> not contain duplicate IDs either in the selected list or the universe?
>
>> If for example I look at genes that are associated with the first GO
>> term (i.e GO:0030185) I get:
>>
>>
>>> probeSetSummary(hgOver)[[1]]
>>    EntrezID ProbeSetID selected
>> 1     3043     144221        0
>> 2     3043     148425        0
>> 3     3043    3108408        0
>> 4     3043    5708746        0
>
> This is, of course, also surprising, but it is difficult to assess
> what is going on without knowing more details of what data you used as
> input.  Are you sure that all Entrez IDs in geneIds(params) are
> represented by at least one probe set on the chip?
>
>> My question is how are Counts (in this case Count = 2) in the above
>> summary(hgOver) table obtained ?
>
> The details are in the code, but the intention is that Count is the
> intersection of the selected gene list with the Entrez IDs annotated
> at the given GO term.
>
>> Looking at probeSetSummary(hgOver)[[1]] I can see one EntrezID
>> (EntrezID = 3043) and 4 ProbeSetID associated with this particular
>> node (i.e GO:0030185).
>
> That just tells you that there are 4 probesets that interrogate Entrez
> ID 3043. The count in the hyperGTest result tells you that 2 Entrez
> IDs from the selected gene list are in the list of genes annotated at
> GO:0030185.
>
> I have added a considerable amount of detail to the GOstats vignette
> in the current devel repository and I would suggest reading over it:
>
>     http://www.bioconductor.org/packages/1.9/bioc/html/GOstats.html
>
> + seth
>


