[BioC] GOstats hyperGTest question
ivan.borozan at utoronto.ca
Fri Jan 26 15:54:17 CET 2007
Hi Seth,
Thanks for your reply. I actually had duplicates in my gene universe.
Running hyperGTest now (without duplicates) gives meaningful results.
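In case it helps anyone else, here is a minimal base-R sketch of the kind of check involved. It assumes the selected genes and the universe are plain character vectors of Entrez IDs; the IDs below are made up for illustration and are not my actual data.

```r
# Hypothetical example vectors of Entrez IDs (illustration only).
universe <- c("3043", "3039", "3043", "7157", "672", "672")
selected <- c("3043", "7157", "3043")

# List the IDs that occur more than once, before building the test parameters.
dupUniverse <- unique(universe[duplicated(universe)])
dupSelected <- unique(selected[duplicated(selected)])
print(dupUniverse)
print(dupSelected)

# Keep a single copy of each ID.
universe <- unique(universe)
selected <- unique(selected)
```

The de-duplicated vectors are then what goes into the hyperGTest parameters.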
all the best,
Ivan
Quoting Seth Falcon <sfalcon at fhcrc.org>:
> Hi Ivan,
>
> ivan.borozan at utoronto.ca writes:
>> I got the following results using hyperGTest(params) with a given list of genes:
>>
>>> summary(hgOver)
>>       GOBPID       Pvalue   OddsRatio   ExpCount Count Size
>> 1 GO:0030185 0.000000e+00  -73.314685 0.02692165     2    1
>> 2 GO:0006067 0.000000e+00 -110.746479 0.05384330     3    2
>> 3 GO:0006069 0.000000e+00 -110.746479 0.05384330     3    2
>
> Hmm, that is a suspect result. One would expect Size >= Count. In
> the current devel version of Category and GOstats, I have added code
> to verify that the selected gene list (geneIds) and the gene universe
> do not contain any duplicates. Could you verify that your input does
> not contain duplicate IDs either in the selected list or the universe?
>
>> If, for example, I look at the genes associated with the first GO
>> term (i.e. GO:0030185), I get:
>>
>>
>>> probeSetSummary(hgOver)[[1]]
>>   EntrezID ProbeSetID selected
>> 1     3043     144221        0
>> 2     3043     148425        0
>> 3     3043    3108408        0
>> 4     3043    5708746        0
>
> This is, of course, also surprising, but it is difficult to assess
> what is going on without knowing more details of what data you used as
> input. Are you sure that all Entrez IDs in geneIds(params) are
> represented by at least one probe set on the chip?
>
>> My question is: how are the Counts (in this case Count = 2) in the
>> above summary(hgOver) table obtained?
>
> The details are in the code, but the intention is that Count is the
> number of Entrez IDs in the intersection of the selected gene list
> with the Entrez IDs annotated at the given GO term.
>
>> Looking at probeSetSummary(hgOver)[[1]], I can see one EntrezID
>> (EntrezID = 3043) and 4 ProbeSetIDs associated with this particular
>> node (i.e. GO:0030185).
>
> That just tells you that there are 4 probesets that interrogate Entrez
> ID 3043. The count in the hyperGTest result tells you that 2 Entrez
> IDs from the selected gene list are in the list of genes annotated at
> GO:0030185.
>
> I have added a considerable amount of detail to the GOstats vignette
> in the current devel repository and I would suggest reading over it:
>
> http://www.bioconductor.org/packages/1.9/bioc/html/GOstats.html
>
> + seth
>
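For reference, the relationship between Count, Size and ExpCount described above can be sketched with the standard hypergeometric calculation in base R. This is a toy illustration under my own assumptions, not the Category/GOstats code: the probe set IDs, Entrez IDs and term annotation are made up, Size is taken to be the number of universe genes annotated at the term, and several probe sets mapping to one Entrez ID are collapsed to a single gene.

```r
# Hypothetical probe set -> Entrez ID mapping; two probe sets hit Entrez ID 3043.
probe2entrez <- c(p1 = "3043", p2 = "3043", p3 = "3039", p4 = "7157",
                  p5 = "672",  p6 = "1956", p7 = "2064", p8 = "5290")
universe <- unique(unname(probe2entrez))  # gene universe as unique Entrez IDs

# Hypothetical Entrez IDs annotated at one GO term, and a selected gene list.
termGenes <- c("3043", "3039", "9999")
selected  <- c("3043", "3039", "7157")

annotated <- intersect(termGenes, universe)       # term genes present in the universe
Size  <- length(annotated)                        # universe genes at this term
Count <- length(intersect(selected, annotated))   # selected genes at this term
stopifnot(Count <= Size)  # always holds when both are counted on unique IDs

N <- length(universe)     # universe size
n <- length(selected)     # number of selected genes
ExpCount <- n * Size / N  # expected Count under random selection

# Over-representation p-value: P(X >= Count) for a hypergeometric X.
pval <- phyper(Count - 1, Size, N - Size, n, lower.tail = FALSE)
cat("Count =", Count, "Size =", Size, "ExpCount =", ExpCount, "Pvalue =", pval, "\n")
```

With duplicate IDs in the universe or the selected list, counts taken without de-duplication could produce an impossible row such as Count = 2 with Size = 1.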