[BioC] GSEABase example? [was KEGG & hyperGTest: an example?]
Robert Gentleman
rgentlem at fhcrc.org
Tue Oct 30 17:48:06 CET 2007
Hi Paul,
Paul Shannon wrote:
> Hi Robert,
>
> Last week you wrote:
>
> The GSEABase package is I think a much better way to do that, and is
> actually intended for such purposes.
>
> where the 'that' you refer to is my use (abuse...) of the Category
> package to find shared GO categories, KEGG pathways, and PFAM domains among
> (typically) 30-100 proteins identified by shotgun proteomics.
>
Yes, that is what I was referring to. But perhaps we can make sure
that we are both talking about the same thing.
If what you want is an incidence matrix, say with rows corresponding to
yeast ORFs and columns to Pfam domains, and entries of 1 or 0 depending
on whether the gene/ORF has the domain, then GSEABase is a better
alternative than performing hypergeometric testing.
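To be concrete about what I mean by an incidence matrix, here is a
minimal base-R sketch. It does not use GSEABase, and the ORF and domain
ids in it are made-up placeholders rather than real annotation; it only
illustrates the data structure.

```r
# Hypothetical ORF -> Pfam domain mapping (placeholder ids, not real annotation)
orf2pfam <- list(
  orfA = c("PF_X", "PF_Y"),
  orfB = "PF_X",
  orfC = "PF_Z"
)

orfs    <- names(orf2pfam)
domains <- sort(unique(unlist(orf2pfam)))

# Rows are ORFs, columns are domains; start with all zeros
inc <- matrix(0L, nrow = length(orfs), ncol = length(domains),
              dimnames = list(orfs, domains))

# Set an entry to 1 when the ORF carries the domain
for (o in orfs) {
  inc[o, orf2pfam[[o]]] <- 1L
}

# Domains present in at least two ORFs, i.e. the "shared" ones
shared <- colnames(inc)[colSums(inc) >= 2]
```

With the matrix in hand, the column sums give the number of ORFs per
domain, and rownames(inc)[inc[, d] == 1] lists the ORFs that carry a
given domain d.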
If that is what you want, let me know and I will post some code, and
if not, maybe you could explain more about what you do want. I find it
helpful if you give the overall goal, not the immediate one.
thanks
Robert
> I'm sorry to be so slow on the uptake here, but after reading the
> GSEABase vignette, I am no closer to understanding how to do the
> GSEABase equivalent of the following quick method for finding that two
> of my proteins share the PF00069 domain:
>
> proteins = c ("YLR113W", "YBR069C", "YBR279W", "YCR030C", "YDR168W",
> "YNR031C")
>
> params = new ("PFAMHyperGParams", geneIds = proteins,
> universeGeneIds = character(0), annotation = "YEAST",
> pvalueCutoff = 1.0, testDirection = "over")
>
> hgr.yeast.pfam = hyperGTest (params)
>
> subset (summary (hgr.yeast.pfam), Count >= 2)
>
>          PFAMID      Pvalue OddsRatio  ExpCount Count Size    Term
> PF00069 PF00069 0.006570706  24.98214 0.1321280     2  114 PF00069
>
>
> Is there an example you could refer me to?
>
> Thanks!
>
> - Paul
>
>
>
>
>
>>> The Category package is a very handy way
>>> to discover, for example, which GO terms, KEGG pathways, and
>>> PFAM domains are shared among proteins, as I try to elucidate the
>>> results of experiments in phosphoproteomics.
>> The GSEABase package is I think a much better way to do that, and is
>> actually intended for such purposes.
>
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org