[BioC] GSEAbase example? [was KEGG & hyperGTest: an example?]

Robert Gentleman rgentlem at fhcrc.org
Tue Oct 30 17:48:06 CET 2007


Hi Paul,


Paul Shannon wrote:
> Hi Robert,
> 
> Last week you wrote:
> 
>    The GSEAbase package is I think a much better way to do that, and is
>    actually intended for such purposes.
> 
> where the 'that' you refer to is my use (abuse...) of the Category
> package to find shared GO categories, KEGG pathways, and PFAM domains among
> (typically) 30-100 proteins identified by shotgun proteomics.
> 

   Yes, that is what I was referring to. But perhaps we can make sure 
that we are both talking about the same thing.

   If what you want is an incidence matrix, say, with rows corresponding 
to yeast ORFs and columns to Pfam domains, with entries 1 or 0 depending 
on whether the gene/ORF has the domain, then GSEABase is a better 
alternative than performing hypergeometric testing.
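
   To make the shape of such an incidence matrix concrete, here is a 
minimal sketch in base R. It does not use GSEABase, and the ORF names, 
domain IDs, and the orf2pfam list are all hypothetical placeholders 
invented for illustration; real annotations would come from an 
annotation package.

```r
# Hypothetical ORF -> Pfam annotations, invented for illustration only.
orf2pfam <- list(
  ORF_A = c("PF_X", "PF_Y"),
  ORF_B = "PF_X",
  ORF_C = character(0),   # an ORF with no annotated domain
  ORF_D = "PF_Z"
)

# All domains seen in the annotations, in a fixed order.
domains <- sort(unique(unlist(orf2pfam)))

# Rows are ORFs, columns are domains; an entry is 1 if the ORF has the
# domain and 0 otherwise.
incidence <- t(vapply(orf2pfam,
                      function(ids) as.integer(domains %in% ids),
                      integer(length(domains))))
colnames(incidence) <- domains

# Domains shared by at least two ORFs are the columns summing to >= 2.
shared <- colnames(incidence)[colSums(incidence) >= 2]
print(incidence)
print(shared)
```

   Column sums of the matrix count how many ORFs carry each domain, so 
shared domains can be read off without any hypergeometric test.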

   If that is what you want, let me know and I will post some code, and 
if not, maybe you could explain more about what you do want.  I find it 
helpful if you give the overall goal, not the immediate one.

  thanks
   Robert


> I'm sorry to be so slow on the uptake here, but after reading the GSEAbase
> vignette, I am no closer to understanding how to do the GSEAbase equivalent
> of the following quick method for finding that two of my proteins share the
> PF00069 domain:
> 
>   proteins = c ("YLR113W", "YBR069C", "YBR279W", "YCR030C", "YDR168W",
>                 "YNR031C")
> 
>   params = new ("PFAMHyperGParams", geneIds = proteins,
>                 universeGeneIds = character(0), annotation = "YEAST",
>                 pvalueCutoff = 1.0, testDirection = "over")
> 
>   hgr.yeast.pfam = hyperGTest (params)
> 
>   subset (summary (hgr.yeast.pfam), Count >= 2)
> 
>            PFAMID      Pvalue OddsRatio  ExpCount Count Size    Term
>   PF00069 PF00069 0.006570706  24.98214 0.1321280     2  114 PF00069
> 
> 
> Is there an example you could refer me to?
> 
> Thanks!
> 
>  - Paul
> 
> 
> 
> 
> 
>>> The Category package is a very handy way
>>> to discover, for example, which GO terms, KEGG pathways, and
>>> PFAM domains are shared among proteins, as I try to elucidate the
>>> results of experiments in phosphoproteomics.
>>  The GSEAbase package is I think a much better way to do that, and is 
>> actually intended for such purposes.
> 
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


