[Bioc-devel] A geneSet data class for facilitating GSEA

Herve Pages hpages at fhcrc.org
Thu Mar 15 21:55:02 CET 2007


Hi Karl and BioC Developers,

PGSEA is now available via biocLite (for R-devel users only):

  http://bioconductor.org/packages/2.0/bioc/html/PGSEA.html

Cheers,
H.

Dykema, Karl wrote:
> BioC Developers,
> 
> I recently submitted a new package to Bioconductor which facilitates
> this. The package is called PGSEA and it will be available for download
> as soon as I can make it pass the automated check/build procedure.
> Included in the package are a number of gene sets that I have collected.
> Here is an example of one created from the Golub Connectivity-Map data:
> 
> 
> 
> 
> -----Original Message-----
> From: Vincent Carey 525-2265 <stvjc at channing.harvard.edu>
> Date: Wed, 14 Mar 2007 10:19:36 -0400 (EDT)
> To: Sean Davis <sdavis2 at mail.nih.gov>
> Cc: <bioc-devel at stat.math.ethz.ch>, Ross Lazarus
> <rerla at channing.harvard.edu>
> Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA
> 
> i like this idea in principle.  the RGenetics folks may have done
> something in this direction.
> 
> you might want to have geneList as an abstract class, and then extend to
> EntrezGeneList, RefseqGeneList and so forth so that dispatch could work
> without looking into the idType ...
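A minimal sketch of the abstract-class idea, assuming Python as a stand-in for R's S4 classes and generics. `GeneList`, `EntrezGeneList` and `RefseqGeneList` follow the names in the message. The `describe` function and the sample identifiers are hypothetical, added only for illustration. Dispatch is chosen by the object's class, so no method has to inspect an idType field:

```python
from abc import ABC, abstractmethod
from functools import singledispatch


class GeneList(ABC):
    """Abstract base class: cannot be instantiated directly."""

    def __init__(self, ids):
        self.ids = tuple(ids)

    def __len__(self):
        return len(self.ids)

    @property
    @abstractmethod
    def id_type(self) -> str:
        """Identifier namespace; each concrete subclass must define it."""


class EntrezGeneList(GeneList):
    id_type = "Entrez Gene"


class RefseqGeneList(GeneList):
    id_type = "RefSeq"


# Hypothetical generic function, analogous to an S4 generic.
@singledispatch
def describe(gene_list):
    raise NotImplementedError(f"no method for {type(gene_list).__name__}")


@describe.register
def _(gene_list: EntrezGeneList):
    # Selected because of the class, not by reading id_type.
    return f"{len(gene_list)} Entrez Gene IDs"


@describe.register
def _(gene_list: RefseqGeneList):
    return f"{len(gene_list)} RefSeq accessions"


print(describe(EntrezGeneList(["7157", "672"])))
```

Adding support for a new identifier type, such as Ensembl, would mean writing one more subclass and registering its methods. Existing code would not need to change.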
> 
> a version or date field might also be important
> 
> ---
> Vince Carey, PhD
> Assoc. Prof Med (Biostatistics)
> Harvard Medical School
> Channing Laboratory - ph 6175252265 fa 6177311541
> 181 Longwood Ave Boston MA 02115 USA
> stvjc at channing.harvard.edu
> 
> On Wed, 14 Mar 2007, Sean Davis wrote:
> 
>> GSEA, both the specific method and the general concept, is becoming
>> more prevalent and important in data analysis.  There have been
>> several mentions of including various "gene lists" for use with
>> Category or other methods.  Is there interest in making a generic
>> geneSet class for storing such information?  (Or does it already exist
>> and I just haven't seen it?)  I bring this up because I think it could
>> be quite useful to have a general solution for the community (as the
>> eSet class has become).  A class could range from something as simple
>> as a vector of Entrez Gene IDs to something more complicated (but
>> perhaps a bit more useful for general consumption) like:
>> identifier: an identifier for the set (perhaps from a public database
>> like MSigDB)
>> title:  One line title
>> description: free text description
>> species: The species to which the dataset applies
>> URL: from where the data were derived
>> MIAME: class "MIAME" object
>> protocol: (could be in MIAME, also) description of methods to produce 
>> genelist from raw data source
>> idType:  What type of ID is stored (Entrez, Refseq, Ensembl, etc)?
>> geneList: vector of IDs
>>
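One possible shape for the proposed fields, sketched as a Python dataclass standing in for an R S4 class. The field names mirror Sean's list. `miame` is modelled as an optional plain dict, which is an assumption: a real R "MIAME" object has no direct Python equivalent. The `version` field reflects the version/date suggestion in Vince Carey's reply above. Removing duplicate IDs in `__post_init__` is a hypothetical design choice, not part of the proposal:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GeneSet:
    identifier: str                 # e.g. a set name from MSigDB
    title: str                      # one-line title
    description: str = ""           # free-text description
    species: str = ""               # species to which the set applies
    url: str = ""                   # where the data were derived from
    miame: Optional[dict] = None    # assumption: stand-in for an R "MIAME" object
    protocol: str = ""              # how the list was produced from raw data
    id_type: str = "Entrez"         # Entrez, RefSeq, Ensembl, ...
    gene_list: Sequence[str] = ()   # the identifiers themselves
    version: str = ""               # version or date of the set

    def __post_init__(self):
        # Hypothetical choice: store IDs as an immutable tuple,
        # dropping duplicates while keeping first-seen order.
        self.gene_list = tuple(dict.fromkeys(self.gene_list))


gs = GeneSet(identifier="EXAMPLE_SET", title="Hypothetical example set",
             species="Homo sapiens", gene_list=["7157", "672", "7157"])
```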
>> A simple wrapper data structure (even just a list) could then be used 
>> to distribute the geneSets.  Some methods could then be defined for 
>> converting to an incidence matrix for use by Category, etc.  But I 
>> think the most important points from above are 1) maintaining some 
>> metadata about the genelists and 2) standardization to reduce 
>> duplicated work.  Individual groups would then instantiate the 
>> geneSets using whatever means they see fit (parsing MSigDB, IPI files,
>> etc.).
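As a sketch of the incidence-matrix conversion, assume the simple wrapper is a plain mapping from set identifier to a list of gene IDs. The orientation is also an assumption: one row per gene set and one column per gene, with 1 marking membership. Check what Category actually expects before relying on this layout. The function name `incidence_matrix` is hypothetical:

```python
from typing import Dict, List, Sequence, Tuple


def incidence_matrix(
    gene_sets: Dict[str, Sequence[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (set_names, genes, matrix), where matrix[i][j] == 1
    if gene j belongs to set i and 0 otherwise."""
    set_names = sorted(gene_sets)
    # Union of all IDs across sets, sorted so that columns are stable.
    genes = sorted({g for ids in gene_sets.values() for g in ids})
    column = {g: j for j, g in enumerate(genes)}
    matrix = [[0] * len(genes) for _ in set_names]
    for i, name in enumerate(set_names):
        for g in gene_sets[name]:
            matrix[i][column[g]] = 1
    return set_names, genes, matrix


names, genes, m = incidence_matrix({"setA": ["7157", "672"],
                                    "setB": ["672", "1956"]})
```

Because the genes are sorted as strings, the columns here come out as "1956", "672", "7157".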
>> Any thoughts?
>>
>> Sean
>>
>> _______________________________________________
>> Bioc-devel at stat.math.ethz.ch mailing list 
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
> 
