[Bioc-devel] A geneSet data class for facilitating GSEA

Wed Mar 14 15:46:53 CET 2007

BioC Developers,

I recently submitted a new package to Bioconductor which facilitates
this. The package is called PGSEA and it will be available for download
as soon as I can make it pass the automated check/build procedure.
Included in the package are a number of gene sets that I have collected.
Here is an example of one created from the Golub Connectivity-Map data:

-----Original Message-----
From: Vincent Carey 525-2265 <stvjc at channing.harvard.edu>
Date: Wed, 14 Mar 2007 10:19:36 -0400 (EDT)
To: Sean Davis <sdavis2 at mail.nih.gov>
Cc: <bioc-devel at stat.math.ethz.ch>, Ross Lazarus
<rerla at channing.harvard.edu>
Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA

i like this idea in principle.  the RGenetics folks may have done
something in this direction.

you might want to have geneList as an abstract class, and then extend to
EntrezGeneList, RefseqGeneList and so forth so that dispatch could work
without looking into the idType ...

a version or date field might also be important
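
The abstract-class suggestion above can be sketched with S4 roughly as
follows. This is only an illustration: the class names are taken from the
message, but the slot names (ids, version) and the generic idScheme are
hypothetical and are not part of any existing package.

```r
library(methods)

# Virtual (abstract) parent: holds the IDs plus a version/date string.
# The slot names here are assumptions made for illustration.
setClass("geneList",
         representation("VIRTUAL",
                        ids     = "character",
                        version = "character"))

# Concrete subclasses, one per identifier type, so that methods can
# dispatch on the class instead of inspecting an idType field.
setClass("EntrezGeneList", contains = "geneList")
setClass("RefseqGeneList", contains = "geneList")

# Hypothetical generic that reports the identifier scheme of a list.
setGeneric("idScheme", function(x) standardGeneric("idScheme"))
setMethod("idScheme", "EntrezGeneList", function(x) "Entrez Gene")
setMethod("idScheme", "RefseqGeneList", function(x) "RefSeq")

eg <- new("EntrezGeneList", ids = c("7157", "672"), version = "2007-03-14")
rs <- new("RefseqGeneList", ids = c("NM_000546", "NM_007294"),
          version = "2007-03-14")

idScheme(eg)
idScheme(rs)
```

Code that only needs the IDs can be written once against the virtual
geneList class, while ID-specific behaviour goes into methods for the
subclasses.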

---
Vince Carey, PhD
Assoc. Prof Med (Biostatistics)
Harvard Medical School
Channing Laboratory - ph 6175252265 fa 6177311541
181 Longwood Ave Boston MA 02115 USA
stvjc at channing.harvard.edu

On Wed, 14 Mar 2007, Sean Davis wrote:

> GSEA, both the specific method and the general concept, is becoming 
> more prevalent and important in data analysis.  There have been 
> several mentions of including various "gene lists" for use with 
> Category or other methods.  Is there interest in making a generic 
> geneSet class for storing such information?  (Or does it already exist
> and I just haven't seen it?)  I bring this up because I think it could
> be quite useful to have a general solution for the community (like the
> eSet class has become).  A class could be as simple as a vector of
> Entrez Gene IDs to something more complicated (but perhaps a bit more
> useful for general consumption) like:
>
> identifier: an identifier for the set (perhaps from a public database
> like MSigDB)
> title:  One line title
> description: free text description
> species: The species to which the dataset applies
> URL: from where the data were derived
> MIAME: class "MIAME" object
> protocol: (could be in MIAME, also) description of methods to produce 
> genelist from raw data source
> idType:  What type of ID is stored (Entrez, Refseq, Ensembl, etc)?
> geneList: vector of IDs
>
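
The field list above maps fairly directly onto an S4 class with one slot
per field. The sketch below is one possible reading and not an agreed
design: the class name geneSet and the validity rules are assumptions, and
the MIAME slot is left out so the example runs without Biobase.

```r
library(methods)

# Hypothetical "geneSet" class: one character slot per proposed field.
# The MIAME slot from the proposal is omitted to avoid a Biobase dependency.
setClass("geneSet",
         representation(identifier  = "character",
                        title       = "character",
                        description = "character",
                        species     = "character",
                        URL         = "character",
                        protocol    = "character",
                        idType      = "character",
                        geneList    = "character"),
         validity = function(object) {
           # Assumed rules: exactly one idType, and no repeated IDs.
           if (length(object@idType) != 1L)
             return("idType must be a single string")
           if (anyDuplicated(object@geneList))
             return("geneList must not contain duplicate IDs")
           TRUE
         })

# Made-up example values; the identifier is not a real database entry.
gs <- new("geneSet",
          identifier  = "EXAMPLE_SET_1",
          title       = "Illustrative gene set",
          description = "Three IDs used only to show the structure",
          species     = "Homo sapiens",
          idType      = "Entrez",
          geneList    = c("7157", "672", "1956"))

length(gs@geneList)
```
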
> A simple wrapper data structure (even just a list) could then be used 
> to distribute the geneSets.  Some methods could then be defined for 
> converting to an incidence matrix for use by Category, etc.  But I 
> think the most important points from above are 1) maintaining some 
> metadata about the genelists and 2) standardization to reduce 
> duplicated work.  Individual groups would then instantiate the 
> geneSets using whatever means they see fit (parsing MSigDB, IPI files,
> etc.).
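
As a rough illustration of the conversion mentioned above, a named list of
ID vectors can be turned into a 0/1 incidence matrix in a few lines. The
helper name incidenceFromSets and the orientation (one row per set, one
column per gene) are assumptions for this sketch, not an existing API.

```r
# Hypothetical helper: build a 0/1 incidence matrix from a named list of
# ID vectors, with one row per gene set and one column per distinct ID.
incidenceFromSets <- function(sets) {
  genes <- sort(unique(unlist(sets, use.names = FALSE)))
  m <- matrix(0L, nrow = length(sets), ncol = length(genes),
              dimnames = list(names(sets), genes))
  for (s in names(sets)) {
    # Mark every ID that belongs to set s.
    m[s, sets[[s]]] <- 1L
  }
  m
}

# Made-up sets used only to show the shape of the result.
sets <- list(setA = c("7157", "672"),
             setB = c("672", "1956"))
inc <- incidenceFromSets(sets)
inc
```
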
>
> Any thoughts?
>
> Sean
>
> _______________________________________________
> Bioc-devel at stat.math.ethz.ch mailing list 
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
