[Bioc-devel] A geneSet data class for facilitating GSEA

Furge, Kyle Kyle.Furge at vai.org
Wed Mar 14 15:56:29 CET 2007


I think this geneSet class would be of significant interest. We have had some
off-list conversations with Simon Lin concerning this, and a couple of days ago
put a simple package (PGSEA) in the devel repository that mirrors the object
model proposed below.
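
As a rough illustration of that object model, here is a minimal sketch. It is
written in Python rather than R/S4 and is not the PGSEA code. The class name
GeneSet, the field names, and the incidence_matrix helper are all assumptions
modelled on the field list in Sean's message quoted below, plus the version
field Vince suggests.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GeneSet:
    """Hypothetical container mirroring the fields proposed in the thread."""
    identifier: str        # an identifier for the set, e.g. from a public database
    title: str             # one-line title
    id_type: str           # "Entrez", "Refseq", "Ensembl", ...
    gene_list: List[str]   # vector of IDs
    description: str = ""  # free-text description
    species: str = ""      # species to which the set applies
    url: str = ""          # where the data were derived from
    version: str = ""      # version or date, as suggested in the thread


def incidence_matrix(gene_sets: List[GeneSet]) -> Dict[str, Dict[str, int]]:
    """Return {set identifier: {gene ID: 0 or 1}} over the union of all genes."""
    # Refuse to mix identifier types; translation is a separate concern.
    id_types = {gs.id_type for gs in gene_sets}
    if len(id_types) > 1:
        raise ValueError(f"mixed identifier types: {sorted(id_types)}")
    universe = sorted({g for gs in gene_sets for g in gs.gene_list})
    return {
        gs.identifier: {g: int(g in gs.gene_list) for g in universe}
        for gs in gene_sets
    }
```

A plain list of such objects would then serve as the "simple wrapper" for
distribution, with the incidence matrix derived on demand.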

But to follow up on the abstract class, would this mean the dispatch would
translate between identifiers on the fly?

Translating between identifiers is sometimes a burden when combining lists
of genes from different sources. It would be nice to handle the translation
in an elegant manner.
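
One possible shape for this, again sketched in Python rather than S4. All of
it is hypothetical: the class names follow Vince's EntrezGeneList and
RefseqGeneList suggestion, while translate_to, combine, and the ID_MAP table
are invented for illustration. The two-entry table is a toy stand-in for a
real annotation source.

```python
from abc import ABC
from typing import Dict, List, Tuple, Type

# Toy mapping table keyed by (source id_type, target id_type).
# A real implementation would query an annotation resource instead.
ID_MAP: Dict[Tuple[str, str], Dict[str, str]] = {
    ("Entrez", "Refseq"): {"7157": "NM_000546", "672": "NM_007294"},
}


class GeneList(ABC):
    """Abstract base: each subclass fixes the identifier type."""
    id_type: str = ""

    def __init__(self, ids: List[str]) -> None:
        if not self.id_type:
            raise TypeError("instantiate a concrete subclass, not GeneList")
        self.ids = list(ids)

    def translate_to(self, target: Type["GeneList"]) -> "GeneList":
        """Return a new list of the target class; unmapped IDs are dropped."""
        if target is type(self):
            return target(self.ids)
        table = ID_MAP.get((self.id_type, target.id_type))
        if table is None:
            raise KeyError(f"no mapping {self.id_type} -> {target.id_type}")
        return target([table[i] for i in self.ids if i in table])


class EntrezGeneList(GeneList):
    id_type = "Entrez"


class RefseqGeneList(GeneList):
    id_type = "Refseq"


def combine(a: GeneList, b: GeneList) -> GeneList:
    """Union of two lists, translating b to a's class on the fly."""
    b_same = b.translate_to(type(a))
    return type(a)(sorted(set(a.ids) | set(b_same.ids)))
```

Here the class of the object, not an idType slot, decides which mapping is
used. Silently dropping unmapped IDs is a choice made only to keep the sketch
short; a real implementation would probably warn or report them.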

-kyle


> From: Vincent Carey 525-2265 <stvjc at channing.harvard.edu>
> Date: Wed, 14 Mar 2007 10:19:36 -0400 (EDT)
> To: Sean Davis <sdavis2 at mail.nih.gov>
> Cc: <bioc-devel at stat.math.ethz.ch>, Ross Lazarus <rerla at channing.harvard.edu>
> Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA
> 
> i like this idea in principle.  the RGenetics folks may have done
> something in this direction.
> 
> you might want to have geneList as an abstract class, and then
> extend to EntrezGeneList, RefseqGeneList and so forth so that
> dispatch could work without looking into the idType ...
> 
> a version or date field might also be important
> 
> ---
> Vince Carey, PhD
> Assoc. Prof Med (Biostatistics)
> Harvard Medical School
> Channing Laboratory - ph 6175252265 fa 6177311541
> 181 Longwood Ave Boston MA 02115 USA
> stvjc at channing.harvard.edu
> 
> On Wed, 14 Mar 2007, Sean Davis wrote:
> 
>> GSEA, both the specific method and the general concept, is becoming more
>> prevalent and important in data analysis.  There have been several mentions
>> of including various "gene lists" for use with Category or other methods.  Is
>> there interest in making a generic geneSet class for storing such
>> information?  (Or does it already exist and I just haven't seen it?)  I bring
>> this up because I think it could be quite useful to have a general solution
>> for the community (like the eSet class has become).  A class could range
>> from a simple vector of Entrez Gene IDs to something more complicated (but
>> perhaps a bit more useful for general consumption) like:
>> 
>> identifier: an identifier for the set (perhaps from a public database like
>> MSigDB)
>> title:  One line title
>> description: free text description
>> species: The species to which the dataset applies
>> URL: from where the data were derived
>> MIAME: class "MIAME" object
>> protocol: (could be in MIAME, also) description of methods to produce
>> genelist from raw data source
>> idType:  What type of ID is stored (Entrez, Refseq, Ensembl, etc)?
>> geneList: vector of IDs
>> 
>> A simple wrapper data structure (even just a list) could then be used to
>> distribute the geneSets.  Some methods could then be defined for converting
>> to an incidence matrix for use by Category, etc.  But I think the most
>> important points from above are 1) maintaining some metadata about the
>> genelists and 2) standardization to reduce duplicated work.  Individual
>> groups would then instantiate the geneSets using whatever means they see fit
>> (parsing MSigDB, IPI files, etc.).
>> 
>> Any thoughts?
>> 
>> Sean
>> 
>> _______________________________________________
>> Bioc-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> 

