[Bioc-devel] A geneSet data class for facilitating GSEA

Simon Lin simonlin at duke.edu
Wed Mar 14 23:52:29 CET 2007


As suggested by Kyle, a translator among the identifiers could be very 
helpful! Ideally it would work on the fly, but it does not need to be 
perfect. Some mapping errors in this process are acceptable.

I like the name of geneSet or geneList for the class.

Methods associated with this class could be
1) a constructor, as mentioned by Kyle.
2) a translator: obtain other types of identifiers. USE CASE: given a set of 
gene symbols, find the Entrez IDs in the mouse or rat genome.
3) a method to "flatten" the nested structure associated with the class.
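
To make the three methods concrete, here is a minimal sketch in Python (an R 
S4 class would follow the same shape). Every name in it (GeneSet, id_type, 
members, translate, flatten) is an assumption for illustration, not an 
existing Bioconductor API, and the mapping table is toy data with placeholder 
identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class GeneSet:
    """A named set of gene identifiers that may contain nested GeneSets."""
    name: str
    id_type: str  # hypothetical labels such as "symbol" or "entrez"
    members: List[Union[str, "GeneSet"]] = field(default_factory=list)

    def flatten(self) -> List[str]:
        """Return every identifier, nested ones included, without duplicates."""
        seen, out = set(), []
        for member in self.members:
            ids = member.flatten() if isinstance(member, GeneSet) else [member]
            for gene_id in ids:
                if gene_id not in seen:
                    seen.add(gene_id)
                    out.append(gene_id)
        return out

    def translate(self, mapping: Dict[str, str], to_type: str) -> "GeneSet":
        """Map identifiers through `mapping`; unmapped identifiers are dropped."""
        hits = [mapping[g] for g in self.flatten() if g in mapping]
        # dict.fromkeys removes duplicates while keeping first-seen order
        return GeneSet(self.name, to_type, list(dict.fromkeys(hits)))


# Toy data: placeholder symbols and IDs, not real annotation.
inner = GeneSet("inner", "symbol", ["geneA", "geneB"])
outer = GeneSet("outer", "symbol", ["geneC", inner, "geneB"])
toy_map = {"geneA": "101", "geneB": "102"}

flat = outer.flatten()
translated = outer.translate(toy_map, "entrez")
```

Dropping unmapped identifiers silently is one way to tolerate mapping errors; 
returning the list of misses alongside the result would be an equally 
reasonable choice.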

The wiki by Seth on this topic is at
  http://wiki.fhcrc.org/bioc/GeneSet_Class_Discussion

By the way, I think the set_id and collection_id linking method needs some 
improvement. It allows nested structure, but it is hard to construct 
collections this way -- adding a new collection means changing the 
pointers in every geneSet that belongs to it.
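
One possible alternative, sketched here in Python under my own assumptions 
(this is not the scheme on the wiki): let each collection list the ids of its 
members, so that gene sets carry no pointers at all. The registries, ids, and 
function names below are hypothetical.

```python
from typing import Dict, List, Set

# Registry of gene sets keyed by set_id; sets hold no collection pointers.
gene_sets: Dict[str, List[str]] = {
    "set1": ["geneA", "geneB"],
    "set2": ["geneB", "geneC"],
}

# Each collection lists the ids of its members (sets or other collections).
collections: Dict[str, List[str]] = {
    "coll1": ["set1", "set2"],
}


def add_collection(collection_id: str, member_ids: List[str]) -> None:
    """Register a new collection without touching any existing gene set."""
    if collection_id in collections:
        raise ValueError(f"collection already exists: {collection_id}")
    unknown = [m for m in member_ids
               if m not in gene_sets and m not in collections]
    if unknown:
        raise KeyError(f"unknown member ids: {unknown}")
    collections[collection_id] = list(member_ids)


def genes_in(item_id: str) -> Set[str]:
    """Resolve a set or a (possibly nested) collection to its genes."""
    if item_id in gene_sets:
        return set(gene_sets[item_id])
    genes: Set[str] = set()
    for member in collections[item_id]:
        genes |= genes_in(member)
    return genes


# A nested collection is added with a single new entry.
add_collection("coll2", ["coll1", "set2"])
```

Because members must already exist and ids cannot be reused, a collection 
cannot end up containing itself, so the recursive lookup terminates.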

Simon



--------------
I think this geneSet class would be of significant interest. We have had some
off-list conversations with Simon Lin concerning this, and a couple of days ago
put a simple package (PGSEA) in the devel repository that mirrors the object
model proposed below.

But to follow up on the abstract class, would this mean the dispatch would
translate between identifiers on the fly?

Translating between identifiers is sometimes a burden when combining lists
of genes from different sources. It would be nice to handle the translation
in an elegant manner.
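
As a rough illustration of what dispatch-driven translation could look like, 
here is a Python sketch using one subclass per identifier type, as in the 
suggestion quoted below. The class names follow that suggestion; as_entrez, 
union, and the lookup table are hypothetical, and the identifiers are 
placeholders rather than real annotation.

```python
from abc import ABC
from functools import singledispatch
from typing import Dict, List


class GeneList(ABC):
    """Base class; each subclass encodes one identifier type."""

    def __init__(self, ids: List[str]):
        self.ids = list(ids)


class EntrezGeneList(GeneList):
    pass


class RefseqGeneList(GeneList):
    pass


# Toy lookup table with placeholder values, not real identifiers.
REFSEQ_TO_ENTREZ: Dict[str, str] = {"NM_x1": "101", "NM_x2": "102"}


@singledispatch
def as_entrez(genes) -> EntrezGeneList:
    # Fallback for types with no registered translation.
    raise TypeError(f"no translation defined for {type(genes).__name__}")


@as_entrez.register
def _(genes: EntrezGeneList) -> EntrezGeneList:
    return genes  # already in the target identifier type


@as_entrez.register
def _(genes: RefseqGeneList) -> EntrezGeneList:
    # Identifiers missing from the table are dropped.
    return EntrezGeneList(
        [REFSEQ_TO_ENTREZ[i] for i in genes.ids if i in REFSEQ_TO_ENTREZ]
    )


def union(a: GeneList, b: GeneList) -> EntrezGeneList:
    """Combine two lists of any supported type; translation is implicit."""
    merged = as_entrez(a).ids + as_entrez(b).ids
    return EntrezGeneList(list(dict.fromkeys(merged)))


combined = union(EntrezGeneList(["101", "103"]),
                 RefseqGeneList(["NM_x2", "NM_x1", "NM_zz"]))
```

The caller of union never inspects an idType field; the class of each 
argument selects the translation.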

-kyle


> From: Vincent Carey 525-2265 <stvjc at channing.harvard.edu>
> Date: Wed, 14 Mar 2007 10:19:36 -0400 (EDT)
> To: Sean Davis <sdavis2 at mail.nih.gov>
> Cc: <bioc-devel at stat.math.ethz.ch>, Ross Lazarus <rerla at 
> channing.harvard.edu>
> Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA
>
> i like this idea in principle.  the RGenetics folks may have done
> something in this direction.
>
> you might want to have geneList as an abstract class, and then
> extend to EntrezGeneList, RefseqGeneList and so forth so that
> dispatch could work without looking into the idType ...
>
> a version or date field might also be important
>
> ---
> Vince Carey, PhD
> Assoc. Prof Med (Biostatistics)
> Harvard Medical School
> Channing Laboratory - ph 6175252265 fa 6177311541
> 181 Longwood Ave Boston MA 02115 USA
> stvjc at channing.harvard.edu


