[Bioc-devel] A geneSet data class for facilitating GSEA

Seth Falcon sfalcon at fhcrc.org
Wed Mar 14 19:56:09 CET 2007


Sean Davis <sdavis2 at mail.nih.gov> writes:
> I agree that these are all close.  I was thinking of keeping the
> collections as a separate higher-level data structure.  However, an
> email off-list I got suggested that a geneSet could be composed of a
> set of IDs OR another set of geneSets.  A collection would then be
> a set of geneSets that are related in some way.  The interpretation
> is straightforward--a geneSet becomes the union of all unique IDs in
> the contained geneSets.  So a maintainer could choose to code chr16q
> as a combination of all the geneSets for the bands of 16q, or simply
> make one large vector of IDs. 

I like this notion (composite pattern).
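
To make the composite idea concrete, here is a minimal sketch.  It is
written in Python purely for illustration and is not a proposed
Bioconductor class definition; the class name, the slot names (set_id,
ids, children) and the gene identifiers are all made up for the example.
A set holds IDs directly, other sets, or both, and its membership is the
union of the unique IDs found anywhere beneath it.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GeneSet:
    """Hypothetical composite gene set: holds IDs, child gene sets, or both."""
    set_id: str
    ids: Set[str] = field(default_factory=set)
    children: List["GeneSet"] = field(default_factory=list)

    def all_ids(self) -> Set[str]:
        """Return the union of this set's own IDs and all descendants' IDs."""
        result = set(self.ids)
        for child in self.children:
            result |= child.all_ids()
        return result


# Made-up identifiers; real band membership would come from annotation data.
band_16q23 = GeneSet("16q23", ids={"g1", "g2"})
band_16q24 = GeneSet("16q24", ids={"g2", "g3"})

# Option 1: code chr16q as a combination of the band-level sets.
chr16q = GeneSet("chr16q", children=[band_16q23, band_16q24])

# Option 2: code chr16q as one large flat collection of IDs.
chr16q_flat = GeneSet("chr16q_flat", ids={"g1", "g2", "g3"})

print(sorted(chr16q.all_ids()))
```

Under these assumptions both encodings yield the same membership, so a
maintainer could pick whichever is more convenient.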

> What is more problematic is an API for getting at individual
> geneSets (I want 16q24, but how do I get there if I need to go
> through chr16 and 16q24) embedded in a higher-level set in such a
> setup.
>
> I'm inclined to think that hierarchical geneSets might be too complicated to 
> want to deal with, but Seth and the Bioc folks would know best.

Yes, this introduces some complexities.  We already have code that
represents GO and cytogenetic bands using graph objects so that we can
model the hierarchical structure -- and take advantage of it in our
computation.

I'm not sure the GeneSet class needs to worry about this.  We can use
other classes, such as graph, to model the relations between a set of
GeneSet objects.  This is why having a unique ID for each GeneSet will
be nice.
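
A rough sketch of that separation, again in Python only for
illustration: the gene sets stay flat and are keyed by a unique ID, and
the hierarchy lives in a separate parent-to-child edge list.  The names
gene_sets, edges and ids_under, and the gene identifiers, are
hypothetical and do not correspond to the existing graph package API.

```python
from typing import Dict, List, Set

# Flat registry: every gene set is reachable directly by its unique ID.
gene_sets: Dict[str, Set[str]] = {
    "chr16": set(),
    "16q": set(),
    "16q23": {"g1", "g2"},
    "16q24": {"g2", "g3"},
}

# The hierarchy is kept outside the gene sets as directed edges
# from a parent ID to its child IDs.
edges: Dict[str, List[str]] = {
    "chr16": ["16q"],
    "16q": ["16q23", "16q24"],
}


def ids_under(set_id: str) -> Set[str]:
    """Union of the IDs in a set and in every set reachable below it."""
    result = set(gene_sets[set_id])
    for child in edges.get(set_id, []):
        result |= ids_under(child)
    return result


# 16q24 is fetched directly, without walking down from chr16.
print(sorted(gene_sets["16q24"]))
# The hierarchy is consulted only when the aggregate is wanted.
print(sorted(ids_under("chr16")))
```

With this layout the access problem Sean raises does not arise, because
a lookup by ID never has to traverse the enclosing sets.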

> I agree.  The one point that Vince's email makes, though, is that it would be 
> necessary to standardize the nomenclature for the various gene ID types if 
> there is any hope of introducing "smarts" in dealing with translation.  One 
> way is to subclass, but the other is to validate any idType slot with 
> agreed-upon types.  

Yep.
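
For the validation option, a minimal sketch (Python, for illustration
only) might check the idType against an agreed-upon vocabulary when the
set is created.  The vocabulary shown here and the make_gene_set helper
are placeholders; the actual list of types would be whatever we settle
on.

```python
# Placeholder vocabulary; the real agreed-upon types are still to be decided.
ALLOWED_ID_TYPES = frozenset({"entrez", "symbol", "ensembl"})


def make_gene_set(set_id, id_type, ids):
    """Build a gene set record, rejecting any idType outside the vocabulary."""
    if id_type not in ALLOWED_ID_TYPES:
        raise ValueError(
            f"unknown idType {id_type!r}; expected one of {sorted(ALLOWED_ID_TYPES)}"
        )
    return {"set_id": set_id, "id_type": id_type, "ids": frozenset(ids)}


ok = make_gene_set("16q24", "entrez", ["g2", "g3"])

try:
    make_gene_set("16q24", "EntrezGene", ["g2", "g3"])
    rejected = False
except ValueError:
    # A spelling that is not in the vocabulary is refused up front.
    rejected = True
```

Subclassing per ID type would give the same guarantee through the type
system; the check above just keeps everything in one class.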

>> Perhaps we should start a wiki page to hammer out a class definition?
>
> Sounds great.

http://wiki.fhcrc.org/bioc/GeneSet_Class_Discussion

So let's move further discussion of this topic to the wiki.

Best Wishes,

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org


