[Bioc-devel] A geneSet data class for facilitating GSEA (Robert Gentleman)

Thu Mar 29 19:59:30 CEST 2007

Oops, I posted the wrong URL. The correct one is:
http://wiki.fhcrc.org/bioc/Lausanne_Dev_Meeting_2007


Tarca, Adi wrote:
> Hi,
> I wonder whether the direction of change will be of any use here. First, a gene set should be independent of any particular experiment. Second, one can define the two groups in whichever order one wants, so "UP" and "DOWN" would be ambiguous.
> Adi Tarca 
> ________________________________
> 
> From: bioc-devel-bounces at stat.math.ethz.ch on behalf of bioc-devel-request at stat.math.ethz.ch
> Sent: Sat 3/17/2007 7:00 AM
> To: bioc-devel at stat.math.ethz.ch
> Subject: Bioc-devel Digest, Vol 36, Issue 12
> 
> Today's Topics:
> 
>    1. Re: A geneSet data class for facilitating GSEA (Robert Gentleman)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Fri, 16 Mar 2007 06:18:43 -0700
> From: Robert Gentleman <rgentlem at fhcrc.org>
> Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA
> To: Vincent Carey 525-2265 <stvjc at channing.harvard.edu>
> Cc: bioc-devel at stat.math.ethz.ch
> Message-ID: <45FA9933.5030108 at fhcrc.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Hi,
> 
> Vincent Carey 525-2265 wrote:
>>> Dear bioc-developers,
>>>
>>> would it be useful to introduce an additional slot for the direction and/or
>>> magnitude of expression change of each gene in the gene set?
>> My understanding is that we are currently trying to get a
>> structure that identifies a group of genes in a coherent way.
>> Connecting a group of genes to a specific experimental result is outside
>> the scope of this task.
> 
>    That is a good question, but I would like to point out that notions
> of direction and magnitude of change are almost surely defined with
> respect to a comparison of phenotypes (e.g. disease vs. healthy, or
> stage I vs. stage IV) and hence are not properties of the gene set.
> 
>    While that information is important and useful in a particular
> analysis, it should not be stored with the gene set, in my opinion. We
> will need some easy way for users to specify it and use it in practice,
> but as Vince has said, it is probably not what we want here.
> 
>> Designing an extension of the group class that incorporates
>> qualitative or quantitative information on gene behaviors under
>> certain conditions seems worthwhile but should be kept separate
>> from the original design problem -- I think.
>>
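
The separation discussed above can be sketched in a few lines of S4. This is only an illustration, assuming hypothetical class names (SimpleGeneSet, ComparisonGeneSet) and slot names that nobody in this thread has proposed: the base class holds nothing but the IDs, and the extension adds the comparison and per-gene direction for one particular analysis.

```r
library(methods)

# Hypothetical base class: a gene set is just a group of IDs.
setClass("SimpleGeneSet", representation(ids = "character"))

# Hypothetical extension: adds experiment-specific information,
# kept out of the base class on purpose.
setClass("ComparisonGeneSet", contains = "SimpleGeneSet",
         representation(comparison = "character", direction = "character"),
         validity = function(object) {
           if (length(object@direction) != length(object@ids))
             return("'direction' must have one entry per gene ID")
           if (!all(object@direction %in% c("UP", "DOWN")))
             return("'direction' entries must be \"UP\" or \"DOWN\"")
           TRUE
         })

# IDs taken from the str() output in this thread; directions are made up.
gs <- new("SimpleGeneSet", ids = c("171392", "5680", "2149"))
cgs <- new("ComparisonGeneSet", gs,
           comparison = "disease vs healthy",
           direction = c("UP", "DOWN", "UP"))
```

Code written against SimpleGeneSet keeps working on cgs, while methods that need direction can dispatch on the extension.
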
>>> It seems that GSEA and GSEA-like methods use sets of genes that are
>>> homogeneously down- or upregulated (correct me if I am wrong, I am far from
>>> being up to date on GSEA methods).
>>>
>>> This seems to be reflected in the example presented in the PGSEA vignette
>>> where target genes of Ras and Myc are separated into 'UP' and 'DN' regulated
>>> genes.
> 
>     Hopefully we will use UP and DOWN. The savings from using
> abbreviations are almost never worth it, especially since English is
> not the first language of many users.
> 
>    best wishes
>      Robert
> 
>>> However, (alternative?) methods could actually use the quantitative
>>> information about expression changes to score each gene set. Adding a
>>> corresponding slot to the geneSet class would make it possible to
>>> accommodate such methods.
>>>
>>> Best,
>>> Alexandre
>>>
>>>
>>> -----Original Message-----
>>> From: bioc-devel-bounces at stat.math.ethz.ch
>>> [mailto:bioc-devel-bounces at stat.math.ethz.ch] On Behalf Of Dykema, Karl
>>> Sent: mercredi, 14. mars 2007 16:15
>>> To: bioc-devel at stat.math.ethz.ch
>>> Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA
>>>
>>> Sorry I forgot to attach the str()
>>>
>>> $ 15-delta prostaglandin J2 10 uM  DOWN : list()
>>>   ..- attr(*, "reference")= chr "15-delta prostaglandin J2 10 uM  DOWN "
>>>   ..- attr(*, "desc")= chr "DOWN "
>>>   ..- attr(*, "source")= chr "PubMed"
>>>   ..- attr(*, "design")= chr "????"
>>>   ..- attr(*, "identifier")= chr "17008526"
>>>   ..- attr(*, "species")= chr "human"
>>>   ..- attr(*, "data")= chr "raw"
>>>   ..- attr(*, "private")= chr "no"
>>>   ..- attr(*, "creator")= chr "Karl Dykema <karl.dykema at vai.org>"
>>>   ..- attr(*, "ids")= chr [1:75] "171392" "5680" "2149" "54557" ...
>>>   ..- attr(*, "class")= atomic [1:1] smc
>>>   .. ..- attr(*, "package")= chr "PGSEA"
>>>
>>>
>>> This closely mirrors the geneSet proposed and we will be happy to adopt
>>> a consensus structure.
>>>
>>> The only significant difference is a "creator" attribute to let folks
>>> know who curated the gene list. This may help if groups are
>>> collaborating to collect gene sets.
>>>
>>>
>>> -------------------------------
>>> Karl Dykema
>>> Bioinformatics Programmer/Analyst
>>> Laboratory of Computational Biology
>>> Van Andel Research Institute
>>> 333 Bostwick Ave. NE
>>> Grand Rapids, MI 49503
>>> (616) 234-5554
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Vincent Carey 525-2265 <stvjc at channing.harvard.edu>
>>> Date: Wed, 14 Mar 2007 10:19:36 -0400 (EDT)
>>> To: Sean Davis <sdavis2 at mail.nih.gov>
>>> Cc: <bioc-devel at stat.math.ethz.ch>, Ross Lazarus
>>> <rerla at channing.harvard.edu>
>>> Subject: Re: [Bioc-devel] A geneSet data class for facilitating GSEA
>>>
>>> i like this idea in principle.  the RGenetics folks may have done
>>> something in this direction.
>>>
>>> you might want to have geneList as an abstract class, and then extend to
>>> EntrezGeneList, RefseqGeneList and so forth so that dispatch could work
>>> without looking into the idType ...
>>>
>>> a version or date field might also be important
>>>
>>> ---
>>> Vince Carey, PhD
>>> Assoc. Prof Med (Biostatistics)
>>> Harvard Medical School
>>> Channing Laboratory - ph 6175252265 fa 6177311541
>>> 181 Longwood Ave Boston MA 02115 USA
>>> stvjc at channing.harvard.edu
>>>
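
A rough sketch of the abstract-class idea suggested above, assuming a virtual GeneList class with ids and version slots. The slot names and the idLabel generic are hypothetical, chosen only to show that dispatch can work on the class rather than on an idType field.

```r
library(methods)

# Virtual base class: cannot be instantiated directly.
# 'version' stands in for the suggested version/date field.
setClass("GeneList",
         representation("VIRTUAL", ids = "character", version = "character"))

# One concrete subclass per identifier type.
setClass("EntrezGeneList", contains = "GeneList")
setClass("RefseqGeneList", contains = "GeneList")

# Hypothetical generic; methods are selected by class alone.
setGeneric("idLabel", function(object) standardGeneric("idLabel"))
setMethod("idLabel", "EntrezGeneList", function(object) "Entrez Gene")
setMethod("idLabel", "RefseqGeneList", function(object) "RefSeq")

egl <- new("EntrezGeneList",
           ids = c("171392", "5680", "2149"),
           version = "2007-03-14")
idLabel(egl)
```

The trade-off is one subclass per identifier type, in exchange for methods that never have to inspect a string to decide what to do.
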
>>> On Wed, 14 Mar 2007, Sean Davis wrote:
>>>
>>>> GSEA, both the specific method and the general concept, is becoming
>>>> more prevalent and important in data analysis.  There have been
>>>> several mentions of including various "gene lists" for use with
>>>> Category or other methods.  Is there interest in making a generic
>>>> geneSet class for storing such information?  (Or does it already exist
>>>> and I just haven't seen it?)  I bring this up because I think it could
>>>> be quite useful to have a general solution for the community (like the
>>>> eSet class has become).  A class could range from something as simple
>>>> as a vector of Entrez Gene IDs to something more complicated (but
>>>> perhaps a bit more useful for general consumption) like:
>>>>
>>>> identifier: an identifier for the set (perhaps from a public database
>>>>   like MSigDB)
>>>> title: one-line title
>>>> description: free-text description
>>>> species: the species to which the dataset applies
>>>> URL: from where the data were derived
>>>> MIAME: class "MIAME" object
>>>> protocol: (could be in MIAME, also) description of methods to produce
>>>>   genelist from raw data source
>>>> idType: what type of ID is stored (Entrez, Refseq, Ensembl, etc.)?
>>>> geneList: vector of IDs
>>>>
>>>> A simple wrapper data structure (even just a list) could then be used
>>>> to distribute the geneSets.  Some methods could then be defined for
>>>> converting to an incidence matrix for use by Category, etc.  But I
>>>> think the most important points from above are 1) maintaining some
>>>> metadata about the genelists and 2) standardization to reduce
>>>> duplicated work.  Individual groups would then instantiate the
>>>> geneSets using whatever means they see fit (parsing MSigDB, IPI files,
>>>> etc.).
>>>>
>>>> Any thoughts?
>>>>
>>>> Sean
>>>>
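
The slots listed above, and the conversion to an incidence matrix, might look roughly like the following. This is a sketch only: the class name GeneSet, the helper incidence(), and the set names setA/setB are assumptions, and the MIAME slot is left out so the example runs without Biobase.

```r
library(methods)

# Slots follow the proposed list; all are plain character vectors here.
setClass("GeneSet",
         representation(identifier = "character", title = "character",
                        description = "character", species = "character",
                        URL = "character", protocol = "character",
                        idType = "character", geneList = "character"))

# Hypothetical helper: rows are sets, columns are the union of all IDs,
# and an entry is 1 when the gene belongs to the set.
incidence <- function(sets) {
  ids <- sort(unique(unlist(lapply(sets, function(s) s@geneList))))
  setNames <- vapply(sets, function(s) s@identifier, character(1))
  m <- matrix(0L, nrow = length(sets), ncol = length(ids),
              dimnames = list(setNames, ids))
  for (i in seq_along(sets)) m[i, sets[[i]]@geneList] <- 1L
  m
}

# A plain list serves as the wrapper for distributing several sets.
sets <- list(
  new("GeneSet", identifier = "setA", species = "human", idType = "Entrez",
      geneList = c("171392", "5680", "2149")),
  new("GeneSet", identifier = "setB", species = "human", idType = "Entrez",
      geneList = c("2149", "54557")))
m <- incidence(sets)
```

Unset slots default to empty character vectors, so a group can fill in as much or as little metadata as it has.
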
>>>> _______________________________________________
>>>> Bioc-devel at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
> 
> --
> Robert Gentleman, PhD
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> PO Box 19024
> Seattle, Washington 98109-1024
> 206-667-7700
> rgentlem at fhcrc.org
