[BioC] "romer"ing and "roast"ing around gene sets

Gordon K Smyth smyth at wehi.EDU.AU
Thu Jul 15 03:29:49 CEST 2010


Dear Robert,

I'm just adding briefly to Di's comments.

> From: "Robert M. Flight" <rflight79 at gmail.com>
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] "romer"ing and "roast"ing around gene sets
>
> Hi All,
>
> I am having trouble with the distinction between the functions "roast" 
> and "romer" in the limma package. From the publication describing 
> "roast" (http://dx.doi.org/10.1093/bioinformatics/btq401), it seems that 
> it tests a particular gene set for differential expression, whereas 
> "romer" tests a battery of sets to find those that are differentially 
> expressed compared to the rest?

Yes.

> I am really having trouble discerning the true difference between these 
> two, and how they compare to GSEA. I always thought that the primary 
> purpose of GSEA was to determine those gene sets that are significantly 
> associated with a phenotypic comparison, i.e. those gene sets showing 
> differential expression.

This is an understandable assumption, which isn't quite true!  GSEA 
actually tries to pick out the sets that stand out as more strongly 
differentially expressed (DE) than others.  So, if all the sets were DE to 
exactly the same degree, then GSEA wouldn't find anything significant, 
because no set would stand out from the others.  This is actually a 
biologically well-motivated approach when you are testing large numbers of 
sets.
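
The distinction can be made concrete with a toy sketch. This is not the roast(), romer() or GSEA algorithm; it is a minimal illustration, assuming made-up gene-level statistics and two hypothetical scoring functions (self_contained_score and competitive_score), of why a competitive comparison reports nothing when every set is DE to the same degree.

```python
from statistics import mean

# Hypothetical helpers for illustration only; neither is a limma or GSEA function.

def self_contained_score(t_stats, gene_set):
    """Mean absolute statistic of the genes in the set.

    The implicit null is "no gene in this set is DE", so the other
    genes on the array play no role.
    """
    return mean(abs(t_stats[g]) for g in gene_set)


def competitive_score(t_stats, gene_set):
    """Mean absolute statistic inside the set minus the mean outside it.

    The implicit null is "this set is no more DE than the other genes".
    """
    inside = [abs(t) for g, t in t_stats.items() if g in gene_set]
    outside = [abs(t) for g, t in t_stats.items() if g not in gene_set]
    return mean(inside) - mean(outside)


# Made-up gene-level statistics for 12 genes (assumed values, not real data).
uniform = {f"g{i}": 3.0 for i in range(12)}                     # every gene equally DE
mixed = {f"g{i}": (3.0 if i < 4 else 0.0) for i in range(12)}   # only g0..g3 are DE

set1 = {"g0", "g1", "g2", "g3"}
set2 = {"g4", "g5", "g6", "g7"}

for name, stats in (("uniform", uniform), ("mixed", mixed)):
    for label, gene_set in (("set1", set1), ("set2", set2)):
        print(name, label,
              "self-contained:", self_contained_score(stats, gene_set),
              "competitive:", competitive_score(stats, gene_set))
```

In the "uniform" scenario both sets have a large self-contained score but a competitive score of zero, because no set stands out from the rest. In the "mixed" scenario only set1 stands out under either score.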

If you want to test every set in the MSigDB, then testing one by one with 
roast() would probably be just too slow anyway.  romer() is more efficient 
when the number of sets is very large.

Beware that romer(), like GSEA, tends to give fairly modest p-values, that is, ones that are not very small. 
The ranking of the sets may be more useful than the absolute p-values.

Best wishes
Gordon

> If anyone can help me clear this up, that would be great, because as of 
> now I am thoroughly confused. To me, if I have a dataset, and I want to 
> know which gene sets (from say MSigDB) are differentially expressed, 
> then it sounds like I would use "roast", but the way it is described in 
> the publication (and the help in limma), this isn't what I would do, but 
> rather I should use "romer", and see if any of the sets show 
> differential expression compared to the rest in the database.
>
> Color me confused,
>
> -Robert
>
> Robert M. Flight, Ph.D.
> Bioinformatics and Biomedical Computing Laboratory
> University of Louisville
> Louisville, KY
>
> PH 502-852-0467
> EM robert.flight at louisville.edu
> EM rflight79 at gmail.com
>
> Williams and Holland's Law:
> If enough data is collected, anything may be proven by
> statistical methods.



