[BioC] "romer"ing and "roast"ing around gene sets

Fri Jul 16 19:01:28 CEST 2010

Dear Gordon,

Reading your email, I think there is something I am not following completely. 
You say, regarding the GSEA-like approach in "romer":

> This is actually a
> biologically well-motivated approach when you are testing large numbers of
> sets.
> 
> If you want to test every set in the MSigDB, then testing one by one with
> roast() would probably be just too slow anyway.  romer() is more efficient
> when the number of sets is very large.

What I found very attractive about roast is that the differential expression 
test is done for groups of genes so, in addition to possible increases in 
power, interpretation is simplified (e.g., if we use all the GO categories, we 
deal only with ~ 1500 entities).  Even if the examples in your Bioinformatics 
paper involve just a few sets, I was thinking about systematically using roast 
in, say, all GO categories, or all the 690 canonical pathways.

Moreover, if we want to use the "focused gene testing", even if roast takes 
longer, I do not see how the greater efficiency of romer would make it an 
alternative procedure: they are answering different questions, right?
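
A toy sketch of the two questions, under assumptions of my own: this is 
plain Python on simulated data, not limma code. Sign-flipping of samples 
and random gene sets stand in for the rotation tests that roast and romer 
actually use, and all names (data, gene_sets, self_contained_p, 
competitive_p) are hypothetical. Every gene is given the same true shift, 
so all sets are DE to the same degree.

```python
import random

random.seed(1)
n_genes, n_pairs, set_size, n_sim = 200, 10, 20, 500

# Hypothetical paired log-ratios; every gene shares the same true shift of 1,
# so every gene set is DE to the same degree.
data = [[1.0 + random.gauss(0, 1) for _ in range(n_pairs)]
        for _ in range(n_genes)]
gene_means = [sum(row) / n_pairs for row in data]
# Ten disjoint, arbitrary sets of 20 genes each.
gene_sets = [list(range(s, s + set_size)) for s in range(0, n_genes, set_size)]


def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def self_contained_p(genes):
    """Is the set DE at all? Null: random sign flips of whole samples."""
    col_means = [mean(data[g][j] for g in genes) for j in range(n_pairs)]
    observed = abs(mean(col_means))
    hits = sum(
        abs(mean(random.choice((-1, 1)) * c for c in col_means)) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


def competitive_p(genes):
    """Does the set stand out from the rest? Null: random sets of equal size."""
    centre = mean(gene_means)
    observed = abs(mean(gene_means[g] for g in genes) - centre)
    hits = 0
    for _ in range(n_sim):
        random_set = random.sample(range(n_genes), len(genes))
        if abs(mean(gene_means[g] for g in random_set) - centre) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


n_self = sum(self_contained_p(gs) < 0.05 for gs in gene_sets)
n_comp = sum(competitive_p(gs) < 0.05 for gs in gene_sets)
print(f"sets with p < 0.05: self-contained {n_self}, competitive {n_comp}")
```

In this setup the self-contained question should flag essentially every 
set, while the competitive one should flag about as many as chance alone 
would, because no set stands out from the others.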

But now I am starting to think that systematically testing all 1500 GO 
categories might be a bad idea.

Best,

R.

P.S. The help for roast says that y must be a numeric matrix. But I think it 
works fine with ExpressionSet objects directly, too.

On Thursday 15 July 2010 03:29:49 Gordon K Smyth wrote:
> Dear Robert,
> 
> I'm just adding briefly to Di's comments.
> 
> > From: "Robert M. Flight" <rflight79 at gmail.com>
> > To: bioconductor at stat.math.ethz.ch
> > Subject: [BioC] "romer"ing and "roast"ing around gene sets
> >
> > Hi All,
> >
> > I am having trouble with the distinction between the functions "roast"
> > and "romer" in the limma package. From the publication describing
> > "roast" (http://dx.doi.org/10.1093/bioinformatics/btq401), it seems that
> > it tests a particular gene set for differential expression, whereas
> > "romer" tests a battery of sets to find those that are differentially
> > expressed compared to the rest?
> 
> Yes.
> 
> > I am really having trouble discerning the true difference between these
> > two, and how they compare to GSEA. I always thought that the primary
> > purpose of GSEA was to determine those gene sets that are significantly
> > associated with a phenotypic comparison, i.e. those gene sets showing
> > differential expression.
> 
> This is an understandable assumption, which isn't quite true!  GSEA
> actually tries to pick out the sets that stand out as more strongly
> differentially expressed (DE) than others.  So, if all the sets were DE to
> exactly the same degree, then GSEA wouldn't find anything significant,
> because no set would stand out from the others.  This is actually a
> biologically well-motivated approach when you are testing large numbers of
> sets.
> 
> If you want to test every set in the MSigDB, then testing one by one with
> roast() would probably be just too slow anyway.  romer() is more efficient
> when the number of sets is very large.
> 
> Beware that romer(), like GSEA, tends to give pretty modest p-values.
> The ranking of the sets may be more useful than the absolute p-values.
> 
> Best wishes
> Gordon
> 
> > If anyone can help me clear this up, that would be great, because as of
> > now I am thoroughly confused. To me, if I have a dataset, and I want to
> > know which gene sets (from say MSigDB) are differentially expressed,
> > then it sounds like I would use "roast", but the way it is described in
> > the publication (and the help in limma), this isn't what I would do, but
> > rather I should use "romer", and see if any of the sets show
> > differential expression compared to the rest in the database.
> >
> > Color me confused,
> >
> > -Robert
> >
> > Robert M. Flight, Ph.D.
> > Bioinformatics and Biomedical Computing Laboratory
> > University of Louisville
> > Louisville, KY
> >
> > PH 502-852-0467
> > EM robert.flight at louisville.edu
> > EM rflight79 at gmail.com
> >
> > Williams and Holland's Law:
> >     If enough data is collected, anything may be proven by
> > statistical methods.
> 