[BioC] pre-ranked GSEA within R? + Best DESeq2/limma-voom metric?

Sun Jan 19 07:07:42 CET 2014

Hi Jose,

Doing a one-sample t-test of the logFCs for a gene set is very similar to 
the test proposed by

  http://www.ncbi.nlm.nih.gov/pubmed/20048385

(Actually that paper did a z-test of t-statistics, and criticized the test 
of logFCs as ignoring unequal variances betweeen genes.]

You're right that this is different from GSEA, and in fact it has been 
criticized directly by the GSEA authors themselves:

  http://www.ncbi.nlm.nih.gov/pubmed/23070592

The point I was making in my previous email, is that the criticism made by 
the GSEA authors of the logFC test applies equally to the GSEA pre-ranked 
test.  So I suspect the GSEA authors would not recommend the pre-ranked 
test themselves unless there was no alternative (i.e., no replicates).

Best wishes
Gordon

--------- original message ------------
[BioC] pre-ranked GSEA within R? + Best DESeq2/limma-voom metric?
Garcia Manteiga Jose Manuel <garciamanteiga.josemanuel at hsr.it>
Fri Jan 17 17:52:01 CET 2014

Dear Mike,
Thanks for the confirmation, I remember talking to someone during the 
Bioc2013 lab saying that same thing on shrunken log2FC but I do not know 
why I thought that the value to use with pre-ranked was not the log2FC in 
the results object but another one due to the shrinking procedure.

On the other hand, on the workflow from the lab, I remember that in 5.1 
log2FC was used to test whether particular gene sets had an average log2FC 
different from 0. That would use log2FC information but in a different way 
compared to how I understand GSEA pre-ranked analysis (the procedure 
published by the people in Broad). In the Broad pre-ranked strategy, the 
enrichment using log2FCs from the top (positive and negative) is giving 
you an enrichment score that should be more related, in my view, to 
biological relevance since it is linked to highest log2FCs observed, in 
absolute values, whereas the workflow in 5.1 would flag those gene sets 
that do show a different behaviour but maybe very low just because we are 
mathematically able to say so (because of the t-test used).

By the way, could the thresholded test implementation in DESeq2 >1.3, or 
something similar, be used in a similar way to filter for those gene sets 
with a more relevant enrichment using the 5.1 workflow?
Thanks again for the efforts

Jose

On 17 Jan 2014, at 16:10, Michael Love <michaelisaiahlove at gmail.com> 
wrote:

hi Garcia,

For DESeq2, we do recommend using shrunken log fold changes for gene set 
tests, that is, the log2FoldChange column of a results object when the 
default DESeq2 pipeline has been used.

For a suggested workflow, see section 5.1 of the PDF for the presentation 
"DESeq2/DEXSeq / Alejandro Reyes, Wolfgang Huber", one of the Bioc2013 
labs: http://bioconductor.org/help/course-materials/2013/BioC2013/

Mike

On Fri, Jan 17, 2014 at 3:30 AM, Garcia Manteiga Jose Manuel 
<garciamanteiga.josemanuel at hsr.it> wrote:
Hi,
I wanted the same and this is what gsea people replied to me some months 
ago:

Hi,

Thank you for your interest in GSEA.

R GSEA does accept ranked list as an input.

Please note that R GSEA has not been actively maintained since 2005.

Regards,

I have not gone through the R GSEA again to check how pre-ranked lists can 
be accepted, but the last sentence in their message made me hold a minute 
and think in other possibilities.
For the moment I am still doing as you mentioned, using the rnk file with 
java, then getting back again from results of gsea to R.

By the way, I would like to known which metric should be best to be used 
for that kind of analysis when using RNA-Seq data coming from DESeq2 
analysis. log2FC? p-values? Are they considered to be weighted, as GSEA 
pre-ranked names them?
Thanks
Regards

On 17 Jan 2014, at 00:18, Daniel Schmolze <bioconductor at schmolze.com> 
wrote:

I want to do a GSEA entirely from within R, using genes ranked by my
own metric. At the moment I'm saving my ranked genes to a .rnk file,
then calling Broad's GSEA java program, then reading the resulting
output back into R (all I care about are the p-values in
gsea_report_for_na_pos_####.xls and gsea_report_for_na_neg_####.xls).
Cumbersome to say the least.

As far as I can tell, the Broad GSEA R script won't accept pre-ranked
genes, but maybe I'm wrong? If not, I'm interested in other options.
I'd like to specifically stick with the Broad GSEA algorithm if
possible.

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}