[BioC] pre-ranked GSEA within R? + Best DESeq2/limma-voom metric?
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Jan 19 07:07:42 CET 2014
Hi Jose,
Doing a one-sample t-test of the logFCs for a gene set is very similar to
the test proposed by
http://www.ncbi.nlm.nih.gov/pubmed/20048385
(Actually that paper did a z-test of t-statistics, and criticized the test
of logFCs as ignoring unequal variances betweeen genes.]
You're right that this is different from GSEA, and in fact it has been
criticized directly by the GSEA authors themselves:
http://www.ncbi.nlm.nih.gov/pubmed/23070592
The point I was making in my previous email, is that the criticism made by
the GSEA authors of the logFC test applies equally to the GSEA pre-ranked
test. So I suspect the GSEA authors would not recommend the pre-ranked
test themselves unless there was no alternative (i.e., no replicates).
Best wishes
Gordon
--------- original message ------------
[BioC] pre-ranked GSEA within R? + Best DESeq2/limma-voom metric?
Garcia Manteiga Jose Manuel <garciamanteiga.josemanuel at hsr.it>
Fri Jan 17 17:52:01 CET 2014
Dear Mike,
Thanks for the confirmation, I remember talking to someone during the
Bioc2013 lab saying that same thing on shrunken log2FC but I do not know
why I thought that the value to use with pre-ranked was not the log2FC in
the results object but another one due to the shrinking procedure.
On the other hand, on the workflow from the lab, I remember that in 5.1
log2FC was used to test whether particular gene sets had an average log2FC
different from 0. That would use log2FC information but in a different way
compared to how I understand GSEA pre-ranked analysis (the procedure
published by the people in Broad). In the Broad pre-ranked strategy, the
enrichment using log2FCs from the top (positive and negative) is giving
you an enrichment score that should be more related, in my view, to
biological relevance since it is linked to highest log2FCs observed, in
absolute values, whereas the workflow in 5.1 would flag those gene sets
that do show a different behaviour but maybe very low just because we are
mathematically able to say so (because of the t-test used).
By the way, could the thresholded test implementation in DESeq2 >1.3, or
something similar, be used in a similar way to filter for those gene sets
with a more relevant enrichment using the 5.1 workflow?
Thanks again for the efforts
Jose
On 17 Jan 2014, at 16:10, Michael Love <michaelisaiahlove at gmail.com>
wrote:
hi Garcia,
For DESeq2, we do recommend using shrunken log fold changes for gene set
tests, that is, the log2FoldChange column of a results object when the
default DESeq2 pipeline has been used.
For a suggested workflow, see section 5.1 of the PDF for the presentation
"DESeq2/DEXSeq / Alejandro Reyes, Wolfgang Huber", one of the Bioc2013
labs: http://bioconductor.org/help/course-materials/2013/BioC2013/
Mike
On Fri, Jan 17, 2014 at 3:30 AM, Garcia Manteiga Jose Manuel
<garciamanteiga.josemanuel at hsr.it> wrote:
Hi,
I wanted the same and this is what gsea people replied to me some months
ago:
Hi,
Thank you for your interest in GSEA.
R GSEA does accept ranked list as an input.
Please note that R GSEA has not been actively maintained since 2005.
Regards,
I have not gone through the R GSEA again to check how pre-ranked lists can
be accepted, but the last sentence in their message made me hold a minute
and think in other possibilities.
For the moment I am still doing as you mentioned, using the rnk file with
java, then getting back again from results of gsea to R.
By the way, I would like to known which metric should be best to be used
for that kind of analysis when using RNA-Seq data coming from DESeq2
analysis. log2FC? p-values? Are they considered to be weighted, as GSEA
pre-ranked names them?
Thanks
Regards
On 17 Jan 2014, at 00:18, Daniel Schmolze <bioconductor at schmolze.com>
wrote:
I want to do a GSEA entirely from within R, using genes ranked by my
own metric. At the moment I'm saving my ranked genes to a .rnk file,
then calling Broad's GSEA java program, then reading the resulting
output back into R (all I care about are the p-values in
gsea_report_for_na_pos_####.xls and gsea_report_for_na_neg_####.xls).
Cumbersome to say the least.
As far as I can tell, the Broad GSEA R script won't accept pre-ranked
genes, but maybe I'm wrong? If not, I'm interested in other options.
I'd like to specifically stick with the Broad GSEA algorithm if
possible.
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list