[BioC] gene set enrichment
Gordon K Smyth
smyth at wehi.EDU.AU
Tue Dec 4 00:42:58 CET 2012
Hi Steve,
Thanks for correcting me.
I said that GSEA requires full data because this is true of the published
GSEA algorithm (Subramanian et al 2005). The published GSEA approach
permutes arrays and therefore requires all the data. I just forgot that
the GSEA software provides an alternative short-cut approach (permuting
genes) that can be used when there are no replicates or one just has a
ranked gene list.
The GSEA ranked gene list approach is similar in principle to the
geneSetTest() function in the limma package. This approach has the
disadvantage that it does not correct for inter-gene correlations, as we
pointed out in our recent camera paper (thanks to Tim Triche for giving
the reference).
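
To make the gene-permutation idea concrete, here is a minimal Python sketch
(illustration only; it is not the limma geneSetTest() code or the GSEA
software, and the function name, arguments and toy data are all
hypothetical). It asks whether the genes in a set have a higher mean
statistic than randomly drawn sets of the same size. Because the random
sets are drawn as if genes were independent, correlation between genes is
ignored, which is exactly the weakness described above.

```python
import random


def gene_permutation_pvalue(stats, gene_set, n_perm=2000, seed=0):
    """One-sided gene-permutation test (illustrative sketch).

    stats    : dict mapping gene id -> gene-level statistic (e.g. a t-statistic)
    gene_set : collection of gene ids; ids missing from `stats` are ignored
    Returns the proportion of random same-size gene sets whose mean statistic
    is at least as large as the observed mean (with a +1 correction so the
    estimate is never exactly zero).
    """
    rng = random.Random(seed)
    genes = list(stats)
    members = set(gene_set)
    in_set = [g for g in genes if g in members]
    k = len(in_set)
    if k == 0:
        raise ValueError("gene_set has no genes in common with stats")

    observed = sum(stats[g] for g in in_set) / k

    hits = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size, treating genes as independent.
        sample = rng.sample(genes, k)
        if sum(stats[g] for g in sample) / k >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 100 genes whose statistic equals their index.
toy_stats = {f"g{i}": float(i) for i in range(100)}
top_set = {f"g{i}" for i in range(90, 100)}   # the ten highest-ranked genes
all_set = set(toy_stats)                      # every gene: no enrichment possible

p_top = gene_permutation_pvalue(toy_stats, top_set)  # expected to be small
p_all = gene_permutation_pvalue(toy_stats, all_set)  # every draw ties the observed mean
```

Only a ranked list or a vector of gene-level statistics is needed as input,
which is why this style of test works without replicates.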
However, the same criticism (that inter-gene correlation is ignored) can be
made of all GO overlap analysis software as well, including goseq. So the
only clear advantage of goseq over GSEA here is the adjustment for gene
length. As compensation, GSEA-ranked-list uses the rankings of the DE
genes that goseq ignores.
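
For contrast, a plain overlap analysis can be sketched as an upper-tail
hypergeometric test. This is a generic illustration in Python, not the
goseq algorithm (goseq additionally adjusts for gene length, which is not
attempted here), and the function and argument names are hypothetical. Note
that the test sees only which genes are called DE; the ranking of genes
within the DE list plays no role.

```python
from math import comb


def overlap_pvalue(n_universe, n_de, n_set, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe : number of genes tested
    n_de       : number of genes called differentially expressed
    n_set      : number of genes in the gene set (e.g. a GO term)
    n_overlap  : number of DE genes that fall in the gene set
    Assumes genes are independent and equally likely to be called DE.
    """
    total = comb(n_universe, n_de)
    upper = min(n_de, n_set)
    tail = sum(
        comb(n_set, x) * comb(n_universe - n_set, n_de - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes, 5 called DE, a set of 5 genes.
p_none = overlap_pvalue(10, 5, 5, 0)  # any overlap at all: probability 1
p_full = overlap_pvalue(10, 5, 5, 5)  # all 5 DE genes in the set: 1/252
```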
As you probably know, gene set testing is a hot area of
research, and the inter-relationships between the many different
approaches are still imperfectly understood. Methods like geneSetTest and
GSEA-ranked-list are anti-conservative. Methods like roast, camera or
classic GSEA are conservative and safe. GO overlap analyses like goseq,
GOStat, DAVID etc. are anti-conservative in principle but, in practice, the
conservatism of multiple-testing adjustment tends to make them conservative. Different
approaches test different hypotheses and emphasise different aspects of
the data.
Best wishes
Gordon
On Sun, 2 Dec 2012, Steve Lianoglou wrote:
> Hi Gordon,
>
> When an expert comments on a topic I'm interested in, it's hard for me
> not to press for more insight so I hope you don't mind, but also ...
> you know .. take your time :-)
>
> On Sat, Dec 1, 2012 at 8:39 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> [snip]
>> The term "gene set enrichment analysis" was coined by the Broad Institute:
>>
>> http://www.broadinstitute.org/gsea/
>>
>> but you certainly can't simply give a list of genes to GSEA. It requires
>> complete data and is designed for microarrays rather than RNA-Seq anyway.
>
> I'm curious if you say so because GSEA doesn't account for something
> like length bias? The GSEA folks seem to suggest that one could do
> this like any other "pre-processed" GSEA analysis by simply providing
> a ranked list of genes (presumably by fold-change):
>
> http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#Can_I_use_GSEA_to_analyze_SNP.2C_SAGE.2C_ChIP-Seq_or_RNA-Seq_data.3F
>
> Would you mind (briefly) elaborating a bit on why you disagree?
>
> Thanks,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>