[BioC] geneSetTest() / GESA
Gordon Smyth
smyth at wehi.EDU.AU
Tue Mar 6 00:38:08 CET 2007
Dear Simon,
geneSetTest() is very fast if you use the default settings. In that
case it's a closed-form calculation. It's intended for use with
individual gene sets and has no problem with small gene sets. It's
usable down to size=1.
GSEA and especially GSA are very sophisticated methods which use
permutation over arrays as well as standardization over genes to
control for possible dependence between the genes in the test set.
I'm not an expert on either method, but they seem intended for
two-sample situations with at least half a dozen arrays in each
group, many gene sets, and many genes in each set.
geneSetTest() is a far simpler (hence more flexible) approach which
is aimed at a class of problems that we see regularly at the WEHI.
Here the aim is to relate a gene ranking, usually achieved by fitting
a linear model, to a prior set of genes of special interest. It's
based on permuting the genes, not the arrays. The default method is
simply a Wilcoxon test using the ranks of the genes. The caveat with
geneSetTest() is that significance can in principle arise from high
correlations between genes in the test set rather than from a shift in
the mean, so this possibility should ideally be checked or ruled out separately.
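
To make the rank-based idea concrete, here is a minimal sketch of a Wilcoxon rank-sum calculation for one gene set against all genes. It is an illustration only, not limma's code: the function name rank_sum_test and its arguments are made up for this example, and it assumes a two-sided test with the normal approximation, mid-ranks for ties, and no tie or continuity correction in the variance.

```python
import math

def rank_sum_test(statistics, selected):
    """Wilcoxon rank-sum test of one gene set against all genes.

    statistics: one numeric score per gene (e.g. a t-statistic).
    selected:   indices of the genes in the set of interest.
    Returns (rank sum W, z-score, two-sided p-value).
    Assumes a normal approximation without tie or continuity correction.
    """
    n = len(statistics)
    m = len(selected)
    if not 0 < m < n:
        raise ValueError("gene set must be a non-empty proper subset")

    # Rank all genes from smallest to largest, averaging ranks over ties.
    order = sorted(range(n), key=lambda g: statistics[g])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and statistics[order[j + 1]] == statistics[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1

    # Under the null, the set is a random draw of m genes out of n,
    # so the rank sum has a known mean and variance (closed form).
    w = sum(ranks[g] for g in selected)
    mean = m * (n + 1) / 2
    var = m * (n - m) * (n + 1) / 12
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return w, z, p

# Hypothetical data: 100 genes, set of interest = the 10 top-ranked genes.
scores = [float(g) for g in range(100)]
w, z, p = rank_sum_test(scores, list(range(90, 100)))
```

Because the null distribution is that of a random subset of genes, nothing here depends on the number of arrays, and the cost is essentially one sort of the gene scores. The same null assumption is the source of the caveat above: it treats the genes as independent.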
Best wishes
Gordon
At 10:00 PM 5/03/2007, bioconductor-request at stat.math.ethz.ch wrote:
>Date: Sun, 4 Mar 2007 12:46:19 -0600
>From: "Simon Lin" <simonlin at duke.edu>
>Subject: Re: [BioC] geneSetTest() / GESA
>To: <bioconductor at stat.math.ethz.ch>
>
>Dear Gordon,
>
>Is the geneSetTest() fast to calculate? Not sure if you used permutation
>test under the hood.
>
>For GSEA and GSA, sometimes we see artifacts when the size of the set is too
>small. Is the same true for geneSetTest?
>
>Thanks!
>
>Simon
>
>
>Date: Sun, 04 Mar 2007 18:51:00 +1100
>From: Gordon Smyth <smyth at wehi.EDU.AU>
>Subject: [BioC] GSEA with one class metaanalysis
>To: Mark W Kimpel <mwkimpel at gmail.com>
>Cc: bioconductor at stat.math.ethz.ch
>Message-ID: <6.2.5.6.1.20070304184303.0242d7a0 at wehi.edu.au>
>Content-Type: text/plain; charset="us-ascii"; format=flowed
>
>Dear Mark,
>
>If I understand your problem correctly, neither GSEA nor GSA will
>accommodate it. The only option I know of is geneSetTest() in the
>limma package. This generally works well, although it will give you
>somewhat over-optimistic p-values if there are strong positive
>correlations between the genes in your gene sets.
>
>Best wishes
>Gordon