[BioC] gage package Re: gene set enrichment analysis with missing values
Luo Weijun
luo_weijun at yahoo.com
Fri Nov 5 15:36:16 CET 2010
Hi Heyi
Before running gene set differential expression tests, we calculate a differential expression statistic (fold change, signal-to-noise ratio, etc.) for each gene. Other methods do a group-on-group comparison in this step (the whole experimental sample group vs. the whole control group), so a missing value in any sample may affect the per-gene differential expression statistic. GAGE instead compares one experimental sample to one control sample at a time. A missing expression value therefore produces an NA fold change in that particular pair-wise comparison only and does not affect the fold changes in the other pair-wise comparisons. Within any one pair-wise comparison, the NA fold changes are omitted from the gene set test, which usually makes little difference as long as the gene set still has enough effective genes. Hope that helps.
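To make the pair-wise idea concrete, here is a rough sketch in plain Python. It is not the gage package's code: the gene names, the sample values, and both helper functions are made up for illustration, the values are assumed to be on a log2 scale, and the gene set summary is a simple mean standing in for the actual gene set test.

```python
import math

NA = float("nan")

# Hypothetical log2 expression values: one dict per sample, keyed by gene.
control = {"g1": 7.0, "g2": 5.0, "g3": 6.0, "g4": 8.0}
experiment_a = {"g1": 8.0, "g2": NA, "g3": 7.0, "g4": 8.5}   # g2 is missing here
experiment_b = {"g1": 8.5, "g2": 6.0, "g3": 7.5, "g4": 9.0}


def pairwise_fold_change(exp, ctrl):
    """Per-gene log2 fold change for ONE experiment/control pair.

    A missing value in either sample gives NaN for that gene in this
    pair only; other pairs are computed independently.
    """
    return {gene: exp[gene] - ctrl[gene] for gene in ctrl}


def gene_set_mean(fold_changes, gene_set):
    """Mean fold change over a gene set, omitting NaN entries.

    This mean is only a stand-in for a real gene set test statistic.
    """
    usable = [fold_changes[g] for g in gene_set if not math.isnan(fold_changes[g])]
    if not usable:
        return NA
    return sum(usable) / len(usable)


gene_set = ["g1", "g2", "g3"]
fc_a = pairwise_fold_change(experiment_a, control)
fc_b = pairwise_fold_change(experiment_b, control)

mean_a = gene_set_mean(fc_a, gene_set)  # g2 is omitted for this pair
mean_b = gene_set_mean(fc_b, gene_set)  # all three genes are used
```

In this toy data the missing g2 value turns into NaN only in the comparison that involves experiment_a, while the comparison with experiment_b still uses g2.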
Weijun
--- On Thu, 11/4/10, heyi xiao <xiaoheyiyh at yahoo.com> wrote:
From: heyi xiao <xiaoheyiyh at yahoo.com>
Subject: Re: gage package Re: gene set enrichment analysis with missing values
To: bioconductor at stat.math.ethz.ch, "Luo Weijun" <luo_weijun at yahoo.com>
Date: Thursday, November 4, 2010, 1:18 PM
Hi Weijun
GAGE seems to be the exact method I am looking for. I’ve downloaded the gage package, and I am trying it out.
Could you explain briefly how GAGE handles the missing values? Thanks for the help!
Heyi
--- On Wed, 11/3/10, Luo Weijun <luo_weijun at yahoo.com> wrote:
From: Luo Weijun <luo_weijun at yahoo.com>
Subject: gage package Re: gene set enrichment analysis with missing values
To: bioconductor at stat.math.ethz.ch, "heyi xiao" <xiaoheyiyh at yahoo.com>
Date: Wednesday, November 3, 2010, 11:19 PM
Hi Heyi,
You may want to try the GAGE method. GAGE
does differential expression tests on gene sets based on one-on-one comparisons between
samples. This approach, together with a carefully designed NA handling utility,
makes GAGE tolerant to missing values (NAs). You don’t really have to remove
genes with missing values. Actually, it is better not to remove genes with
missing values, as the existing expression values for these genes can then be fully
used to make the analysis more sensitive.
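As a small illustration of why row removal is wasteful, the following plain Python sketch counts what is left under each strategy. The matrix, the gene names, and the two helper functions are hypothetical and are not part of the gage package.

```python
import math

NA = float("nan")

# Hypothetical expression matrix: genes as rows, samples as columns.
matrix = {
    "g1": [7.0, 8.0, 8.5, 7.5],
    "g2": [5.0, NA, 6.0, 5.5],
    "g3": [6.0, 7.0, NA, 6.5],
    "g4": [8.0, 8.5, 9.0, 8.2],
}


def complete_rows(mat):
    """Genes that survive if every row containing an NA is removed."""
    return [g for g, row in mat.items() if not any(math.isnan(v) for v in row)]


def usable_genes_per_sample(mat):
    """For each sample column, count the genes with an observed value."""
    n_samples = len(next(iter(mat.values())))
    return [
        sum(not math.isnan(row[j]) for row in mat.values())
        for j in range(n_samples)
    ]


kept_after_removal = complete_rows(matrix)
per_sample = usable_genes_per_sample(matrix)
```

Here removing incomplete rows keeps only two of the four genes for every sample, whereas keeping the rows leaves at least three usable genes in each sample.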
The gage package is newly available
with Bioconductor 2.7 at http://bioconductor.org/help/bioc-views/release/bioc/html/gage.html.
The GAGE method has been published at http://www.biomedcentral.com/1471-2105/10/161.
Let me know if you have other questions or need help. Thanks!
Weijun
--- On Wed, 11/3/10, heyi xiao <xiaoheyiyh at yahoo.com> wrote:
From: heyi xiao <xiaoheyiyh at yahoo.com>
Subject: gene set enrichment analysis with missing values
To: bioconductor at stat.math.ethz.ch
Date: Wednesday, November 3, 2010, 10:19 PM
Dear all,
I have an expression data matrix with
genes as rows and samples as columns. Many genes (~30%) have missing values in
one or more samples. I would like to do a gene set enrichment type of analysis.
Shall I remove the whole rows for all these genes? I am a little concerned that
this may affect the testing power when so many genes are missing from the analysis.
Is there any better way to go? Any suggestions would be appreciated. Thank you!
Heyi