[BioC] GAGE: question about interpretation of "ambiguous" results from geneset analysis
Luo Weijun
luo_weijun at yahoo.com
Mon Jan 31 18:54:16 CET 2011
Hi Nhan,
For 1-d (same-direction) perturbation, the sign and magnitude of the t-statistic (the stat.mean column) indicate the direction and size of the overall change in a gene set. For 2-d (both-direction) perturbation, the t-statistic indicates only the magnitude of the perturbation of a gene set.
We test 2-d perturbations almost the same way as 1-d perturbations, except that the per-gene statistics are absolute fold changes instead of signed fold changes. Because a gene set usually contains tens of genes, the deviation of absolute fold changes from a normal distribution is not a concern here. Hope this helps.
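To make the 1-d versus 2-d distinction concrete, here is a minimal base-R sketch. It is not GAGE's own code: the simulated fold changes, the helper set_vs_background, and the use of a plain two-sample t-test of gene-set genes against all remaining genes are assumptions made purely for illustration.

```r
# Simplified illustration only; this is NOT how the gage package is implemented.
set.seed(1)

n_genes <- 2000
# Hypothetical per-gene log2 fold changes for one case-control pair.
fold_change <- rnorm(n_genes, mean = 0, sd = 0.3)

# Hypothetical gene set of 50 genes: 25 shifted up and 25 shifted down.
set_idx <- 1:50
fold_change[1:25]  <- fold_change[1:25]  + 1
fold_change[26:50] <- fold_change[26:50] - 1

# Assumed helper: two-sample t-test of gene-set genes vs. all other genes.
set_vs_background <- function(per_gene_stat, idx, alternative) {
  t.test(per_gene_stat[idx], per_gene_stat[-idx], alternative = alternative)
}

# 1-d tests use signed fold changes, so the up and down genes cancel out.
up_1d   <- set_vs_background(fold_change, set_idx, "greater")
down_1d <- set_vs_background(fold_change, set_idx, "less")

# 2-d test uses absolute fold changes, so direction is ignored.
both_2d <- set_vs_background(abs(fold_change), set_idx, "greater")

print(c(p_up_1d   = up_1d$p.value,
        p_down_1d = down_1d$p.value,
        p_2d      = both_2d$p.value))
```

In this toy setting the set is significant only in the 2-d test, and the 2-d t-statistic is positive regardless of which direction the individual genes moved.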
Weijun
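
The "dual significance" effect discussed in the quoted message below can be reproduced with a small base-R sketch. Everything here is an assumption for illustration: the simulated per-pair statistics, the normal approximation for the per-pair p-values, and Fisher's method as a stand-in for the way p-values are summarized across pairs.

```r
# Toy illustration of dual significance; not the gage package's own procedure.
set.seed(2)

# Hypothetical gene-set test statistics for 20 case-control pairs:
# the set is up in the first 10 pairs and down in the other 10.
pair_stat <- c(rnorm(10, mean = 4), rnorm(10, mean = -4))

# One-sided p-values per pair (normal approximation, assumed for simplicity).
p_up   <- pnorm(pair_stat, lower.tail = FALSE)
p_down <- pnorm(pair_stat, lower.tail = TRUE)

# Fisher's method for combining independent p-values (assumed stand-in).
fisher_combine <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

global_up   <- fisher_combine(p_up)
global_down <- fisher_combine(p_down)

print(c(mean_stat = mean(pair_stat),
        global_up = global_up,
        global_down = global_down))
```

A few very small per-pair p-values dominate each combined test, so the set comes out significant in both directions even though the mean statistic across pairs is close to zero.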
--- On Sun, 1/30/11, Nhan Thi Ho <nho at epi.msu.edu> wrote:
> From: Nhan Thi Ho <nho at epi.msu.edu>
> Subject: RE: GAGE: question about interpretation of "ambiguous" results from geneset analysis
> To: "Luo Weijun" <luo_weijun at yahoo.com>
> Date: Sunday, January 30, 2011, 1:03 PM
> Dear Dr Luo,
> Thank you very much for your quick response. It makes a lot
> of sense on the dual significance issue. In fact, I made some
> plots of the mean log2 fold change (of a significant gene set)
> for individual pairs, and I could work out in which pairs
> that gene set is up-regulated or down-regulated.
> For testing gene sets perturbed in one direction, I guess we
> can look at the sign of the t-statistic (- or +). But
> I am still a little confused about the test you use
> for gene sets perturbed in two directions. I read your
> paper and could not figure it out. Could you please
> give me more explanation about this (or show me where I can
> find an explanation)?
> Thank you very much.
> Nhan
>
>
>
> ________________________________________
> From: Luo Weijun [luo_weijun at yahoo.com]
> Sent: Sunday, January 30, 2011 11:48 AM
> To: Nhan Thi Ho
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: GAGE: question about interpretation of
> "ambiguous" results from geneset analysis
>
> Hi Nhan Thi,
> Thanks for your interest in GAGE.
> I understand the issue you observed. You are right:
> normally a gene set is either up- or down-regulated, not
> both. But when GAGE is used on big datasets (like yours),
> some gene sets may be significant for both up-regulation and
> down-regulation. This happens because we get very small
> p-values for one subset of cases (vs. control) in the
> up-regulation test, and small p-values for another subset of
> cases in the down-regulation test. In other words, for big
> datasets, GAGE identifies significant changes in subsets of
> samples, and hence may call some gene sets both up- and
> down-regulated. We call such gene sets "dual significant".
> Dual significance can be confusing to new users, but it may
> indicate relevant results for subsets of samples or
> sub-classes of disease. There are simple ways to handle
> these dual significant gene sets. You may keep both
> directions, keep only the more significant direction, or
> remove both directions, depending on whether you want to see
> significant changes that occur only in a subset of samples.
> Check the help page for the function sigGeneSet
> (?sigGeneSet). We will add a more rigorous treatment of the
> dual significance issue in the near future.
> If you want to know which subsets of samples are up- or
> down-regulated, you may want to output the full results
> table by setting full.table=T when calling the gage function.
> This way, you can see all the individual p-values. Let me know
> whether these explanations make sense. Thanks!
> Weijun
>
> On 1/30/2011 9:31 AM, Nhan Thi Ho wrote:
> > Dear Dr Weijun Luo,
> > I find your GAGE method fascinating and I am using it to analyze our
> > microarray data. Our data are in pairs (21 pairs), so I guess that, so
> > far, your method is probably the most appropriate one to use.
> > However, I have some trouble understanding and interpreting the
> > results of the analysis.
> > 1) How can a pathway be both significantly up-regulated and
> > significantly down-regulated, and also significantly perturbed in 2
> > directions? (For example, the ribosome pathway in the output
> > below.) (I copied and pasted these from my PDF file, so the columns
> > do not align; I am sorry for that.) From my superficial
> > understanding, a gene set perturbed in 2 directions means that one
> > group of genes in the set is up-regulated and another group of genes
> > in the same set is down-regulated. Say, for one gene set with 100
> > genes, 50 genes are up-regulated, 30 genes are down-regulated and 20
> > genes are "equally" regulated. When we look at that gene set in one
> > direction only, we may find it significantly up-regulated, and may
> > also find it significantly perturbed in 2 directions. However, it is
> > probably not convincing to say that the gene set is significantly
> > down-regulated. Another extreme example: in a gene set, 50 genes are
> > up-regulated and 50 genes are down-regulated. We may find that gene
> > set significantly perturbed in 2 directions. But if we look at it in
> > one direction only, the mean of 50 up + 50 down should be close to 0
> > (when we do the t-test), so it should not be significant for either
> > up-regulation or down-regulation alone?
> > 2) Examples from the results below:
> > - The natural killer cell pathway belongs to both the top 10
> > up-regulated and the top 10 down-regulated pathways. How should I
> > interpret this?
> > - The ribosome pathway is the top pathway in all three results:
> > significantly up-regulated, significantly down-regulated and
> > significantly perturbed in 2 directions. How should I interpret
> > this? (In addition, is it a coincidence that the findings for
> > ribosome from our data are similar to the findings from the data
> > attached to your GAGE package?)
> > This is my first time using your method, so I am still confused. I
> > hope that you can help me out with this.
> > Thank you very much, and I am looking forward to hearing from you.
> > Sincerely,
> > Nhan Thi Ho
> >
> > > singleexpress.kegg.p <- gage(singleexpress, gsets = kegg.gs,
> > +     ref = controlsingle, samp = casesingle)
> >
> > These are the top 10 up-regulated pathways:
> >
> > > head(singleexpress.kegg.p$greater[, 1:5], 10)
> >                                                        P.geomean   stat.mean     P.erlang         q.BH
> > hsa03010 Ribosome                                     0.03610569 -0.17906864 2.172525e-12 3.823644e-10
> > hsa05322 Systemic lupus erythematosus                 0.13094870  0.52744334 8.708234e-05 7.663246e-03
> > hsa04740 Olfactory transduction                       0.20700933  0.18090395 1.012273e-02 5.938668e-01
> > hsa04120 Ubiquitin mediated proteolysis               0.27998732  0.20452535 1.105111e-01 9.911829e-01
> > hsa04630 Jak-STAT signaling pathway                   0.28945564  0.14131421 1.372187e-01 9.911829e-01
> > hsa04650 Natural killer cell mediated cytotoxicity    0.29300667 -0.15914825 1.481668e-01 9.911829e-01
> > hsa04340 Hedgehog signaling pathway                   0.29821667  0.23021041 1.651384e-01 9.911829e-01
> > hsa05130 Pathogenic Escherichia coli infection - EHEC 0.29945402  0.02759186 1.693260e-01 9.911829e-01
> > hsa05131 Pathogenic Escherichia coli infection - EPEC 0.29945402  0.02759186 1.693260e-01 9.911829e-01
> > hsa01430 Cell junctions                               0.30712834  0.13906001 1.966093e-01 9.911829e-01
> >
> > These are the top 10 down-regulated pathways:
> >
> > > head(singleexpress.kegg.p$less[, 1:5], 10)
> >                                                     P.geomean  stat.mean     P.erlang         q.BH
> > hsa03010 Ribosome                                  0.01177051 -0.1790686 3.981800e-20 7.007969e-18
> > hsa04670 Leukocyte transendothelial migration      0.17277427 -0.3564603 1.777562e-03 1.402332e-01
> > hsa04810 Regulation of actin cytoskeleton          0.17792625 -0.3781713 2.390339e-03 1.402332e-01
> > hsa04210 Apoptosis                                 0.19036773 -0.3513636 4.634289e-03 2.039087e-01
> > hsa04650 Natural killer cell mediated cytotoxicity 0.19685126 -0.1591483 6.367109e-03 2.241222e-01
> > hsa05012 Parkinson's disease                       0.22651285 -0.2328108 2.222453e-02 5.280020e-01
> > hsa04620 Toll-like receptor signaling pathway      0.22856438 -0.4162079 2.396829e-02 5.280020e-01
> > hsa00190 Oxidative phosphorylation                 0.22860070 -0.2035314 2.400009e-02 5.280020e-01
> > hsa00030 Pentose phosphate pathway                 0.25386354 -0.4497014 5.514901e-02 9.737468e-01
> > hsa04662 B cell receptor signaling pathway         0.25509455 -0.1487690 5.718567e-02 9.737468e-01
> >
> > To capture pathways perturbed in both directions:
> >
> > > singleexpress.kegg.2d.p <- gage(singleexpress, gsets = kegg.gs,
> > +     ref = controlsingle, samp = casesingle, same.dir = F)
> > > head(singleexpress.kegg.2d.p[, 1:5], 10)
> >                                                        P.geomean  stat.mean     P.erlang         q.BH
> > hsa03010 Ribosome                                     0.01762569 1.39873089 2.926412e-17 5.150485e-15
> > hsa04740 Olfactory transduction                       0.22888986 0.30126007 2.425440e-02 1.000000e+00
> > hsa05322 Systemic lupus erythematosus                 0.26554405 0.27810943 7.668010e-02 1.000000e+00
> > hsa05130 Pathogenic Escherichia coli infection - EHEC 0.27370453 0.26596493 9.478160e-02 1.000000e+00
> > hsa05131 Pathogenic Escherichia coli infection - EPEC 0.27370453 0.26596493 9.478160e-02 1.000000e+00
> > hsa05012 Parkinson's disease                          0.29885705 0.25976816 1.672982e-01 1.000000e+00
> > hsa00190 Oxidative phosphorylation                    0.31563344 0.22062642 2.293692e-01 1.000000e+00
> > hsa00910 Nitrogen metabolism                          0.33383781 0.29670300 3.072774e-01 1.000000e+00
> > hsa00860 Porphyrin and chlorophyll metabolism         0.34280781 0.22833195 3.488027e-01 1.000000e+00
> > hsa04612 Antigen processing and presentation          0.34865262 0.05332788 3.766759e-01 1.000000e+00
> >