[BioC] GAGE: question about interpretation of "ambiguous" results from geneset analysis

Luo Weijun luo_weijun at yahoo.com
Sun Jan 30 17:48:10 CET 2011


Hi Nhan Thi,
Thanks for your interest in GAGE.
I understand the issue you observed. You are right: normally a gene set comes out as either up- or down-regulated. But when GAGE is applied to big datasets (like yours), some gene sets may be significant for both up-regulation and down-regulation. This happens because we get very small p-values for one subset of cases (vs. control) in the up-regulation test, and very small p-values for another subset of cases in the down-regulation test. In other words, for big datasets GAGE identifies significant changes in subsets of samples, and hence may call some gene sets both up- and down-regulated. We call such gene sets "dual significant". Dual significance can be confusing to new users, but it may indicate relevant results for subsets of samples or sub-classes of disease.
There are simple ways to handle these dual significant gene sets: you may keep both directions, keep only the more significant direction, or remove both directions, depending on whether you want to see significant changes that occur only in a subset of samples. Check the help information for the function sigGeneSet (?sigGeneSet). We will add a more rigorous treatment of the dual significance issue in the near future.
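To make the three options concrete, here is a minimal base-R sketch. The helper resolve_dual and its mode names are hypothetical and not part of gage; the q-values are the q.BH values for a few gene sets from the output quoted further down in this message, and the 0.1 cutoff is an arbitrary choice for illustration. Within the package, the supported route is sigGeneSet (recent releases expose this choice through its dualSig argument, but check ?sigGeneSet for your version).

```r
# q.BH values taken from the "greater" and "less" tables quoted in this thread
q_up   <- c(hsa03010 = 3.823644e-10, hsa05322 = 7.663246e-03,
            hsa04740 = 5.938668e-01)
q_down <- c(hsa03010 = 7.007969e-18, hsa04670 = 1.402332e-01,
            hsa04810 = 1.402332e-01)

# Hypothetical helper (NOT a gage function).
# mode = "both":     keep dual significant sets in both lists
# mode = "stronger": keep them only in the direction with the smaller q-value
# mode = "none":     drop them from both lists
resolve_dual <- function(q_up, q_down, cutoff = 0.1,
                         mode = c("stronger", "both", "none")) {
  mode <- match.arg(mode)
  up   <- names(q_up)[q_up < cutoff]
  down <- names(q_down)[q_down < cutoff]
  dual <- intersect(up, down)          # significant in both directions
  if (mode == "none") {
    up   <- setdiff(up, dual)
    down <- setdiff(down, dual)
  } else if (mode == "stronger") {
    up_wins <- q_up[dual] <= q_down[dual]
    down <- setdiff(down, dual[up_wins])
    up   <- setdiff(up, dual[!up_wins])
  }
  list(up = up, down = down, dual = dual)
}

res <- resolve_dual(q_up, q_down, mode = "stronger")
print(res)
```

With these numbers the ribosome set (hsa03010) is the only dual significant one, and under "stronger" it stays in the down list only, because its down-regulation q-value is the smaller of the two.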
If you want to know which subsets of samples are up- or down-regulated, you may want to output the full results table by setting full.table=T when calling the gage function. This way, you can see all the individual p-values. Let me know whether these explanations make sense. Thanks!
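As a rough illustration of what the individual p-values can reveal, the base-R sketch below uses made-up per-pair p-values for a single gene set across 21 pairs. It assumes a Fisher-style combination in which the sum of -log(p) follows a gamma (Erlang) distribution under the null; this is a simplification for illustration, not the package's actual code, and the names p_up, p_down and combine_p are hypothetical.

```r
# Made-up one-sided p-values for ONE gene set across 21 case-control pairs:
# 8 pairs strongly up, 8 pairs strongly down, 5 pairs unchanged.
p_up <- c(rep(1e-4, 8), rep(1 - 1e-4, 8), rep(0.5, 5))
names(p_up) <- paste0("pair", seq_along(p_up))
p_down <- 1 - p_up   # assumption: the down test mirrors the up test

# Fisher-style summary: under the null, sum(-log(p)) over n independent
# p-values follows a gamma distribution with shape n and rate 1.
combine_p <- function(p) {
  pgamma(sum(-log(p)), shape = length(p), rate = 1, lower.tail = FALSE)
}

global_up   <- combine_p(p_up)     # driven by the 8 "up" pairs
global_down <- combine_p(p_down)   # driven by the 8 "down" pairs

# The individual p-values show which pairs drive each direction
up_pairs   <- names(p_up)[p_up < 0.05]
down_pairs <- names(p_down)[p_down < 0.05]

print(c(up = global_up, down = global_down))
print(list(up = up_pairs, down = down_pairs))
```

Both combined p-values come out very small even though no single pair changes in both directions, which is the dual significance pattern described above; the per-pair lists then tell you which samples belong to which subset.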
Weijun

On 1/30/2011 9:31 AM, Nhan Thi Ho wrote:
> Dear Dr Weijun Luo,
> I find your GAGE method fascinating and I am using it to analyze our 
> microarray data. Our data are in pairs (21 pairs) so I guess so far, 
> your method is probably the most appropriate one to use.
> However, I have some trouble understanding and interpreting 
> the results from the analysis.
> 1) How can a pathway be both significantly up regulated and 
> significantly down regulated and then significantly perturbed in 2 
> directions? (For example, the ribosome pathway in the result output 
> below) (I copied and pasted these from my PDF file, so the columns do not 
> align; I am sorry for that). From my superficial understanding, a gene 
> set perturbed in 2 directions is that: a group of genes in that set are 
> up regulated and another group of genes in the same set are down 
> regulated. Say, one gene set with 100 genes: 50 genes are up regulated 
> and 30 genes are down regulated and 20 genes are "equally" regulated. 
> When we look at that gene set in one direction only, we may find that 
> gene set significantly up regulated and may also find that gene set 
> significantly perturbed in 2 directions. However, it is probably not 
> convincing to say that gene set is significantly down regulated. Another 
> extreme example: if in a gene set: 50 genes up and 50 genes down 
> regulated. So we may find that gene set significantly perturbed in 2 
> directions. But if we look at that gene set in one direction only, mean 
> of 50 up + 50 down should be close to 0 (when we do the t-test) => 
> should not be significant for either up regulated or down regulated only?
> 2) Example from the results below:
> - For example the natural killer cell pathway belong to both top 10 up 
> and top 10 down regulated pathways. How should I interpret this?
> - The ribosome is the top pathway: significantly up, significantly 
> down regulated and significantly perturbed in 2 directions. How should I 
> interpret this? (In addition, is it a coincidence that the findings 
> from our data for ribosome are similar to the findings from the attached 
> data in your GAGE package?)
> This is my first time using your method so I am still confused. Hope 
> that you could help me out with this.
> Thank you very much and I am looking forward to hearing from you
> Sincerely,
> Nhan Thi Ho
>
> > singleexpress.kegg.p <- gage(singleexpress, gsets = kegg.gs,
> +     ref = controlsingle, samp = casesingle)
>
> These are top 10 up-regulated pathways:
>
> > head(singleexpress.kegg.p$greater[, 1:5], 10)
>                                                        P.geomean   stat.mean     P.erlang         q.BH
> hsa03010 Ribosome                                     0.03610569 -0.17906864 2.172525e-12 3.823644e-10
> hsa05322 Systemic lupus erythematosus                 0.13094870  0.52744334 8.708234e-05 7.663246e-03
> hsa04740 Olfactory transduction                       0.20700933  0.18090395 1.012273e-02 5.938668e-01
> hsa04120 Ubiquitin mediated proteolysis               0.27998732  0.20452535 1.105111e-01 9.911829e-01
> hsa04630 Jak-STAT signaling pathway                   0.28945564  0.14131421 1.372187e-01 9.911829e-01
> hsa04650 Natural killer cell mediated cytotoxicity    0.29300667 -0.15914825 1.481668e-01 9.911829e-01
> hsa04340 Hedgehog signaling pathway                   0.29821667  0.23021041 1.651384e-01 9.911829e-01
> hsa05130 Pathogenic Escherichia coli infection - EHEC 0.29945402  0.02759186 1.693260e-01 9.911829e-01
> hsa05131 Pathogenic Escherichia coli infection - EPEC 0.29945402  0.02759186 1.693260e-01 9.911829e-01
> hsa01430 Cell junctions                               0.30712834  0.13906001 1.966093e-01 9.911829e-01
> 
> These are top 10 down regulated pathways:
>
> > head(singleexpress.kegg.p$less[, 1:5], 10)
>                                                     P.geomean  stat.mean     P.erlang         q.BH
> hsa03010 Ribosome                                  0.01177051 -0.1790686 3.981800e-20 7.007969e-18
> hsa04670 Leukocyte transendothelial migration      0.17277427 -0.3564603 1.777562e-03 1.402332e-01
> hsa04810 Regulation of actin cytoskeleton          0.17792625 -0.3781713 2.390339e-03 1.402332e-01
> hsa04210 Apoptosis                                 0.19036773 -0.3513636 4.634289e-03 2.039087e-01
> hsa04650 Natural killer cell mediated cytotoxicity 0.19685126 -0.1591483 6.367109e-03 2.241222e-01
> hsa05012 Parkinson's disease                       0.22651285 -0.2328108 2.222453e-02 5.280020e-01
> hsa04620 Toll-like receptor signaling pathway      0.22856438 -0.4162079 2.396829e-02 5.280020e-01
> hsa00190 Oxidative phosphorylation                 0.22860070 -0.2035314 2.400009e-02 5.280020e-01
> hsa00030 Pentose phosphate pathway                 0.25386354 -0.4497014 5.514901e-02 9.737468e-01
> hsa04662 B cell receptor signaling pathway         0.25509455 -0.1487690 5.718567e-02 9.737468e-01
> 
> To capture pathways perturbed towards both directions:
>
> > singleexpress.kegg.2d.p <- gage(singleexpress, gsets = kegg.gs,
> +     ref = controlsingle, samp = casesingle, same.dir = F)
> > head(singleexpress.kegg.2d.p[, 1:5], 10)
>                                                        P.geomean  stat.mean     P.erlang         q.BH
> hsa03010 Ribosome                                     0.01762569 1.39873089 2.926412e-17 5.150485e-15
> hsa04740 Olfactory transduction                       0.22888986 0.30126007 2.425440e-02 1.000000e+00
> hsa05322 Systemic lupus erythematosus                 0.26554405 0.27810943 7.668010e-02 1.000000e+00
> hsa05130 Pathogenic Escherichia coli infection - EHEC 0.27370453 0.26596493 9.478160e-02 1.000000e+00
> hsa05131 Pathogenic Escherichia coli infection - EPEC 0.27370453 0.26596493 9.478160e-02 1.000000e+00
> hsa05012 Parkinson's disease                          0.29885705 0.25976816 1.672982e-01 1.000000e+00
> hsa00190 Oxidative phosphorylation                    0.31563344 0.22062642 2.293692e-01 1.000000e+00
> hsa00910 Nitrogen metabolism                          0.33383781 0.29670300 3.072774e-01 1.000000e+00
> hsa00860 Porphyrin and chlorophyll metabolism         0.34280781 0.22833195 3.488027e-01 1.000000e+00
> hsa04612 Antigen processing and presentation          0.34865262 0.05332788 3.766759e-01 1.000000e+00
>


