[BioC] GAGE and PATHVIEW packages

Mon Oct 7 02:11:25 CEST 2013

Hi Christian,
Please see my point-by-point answers below.
HTH,
Weijun

--------------------------------------------
On Fri, 10/4/13, Christian De Santis <christian.desantis at stir.ac.uk> wrote:

 Subject: GAGE and PATHVIEW packages

.org" <bioconductor at r-project.org>
 Date: Friday, October 4, 2013, 11:27 AM

 Dear Luo and list,

> I am successfully using GAGE and pathview for my
 analyses and I like the package a lot. So, thanks for
 developing it.  I have some points on which I would
 appreciate some help and/or clarification. 

Thanks for the comments.

> AVERAGE VALUE - The first time I ran the analysis with
 GAGE, I used the same setup parameters as the example
 you prepared in the manual. I have 8 replicates per
 treatment and I initially used unique column names for each
 sample (i.e. "DIET02_1", "DIET02_2", "DIET02_3", etc.),
 as per your example with HN and
 DCIS. However, I have discovered (following an accidental
 mistake) that if, instead of having unique names, samples are
 named after the treatment they belong to (i.e.
 "DIET02" for all 8 replicates), the subsequent
 gage analysis generates one single value for that
 treatment. By comparing the p-values of both the above cases
 I have found that they are identical. Am I correct to assume
 that in the latter case every value assigned to the
 treatment is an average of the
 replicates?

Yes, it is an average: for columns with the same name, the p-value is the geometric mean and the test statistic is the mean across those columns. The averaging mechanism is there to accommodate special needs or mistakes, but using the same name for replicate samples is not recommended.
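To make the two kinds of average concrete, here is a minimal base-R sketch. The toy matrices and the helper geo.mean are made up for illustration; this is not gage's internal code, only the arithmetic described above.

```r
# Toy per-sample results for two gene sets; all three columns share one name.
p.vals <- matrix(c(0.01, 0.04, 0.02,
                   0.20, 0.50, 0.80),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("setA", "setB"), rep("DIET02", 3)))
stats <- matrix(c(2.5, 1.9, 2.2,
                  0.8, 0.1, -0.3),
                nrow = 2, byrow = TRUE,
                dimnames = dimnames(p.vals))

# Geometric mean: exponential of the mean of the logs (hypothetical helper).
geo.mean <- function(x) exp(mean(log(x)))

# p-values are combined by geometric mean, statistics by the ordinary mean.
p.combined    <- apply(p.vals, 1, geo.mean)
stat.combined <- rowMeans(stats)
```

With uniquely named columns no such collapsing takes place, which is why unique names are the safer choice for replicates.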

> DUPLICATE PROBES - My array has several
 duplicate or triplicate probes which are correctly annotated
 with the same KO number. How are these probes handled by the
 gage analysis? For example, if I have three probes for my
 gene X which are annotated with
 the same KO number, are these going to be counted 3 times
 in the "set size"? Or are the values for that
 KO number going to be merged into one?

Duplicate probes will be counted multiple times, which is not good, because gene set analyses like GAGE really assume one independent variable per gene. You may summarize over duplicate probes before feeding the data into GAGE. See ?mol.sum in the pathview package for that.
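The idea of summarizing duplicate probes can be sketched in base R as follows. This does not reproduce mol.sum; the probe names, KO IDs and values are invented, and the per-KO mean is just one possible summary (sum, median or max are alternatives).

```r
# Toy expression matrix: 5 probes x 2 samples (values are made up).
exprs <- matrix(c( 1,  2,
                   3,  4,
                   5,  6,
                  10, 20,
                   7,  9),
                ncol = 2, byrow = TRUE,
                dimnames = list(paste0("probe", 1:5), c("s1", "s2")))

# Hypothetical probe-to-KO annotation: three probes share K00001.
ko <- c("K00001", "K00001", "K00001", "K00002", "K00003")

# Sum the rows within each KO, then divide by the number of probes per KO
# to get one mean row per KO. Counts are matched by name to be safe.
sums     <- rowsum(exprs, group = ko)
counts   <- table(ko)
exprs.ko <- sums / as.vector(counts[rownames(sums)])
```

The resulting matrix has one row per KO number, so each KO contributes once to the set size.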

> "COMPARE" argument of the "gage"
 function - My experiment consists of 5 treatments (x 8
 replicates). None of the treatments is a proper
 "control". Is it correct if I use "1ongrp" as the
 argument, choosing one of the treatments as the
 ref? I have also tried the
 "as.group" option, but when I look at the results
 I do not get a comparison of the chosen reference with the
 remaining groups, but instead one single value named
 "exp1". I have also tried "paired",
 which gives completely different results.

If you set ref or samp to anything other than NULL, GAGE assumes a two-state comparison. The compare argument takes one of 1ongrp, paired, unpaired or as.group, depending on your needs. All of them are for two-state comparisons; they differ in how the comparison is done, e.g. whether your samples are paired or not. If you want a multiple-state comparison/test, you should run it on each gene before GAGE, then feed the single-column results into gage with "ref = NULL, samp = NULL". If you want a two-state comparison, you should specify a control state: either all 4 groups other than your group of interest, or the median of all groups for each gene.
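Both routes can be sketched in base R on simulated data. The treatment names, the random matrix and the placeholder my.gsets are assumptions for illustration, and the per-gene one-way ANOVA F statistic is only one possible multi-state test. The gage calls are left as comments because they need real gene sets.

```r
# Column labels for 5 treatments x 8 replicates (treatment names are made up).
grp <- factor(rep(paste0("DIET0", 1:5), each = 8))

# Simulated expression matrix: 20 genes x 40 samples, uniquely named columns.
set.seed(1)
exprs <- matrix(rnorm(20 * length(grp)), nrow = 20,
                dimnames = list(paste0("gene", 1:20),
                                paste0(grp, "_", rep(1:8, times = 5))))

# Two-state comparison: one treatment of interest against the other four.
samp.idx <- which(grp == "DIET02")
ref.idx  <- which(grp != "DIET02")
# res <- gage(exprs, gsets = my.gsets, ref = ref.idx, samp = samp.idx,
#             compare = "unpaired")   # my.gsets is a placeholder

# Multi-state route: one statistic per gene, computed before gage.
f.stat <- apply(exprs, 1, function(y) anova(lm(y ~ grp))[["F value"]][1])
f.mat  <- matrix(f.stat, ncol = 1, dimnames = list(rownames(exprs), "F"))
# res <- gage(f.mat, gsets = my.gsets, ref = NULL, samp = NULL)
```

In the two-state sketch the 8 samples of one treatment are tested against the 32 remaining samples, which is why an unpaired comparison is shown.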

> HEATMAP OUTPUT of the "esset.grp" function
 - Is there any quick way to generate an output heatmap
 (as for sigGeneSet) after removing the redundant pathways
 identified with the function "esset.grp"? At the
 moment I am doing this manually and plotting the results
 with
 heatmap.2 from gplots. Is this the only way?

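You can do this quickly using esset.grp + sigGeneSet, assuming you follow the examples until you get gse16873.kegg.esg.up and gse16873.kegg.esg.dn:
# pool the non-redundant (essential) gene sets from both directions
ess.sets <- c(gse16873.kegg.esg.up$essentialSets, gse16873.kegg.esg.dn$essentialSets)
# keep only those rows in each element of the gage result list
gse16873.kegg.p.ess <- lapply(gse16873.kegg.p, function(x) x[ess.sets, ])
# select significant sets and write the heatmap output
gse16873.kegg.sig.ess <- sigGeneSet(gse16873.kegg.p.ess, outname = "gse16873.kegg.ess")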

  Any help on the above would be greatly
 appreciated.

 Regards.
 Christian De Santis
