[BioC] How to prepare Custom INPUT(DATA) files for GAGE Analysis and DO a BASIC GAGE analysis using those files

Fri Jan 20 02:54:06 CET 2012

Hi Jung,

Thank you for sending your files but there is no need to attach the 
source files from the gage package (GAGE.r, gage.pdf). I have access to 
those files.

The package vignette is just intended to be an example. Clearly the data 
in the package and your data will be very different. It does not make 
sense to try to follow the code exactly "as is" when using your data. 
For example, it doesn't make sense for you to grep for 'HN', 'ADH' and 
'DCIS' since they don't exist in your file. These are treatment groups 
included in the gage sample data and have no bearing on your analysis. 
This is why you see nothing (i.e., integer(0)) for these variables.

 > Micro_array_dataset <- read.table("Micro_array_dataset.txt")
 > cn=colnames(Micro_array_dataset)
 > hn=grep('HN',cn, ignore.case =T)
 > adh=grep('ADH',cn, ignore.case =T)
 > dcis=grep('DCIS',cn, ignore.case =T)
 > print(hn)
integer(0)
 > print(dcis)
integer(0)
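
For your own data, the same idea would look roughly like the sketch below. The file contents and the sample labels 'control' and 'tumor' are made up for illustration; substitute whatever labels actually appear in your column names. The resulting indices are what you would pass as the 'ref' and 'samp' arguments.

```r
# Write a tiny tab-delimited expression file (hypothetical data) so the
# example is self-contained; genes are rows, samples are columns.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "gene\tcontrol_1\tcontrol_2\ttumor_1\ttumor_2",
  "ENSG00000000003\t7.1\t6.9\t8.4\t8.8",
  "ENSG00000000005\t3.2\t3.0\t2.1\t2.5",
  "ENSG00000000419\t9.5\t9.7\t9.4\t9.9"
), tmp)

# Read the file with the first column as row names and convert the
# resulting data.frame to a numeric matrix.
expr <- as.matrix(read.table(tmp, header = TRUE, sep = "\t", row.names = 1))

# Find the column indices of each group from the column names.
cn <- colnames(expr)
ref_idx  <- grep("control", cn, ignore.case = TRUE)
samp_idx <- grep("tumor", cn, ignore.case = TRUE)

# Stop early if a label matched nothing (the integer(0) situation above).
stopifnot(length(ref_idx) > 0, length(samp_idx) > 0)
print(ref_idx)
print(samp_idx)
```

Checking that neither index vector is empty before going further catches a wrong label immediately.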

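The error below arises because Gene_set is a data.frame. Subsetting a 
data.frame with [1:3] selects columns 1 through 3, and your data.frame 
does not have that many columns. In the vignette, the gene set is a list, 
so the same subsetting returns its first three elements.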

 > lapply(Gene_set[1:3],head)
Error in `[.data.frame`(Gene_set, 1:3) : undefined columns selected
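
A minimal sketch of the difference, using small hypothetical objects (gene_df and gene_list are made-up names and values, not your files):

```r
# A one-column data.frame, similar in shape to a file of gene names only.
gene_df <- data.frame(
  gene = c("ENSG00000000003", "ENSG00000000005"),
  stringsAsFactors = FALSE
)

# [1:3] asks for columns 1 to 3; only one column exists, so this fails.
res <- tryCatch(gene_df[1:3], error = function(e) conditionMessage(e))
print(res)

# A named list of gene sets: [1:3] returns the first three elements.
gene_list <- list(
  pathwayA = c("10327", "124", "125"),
  pathwayB = c("1431", "1737"),
  pathwayC = c("2023", "2026", "2027"),
  pathwayD = c("47", "48")
)
first3 <- lapply(gene_list[1:3], head)
print(names(first3))
```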

Next, your genes need to be grouped by pathway. The idea is to do an 
analysis of gene pathways so you need to provide a list of genes grouped 
by pathway (like the kegg.gs or go.gs example files in the vignette).  
Your gene file consists only of gene names,

 > head(rownames(Micro_array_dataset))
[1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457"
[5] "ENSG00000000460" "ENSG00000000938"

In R, a list of genes grouped by pathway would look something like 
this,
 > head(kegg.gs)
$`hsa00010 Glycolysis / Gluconeogenesis`
  [1] "10327"  "124"    "125"    "126"    "127"    "128"    "130"    "130589"
  [9] "131"    "160287" "1737"   "1738"   "2023"   "2026"   "2027"   "217"
...

$`hsa00020 Citrate cycle (TCA cycle)`
  [1] "1431"   "1737"   "1738"   "1743"   "2271"   "283398" "3417"   "3418"
  [9] "3419"   "3420"   "3421"   "4190"   "4191"   "47"     "48"     "4967"
...
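
If your pathway membership is in a two-column table (one row per pathway/gene pair), a list of this shape can be built with split(). This is only a sketch; the table below and the names pathway_table and my_gs are hypothetical.

```r
# Hypothetical two-column table: one row per (pathway, gene) pair.
pathway_table <- data.frame(
  pathway = c("pathA", "pathA", "pathB", "pathB", "pathB"),
  gene    = c("10327", "124", "1431", "1737", "1738"),
  stringsAsFactors = FALSE
)

# split() groups the gene IDs by pathway and returns a named list,
# one character vector of gene IDs per pathway.
my_gs <- split(pathway_table$gene, pathway_table$pathway)
str(my_gs)
```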

You need to identify which pathways you are interested in and group the 
genes by those pathways. To identify pathways, take a look at the 
GO.db, KEGG.db or reactome.db packages. Mapping between gene identifiers 
can be done with the org.*.db packages.

     http://www.bioconductor.org/packages/release/data/annotation/

Some general background on using Bioconductor annotation data is here,

http://www.bioconductor.org/help/workflows/annotation-data/#annotation-resources
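
Whatever mapping you use, it is worth checking that the row names of your expression matrix and the IDs in your gene sets actually overlap. Here is a rough sketch in base R. All IDs are placeholders, and the lookup vector id_map is a hand-written stand-in for a mapping you would obtain from an annotation package.

```r
# Hypothetical expression matrix with placeholder row IDs.
expr <- matrix(1:6, nrow = 3,
               dimnames = list(c("ENS_A", "ENS_B", "ENS_C"), c("s1", "s2")))

# Hypothetical gene set using a different ID system.
gs <- list(pathA = c("101", "102", "999"))

# Before mapping, no row ID is found in the gene set.
print(mean(rownames(expr) %in% unlist(gs)))

# Placeholder lookup from the matrix ID system to the gene-set ID system.
# ENS_C has no entry and is dropped below.
id_map <- c(ENS_A = "101", ENS_B = "102")

mapped <- id_map[rownames(expr)]
keep <- !is.na(mapped)
expr_mapped <- expr[keep, , drop = FALSE]
rownames(expr_mapped) <- unname(mapped[keep])

# After mapping, the shared IDs are the genes usable in the analysis.
overlap <- intersect(rownames(expr_mapped), unlist(gs))
print(overlap)
```

This sketch does not handle IDs that map to more than one target; with real data you would need to decide how to resolve those.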

Valerie

On 01/17/12 12:51, Javerjung Sandhu wrote:
> Hello Valerie,
> Thanks for your help. I am sending you the data 
> files (Micro_array_dataset.txt** & Gene_Set.txt) which I want to use 
> for the analysis.
> I need to know in which format the files should be saved. For example,
> http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats
> explains in great detail the format of the data files required for 
> GSEA analysis (though I am not using GSEA or these file types). In the 
> same way, I want to know in which format I should save the data files 
> required for GAGE analysis so that the analysis is done properly.
> Please tell me which information is missing from these files.
> * Yes, I know that "gse16873" is expression data and "kegg.gs" is a 
> gene set, but I want to use my own; these are provided by the author.
> 1) What I want to accomplish: I want to do a basic gage analysis 
> (as given in the R script file named "GAGE.r" and pdf file "gage.pdf") 
> such as t-test, rank test, KS test etc.
> 2) I copied the beginning of the code (to make sure that it loads all 
> the files successfully) from the R script file provided by the author 
> (also attached as GAGE.r), made some changes to it and saved it as my 
> own script (also attached as Gage_run.r). I tried to load the data files 
> (Micro_array_dataset.txt & Gene_Set.txt) and got these errors (shown 
> in the "R Console.txt" file).
> 3) I ran the R script file (Gage_run.r) first to check that it loads all 
> the input files successfully, so that I can then move ahead with the 
> tests. The output is shown in the "R Console.txt" file, which shows the 
> errors and warnings.
> If you need additional information, please tell me. I will be 
> happy to provide it.
> **an expression matrix with genes as rows and samples as columns.
> Thanks,
> Jung
> ------------------------------------------------------------------------
> *From:* Valerie Obenchain [vobencha at fhcrc.org]
> *Sent:* Tuesday, January 17, 2012 10:04 AM
> *To:* Javerjung Sandhu
> *Cc:* bioconductor at r-project.org; luo_weijun at yahoo.com
> *Subject:* Re: [BioC] How to prepare Custom INPUT(DATA) files for GAGE 
> Analysis and DO a BASIC GAGE analysis using those files
>
> Hello,
>
> I think the vignette is clear that you need (1) a gene set and (2) a 
> microarray dataset to run the gage analysis.  On page 4 they mention 
> the importance of having the same ID system for your gene set and 
> expression data. Once this is accomplished you can use the gage() 
> function.
>
> ## this is the expression data
> gse16873
>
> ## this is the gene set
> kegg.gs
>
> ## call to gage() using 'HN' as control and 'DCIS' as treatment
> gse16873.kegg.p <- gage(gse16873, gsets = kegg.gs,
>     ref = hn, samp = dcis)
>
>
> I believe if you have only one column of expression data the 'ref' and 
> 'samp' arguments should be omitted (i.e., default of NULL). Read ?gage 
> for details. Maybe the package author will comment on this. I've cc'd 
> them on this message.
>
> It is still not clear to me what you have tried. It would be helpful 
> to know the following,
>
> (1) what is your analysis question (what are you trying to accomplish)
> (2) what have you tried (what functions have you used)
> (3) what errors have you seen from #2
>
>
> Valerie
>
> On 01/16/2012 04:19 PM, Javerjung Sandhu wrote:
>> Hi Valerie,
>> First of all, thanks a lot for replying and helping me. I really appreciate that. I am sending you the R source code file which the GAGE analysis uses plus two other documents which explain what that package does.
>> These are the data files used by the GAGE analysis:
>> ----------------------------
>> Data sets in package ‘gage’:
>> carta.gs               Common gene set data collections
>> egSymb                 Mapping between Entrez Gene IDs and official
>>                         symbols
>> go.gs                  Common gene set data collections
>> gse16873               GSE16873: a breast cancer microarray dataset
>> kegg.gs                Common gene set data collections
>> -----------------------------------------------------
>> I have only ONE tab-delimited data file in the form of a MATRIX giving the gene expression values for 173 patients (as columns) and the names of genes (as rows).
>> I want to know how I can use this package and my data to do the GAGE analysis.
>> If you need more information, please tell me. I will be ready to provide that.
>> Thanks,
>> Jung
>>
>> ________________________________________
>> From: Valerie Obenchain [vobencha at fhcrc.org]
>> Sent: Monday, January 16, 2012 3:18 PM
>> To: Javerjung Sandhu
>> Cc: bioconductor at r-project.org; luo_weijun at yahoo.com
>> Subject: Re: [BioC] How to prepare Custom INPUT(DATA) files for GAGE Analysis and DO a BASIC GAGE analysis using those files
>>
>> Hi Jung,
>>
>> Please provide the code you've tried and the error you are seeing. For
>> example, did you read your own data into R, then try to use gage() and
>> got an error? We can better help you if we understand your inputs and
>> the function you're having trouble with.
>>
>> Valerie
>>
>>
>> On 01/13/12 13:10, Javerjung Sandhu wrote:
>>> Dear List,
>>> I will highly appreciate your help on this.
>>> For the GAGE analysis package shown by the link given below:
>>> http://www.bioconductor.org/packages/release/bioc/html/gage.html
>>> Could you please tell me how to prepare the Custom INPUT files required for this analysis
>>> OR
>>> Send me the SAMPLE DATA files in TXT format so that I know in which format I need to put the data & how I could DO a BASIC GAGE analysis using those files. I couldn't figure it out and have been trying for 3 weeks or more.
>>> Best Regards,
>>> Jung