[BioC] FW: URGENT Help required: Getting this error with GAGE analysis, input files attached
Martin Morgan
mtmorgan at fhcrc.org
Thu Jan 26 22:22:39 CET 2012
On 01/26/2012 12:41 PM, Javerjung Sandhu wrote:
> Hi Martin,
> Thanks for the reply. I am forwarding this message to you which shows
> the error in RED. I have checked the help pages for gage and readExpData,
> which don't give that much information. The class of "Micro_array_data" is a
> data.frame. I will also forward you the email of Mr. Luo Weijun which
The help page ?gage says that the first argument should be a 'matrix'.
The error message also says that the function was expecting a numeric
'matrix' or vector. As you have discovered, you provided a 'data.frame'.
Is a data.frame a matrix?
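
A minimal sketch of how one might check this at the console, and of one
possible conversion. The object names and values below (expr_df, expr_mat,
two made-up samples) are illustrative assumptions, not the poster's data;
only base R functions are used.

```r
# Toy stand-in for the expression data (hypothetical values);
# gene IDs are stored as row names, samples as columns.
expr_df <- data.frame(
  s1 = c(1.2, 3.4),
  s2 = c(2.1, 0.5),
  row.names = c("ENSG00000000003", "ENSG00000000005")
)

class(expr_df)        # a data.frame
is.matrix(expr_df)    # FALSE: a data.frame is not a matrix

# as.matrix() on an all-numeric data.frame gives a numeric matrix
# and keeps the row and column names.
expr_mat <- as.matrix(expr_df)
is.matrix(expr_mat)   # TRUE
is.numeric(expr_mat)  # TRUE
```

Note that if any column is non-numeric (for example, gene IDs read in as
an ordinary column rather than as row names), as.matrix() returns a
character matrix, which would still not be a numeric matrix.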
Martin
> might help you, I assume. In that email Mr. Luo Weijun explains what
> the format of the input files should be and how I should read them. I am
> reading the files in the same way, but it still shows the error.
> Thanks,
> Jung
> ------------------------------------------------------------------------
> *From:* Javerjung Sandhu
> *Sent:* Tuesday, January 24, 2012 11:16 AM
> *To:* luo_weijun at yahoo.com
> *Cc:* bioconductor at r-project.org
> *Subject:* URGENT Help required: Getting this error with GAGE analysis,
> input files attached.
>
>
> ------------------------------------------------------------------------
> *From:* Javerjung Sandhu
> *Sent:* Monday, January 23, 2012 1:27 PM
> *To:* Valerie Obenchain
> *Cc:* bioconductor at r-project.org; luo_weijun at yahoo.com
> *Subject:* Getting this error with GAGE analysis, input files attached
>
> Hi there,
> I am getting this error on R console. I have attached the input files.
> Help will be really appreciated.
>
> > Micro_array_data <- readExpData(file = "Micro_array_dataset.txt")
> > Gene_set <- readList("Gene_set.gmt")
> > Reference_condition <- c(1,3,5)
> > Target_condition <- c(2,4,6)
> > A1_compare_un <- gage(Micro_array_data, Gene_set, ref = Reference_condition, samp = Target_condition)
> Error in saaPrep(exprs, ref = ref, samp = samp, same.dir = same.dir, compare = compare, :
>   exprs needs to be a numeric matrix or vector
> > # Essential_member_genes <- essGene(Gene_set, Micro_array_data, ref = NULL)
> > # Non_redundant_significant_gene_set_list <- esset.grp()
> >
>
> Thanks,
>
> Jung
>
> ________________________________________
> From: Valerie Obenchain [vobencha at fhcrc.org]
> Sent: Monday, January 23, 2012 9:50 AM
> To: Javerjung Sandhu
> Subject: Re: [BioC] How to prepare Custom INPUT(DATA) files for GAGE
> Analysis and DO a BASIC GAGE analysis using those files
>
> Hello,
>
> On 01/22/12 16:31, Javerjung Sandhu wrote:
> > Hi Valerie,
> > Thanks for the information. I won't follow the vignette for now; I am
> > trying to write my own code from scratch.
> > My supervisor said I should use the GO.GS gene set for now and we can
> > work with others later. Actually, I am an engineering science student
> > with no background in biology, and I got a co-op job at the BC Cancer
> > Agency to do some analysis using Perl and Python. Last month my
> > supervisor said that I need to work on GAGE, so I learned R from
> > different sources, including the R website, where I found the
> > "R intro" file, which helped me a lot.
> > So I have a request for you. I will send you the code that I write,
> > along with the data files. Could you please help me get rid of the
> > errors so that I can finish the analysis as soon as possible?
> If you have problems using the gage package, they should be posted on
> the bioconductor mailing list. As you mentioned, Weijun has also
> responded to your message and is willing to help. Posting on the list
> makes it possible for more than one person to respond and for other new
> users to learn from the discussion. So, once you have your script
> written and have tried to use the functions in gage, post them to the
> mailing list. You need to provide a small working example of what you
> have tried and what errors you are seeing.
>
> > I really appreciate all your help.
> > I also received an email from Weijun. I will go through that email
> > and ask you about any questions or problems.
> > If possible, can you write a script for me that does a basic GAGE
> > analysis, which I can then edit to suit my needs? You can use the
> > GO.GS gene set and the input file which I have attached.
> No, unfortunately I can't write the script for you. The vignette in the
> package has examples of how to perform a gage analysis. Your data will
> be different but the general steps will be the same. If you run into
> trouble, post a small, reproducible example on the mailing list.
>
> Valerie
>
> > I am also writing my own code, but I am so depressed, sad and
> > confused; I don't think my code will work.
> > Thanks,
> > Jung
> >
> > ________________________________________
> > From: Valerie Obenchain [vobencha at fhcrc.org]
> > Sent: Thursday, January 19, 2012 5:54 PM
> > To: Javerjung Sandhu
> > Cc: bioconductor at r-project.org; luo_weijun at yahoo.com
> > Subject: Re: [BioC] How to prepare Custom INPUT(DATA) files for GAGE
> Analysis and DO a BASIC GAGE analysis using those files
> >
> > Hi Jung,
> >
> > Thank you for sending your files but there is no need to attach the
> > source files from the gage package (GAGE.r, gage.pdf). I have access to
> > those files.
> >
> > The package vignette is just intended to be an example. Clearly the data
> > in the package and your data will be very different. It does not make
> > sense to try to follow the code exactly "as is" when using your data.
> > For example, it doesn't make sense for you to grep for 'HN', 'ADH' and
> > 'DCIS' since they don't exist in your file. These are treatment groups
> > included in the gage sample data and have no bearing on your analysis.
> > This is why you see nothing (i.e., integer(0)) for these variables.
> >
> > > Micro_array_dataset<- read.table("Micro_array_dataset.txt")
> > > cn=colnames(Micro_array_dataset)
> > > hn=grep('HN',cn, ignore.case =T)
> > > adh=grep('ADH',cn, ignore.case =T)
> > > dcis=grep('DCIS',cn, ignore.case =T)
> > > print(hn)
> > integer(0)
> > > print(dcis)
> > integer(0)
> >
> >
> > This error is due to the fact that you are subsetting a data.frame and
> > have not specified the columns. In the vignette, the gene set is a list
> > so this subsetting works.
> >
> > > lapply(Gene_set[1:3],head)
> > Error in `[.data.frame`(Gene_set, 1:3) : undefined columns selected
> >
> >
> > Next, your genes need to be grouped by pathway. The idea is to do an
> > analysis of gene pathways so you need to provide a list of genes grouped
> > by pathway (like the kegg.gs or go.gs example files in the vignette).
> > Your gene file consists only of gene names,
> >
> > > head(rownames(Micro_array_dataset))
> > [1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457"
> > [5] "ENSG00000000460" "ENSG00000000938"
> >
> > In R, a list of genes grouped by pathway would look something like
> > this,
> > > head(kegg.gs)
> > $`hsa00010 Glycolysis / Gluconeogenesis`
> > [1] "10327" "124" "125" "126" "127" "128" "130" "130589"
> > [9] "131" "160287" "1737" "1738" "2023" "2026" "2027" "217"
> > ...
> >
> > $`hsa00020 Citrate cycle (TCA cycle)`
> > [1] "1431" "1737" "1738" "1743" "2271" "283398" "3417" "3418"
> > [9] "3419" "3420" "3421" "4190" "4191" "47" "48" "4967"
> > ...
> >
> > You need to identify which pathways you are interested in and group the
> > genes by those pathways. For identifying pathways take a look at the
> > GO.db, KEGG.db or reactome.db. Mapping between gene identifiers can be
> > done with the org.*.db packages.
> >
> > http://www.bioconductor.org/packages/release/data/annotation/
> >
> > Some general background on using Bioconductor annotation data is here,
> >
> > http://www.bioconductor.org/help/workflows/annotation-data/#annotation-resources
> >
> >
> > Valerie
> >
> >
> > On 01/17/12 12:51, Javerjung Sandhu wrote:
> >> Hello Valerie,
> >> Thanks for your help. I am sending you the data files
> >> (Micro_array_dataset.txt** & Gene_Set.txt) which I want to use
> >> for the analysis.
> >> I need to know in which format the files should be saved. For example,
> >> http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats
> >> explains in great detail the format of the data files required for
> >> GSEA analysis (though I am not using GSEA or those file types). In
> >> the same way, I want to know in which format I should save the data
> >> files required for GAGE analysis so that the analysis is done properly.
> >> Please tell me which information is missing from these files.
> >> * Yes, I know that "gse16873" is expression data and "kegg.gs" is a
> >> gene set provided by the author, but I want to use my own.
> >> 1) What I want to accomplish: I want to do a basic GAGE analysis
> >> (as given in the R script file named "GAGE.r" and the PDF file "gage.pdf")
> >> such as t-test, rank test, KS test etc.
> >> 2) I copied the beginning of the code (to make sure that it loads all
> >> the files successfully) from the R script file provided by the author
> >> (also attached as GAGE.r), made some changes to it and saved it as my own
> >> script (also attached as Gage_run.r). I tried to load the data files
> >> (Micro_array_dataset.txt & Gene_Set.txt) and got these errors (shown
> >> in the "R Console.txt" file).
> >> 3) I ran the R script file (Gage_run.r) first to check that it loads all
> >> the input files successfully, so that I can then move ahead with the tests.
> >> The output is shown in the "R Console.txt" file, which shows the errors
> >> and warnings.
> >> If you need any additional information, please tell me. I will be
> >> happy to provide it.
> >> **an expression matrix with genes as rows and samples as columns.
> >> Thanks,
> >> Jung
> >> ------------------------------------------------------------------------
> >> *From:* Valerie Obenchain [vobencha at fhcrc.org]
> >> *Sent:* Tuesday, January 17, 2012 10:04 AM
> >> *To:* Javerjung Sandhu
> >> *Cc:* bioconductor at r-project.org; luo_weijun at yahoo.com
> >> *Subject:* Re: [BioC] How to prepare Custom INPUT(DATA) files for GAGE
> >> Analysis and DO a BASIC GAGE analysis using those files
> >>
> >> Hello,
> >>
> >> I think the vignette is clear that you need (1) a gene set and (2) a
> >> microarray dataset to run the gage analysis. On page 4 they mention
> >> the importance of having the same ID system for your gene set and
> >> expression data. Once this is accomplished you can use the gage()
> >> function.
> >>
> >> ## this is the expression data
> >> gse16873
> >>
> >> ## this is the gene set
> >> kegg.gs
> >>
> >> ## call to gage() using 'HN' as control and 'DCIS' as treatment
> >> gse16873.kegg.p<- gage(gse16873, gsets = kegg.gs,
> >> ref = hn, samp = dcis)
> >>
> >>
> >> I believe if you have only one column of expression data the 'ref' and
> >> 'samp' arguments should be omitted (i.e., default of NULL). Read ?gage
> >> for details. Maybe the package author will comment on this. I've cc'd
> >> them on this message.
> >>
> >> It is still not clear to me what you have tried. It would be helpful
> >> to know the following,
> >>
> >> (1) what is your analysis question (what are you trying to accomplish)
> >> (2) what have you tried (what functions have you used)
> >> (3) what errors have you seen from #2
> >>
> >>
> >> Valerie
> >>
> >> On 01/16/2012 04:19 PM, Javerjung Sandhu wrote:
> >>> Hi Valerie,
> >>> First of all, thanks a lot for replying and helping me. I really
> >>> appreciate that. I am sending you the R source code file which the GAGE
> >>> analysis uses, plus two other documents which explain what the package
> >>> does.
> >>> These are the data files used by the GAGE analysis:
> >>> ----------------------------
> >>> Data sets in package ‘gage’:
> >>> carta.gs     Common gene set data collections
> >>> egSymb       Mapping between Entrez Gene IDs and official symbols
> >>> go.gs        Common gene set data collections
> >>> gse16873     GSE16873: a breast cancer microarray dataset
> >>> kegg.gs      Common gene set data collections
> >>> -----------------------------------------------------
> >>> I have only ONE tab-delimited data file in the form of a MATRIX
> >>> giving the gene expression values for 173 patients (as columns) and
> >>> gene names (as rows).
> >>> I want to know how I can use this package and my data to do the
> >>> GAGE analysis.
> >>> If you need more information, please tell me. I will be ready to
> >>> provide it.
> >>> Thanks,
> >>> Jung
> >>>
> >>> ________________________________________
> >>> From: Valerie Obenchain [vobencha at fhcrc.org]
> >>> Sent: Monday, January 16, 2012 3:18 PM
> >>> To: Javerjung Sandhu
> >>> Cc: bioconductor at r-project.org; luo_weijun at yahoo.com
> >>> Subject: Re: [BioC] How to prepare Custom INPUT(DATA) files for
> GAGE Analysis and DO a BASIC GAGE analysis using those files
> >>>
> >>> Hi Jung,
> >>>
> >>> Please provide the code you've tried and the error you are seeing. For
> >>> example, did you read your own data into R, then try to use gage() and
> >>> get an error? We can better help you if we understand your inputs and
> >>> the function you're having trouble with.
> >>>
> >>> Valerie
> >>>
> >>>
> >>> On 01/13/12 13:10, Javerjung Sandhu wrote:
> >>>> Dear List,
> >>>> I will highly appreciate your help on this.
> >>>> For the GAGE analysis package shown by the link given below:
> >>>> http://www.bioconductor.org/packages/release/bioc/html/gage.html
> >>>> Could you please tell me how to prepare the custom INPUT files
> >>>> required for this analysis
> >>>> OR
> >>>> send me SAMPLE DATA files in TXT format, so that I know in which
> >>>> format I need to put the data and how I could DO a BASIC GAGE
> >>>> analysis using those files? I couldn't figure it out and have been
> >>>> trying for 3 weeks or more.
> >>>> Best Regards,
> >>>> Jung
> >>>>
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>> _______________________________________________
> >>>> Bioconductor mailing list
> >>>> Bioconductor at r-project.org
> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >
>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793