[BioC] (no subject)
Robert Gentleman
rgentlem at fhcrc.org
Wed Apr 9 19:31:15 CEST 2008
Hi,
Paul Evans wrote:
> Hi Robert,
>
> Two questions.
>
> First, does that mean that I will be able to use the org.XX packages and
> KEGG if I download the GOstats package from the devel download page in
> bioconductor (instead of the release version)? Alternatively, is there
Yes, but you need to use the release candidate for R 2.7.0. R 2.7.0 will
be released in about two weeks, so you may want to wait. BioC 2.2 will
come out shortly after that, and at that point all of this will work in
the "new" release branches.
> any way I can get the hyperG test to function with a set of Entrez IDs
> only (for example, if I get data from SMD I will not have the chip
> details but only entrez ids).
Yes, of course, but then you are not using KEGG or GO or any of those
things for your gene sets unless you do the mapping yourself. You
should be able to use Entrez IDs from SMD together with
org.Sc.sgd.db, simply by restricting attention to the IDs that are
contained in the org.Sc.sgd.db package.
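
A minimal sketch of that restriction step, written in base R so it runs without any annotation package installed. Both ID vectors are made up for illustration: `annotatedIds` is a hypothetical stand-in for the identifiers the annotation package covers (in a real session they would come from the package itself, for example via mappedkeys() on one of its maps), and `smdIds` stands in for the identifiers obtained from SMD.

```r
# Hypothetical stand-in for the identifiers covered by the annotation
# package; in a real session these would come from the package itself.
annotatedIds <- c("850001", "850002", "850003", "850004", "850005", "850006")

# Hypothetical identifiers obtained from SMD; the last two are not annotated.
smdIds <- c("850002", "850004", "850006", "999001", "999002")

# Restrict attention to identifiers that the annotation package contains.
geneUniverse <- intersect(smdIds, annotatedIds)

# Keep track of what was dropped so the loss is visible.
dropped <- setdiff(smdIds, annotatedIds)

# The gene cluster must be a subset of the universe, so filter it the same way.
geneCluster <- intersect(c("850002", "999001"), geneUniverse)

print(geneUniverse)
print(dropped)
print(geneCluster)
```

The filtered `geneUniverse` and `geneCluster` are what would then be passed as universeGeneIds and geneIds when building the parameter object.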
>
> Second, I tried the same test with several affy and agilent arrays. For
> the code given below the 'hgug4110b' package returned the same error. I
> am reproducing the code below:
>
> ---------------------------------------------------------------------------
> ### TEST HYPERGTEST FOR AFFY/AGILENT CHIPS##
> rm(list = ls())
> library("hgug4110b")
> library("KEGG.db")
> library("GOstats")
>
> chips <- c("hgug4110b")
> pvalue <- 1
> for(i in 1:length(chips)){
> y <- get(paste(chips[i],"ENTREZID",sep=''))
> print(chips[i])
> xx <- as.list(y)
> # Remove probe identifiers that do not map to any ENTREZID
> xx <- xx[!is.na(xx)]
> if(length(xx) > 0){
> # The ENTREZIDs for the first two elements of XX
> xx[1:2]
> # Get the first one
> xx[[1]]
> }
> allGenes <- unique(unlist(xx))
> geneUniverse <- allGenes[1:7000]
> set.seed(37688)
> ## Create random cluster of 13 genes
> geneCluster <- sample(1:7000,13,replace=F)
> geneCluster <- unique(unlist(geneUniverse[geneCluster]))
> print(geneCluster)
> paramsGO <- new("GOHyperGParams", geneIds = geneCluster,
> universeGeneIds = geneUniverse, annotation = chips[i],
> ontology = "BP",
> pvalueCutoff = pvalue, conditional = FALSE, testDirection =
> "over")
>
> paramsKEGG <- new("KEGGHyperGParams", geneIds = geneCluster,
> universeGeneIds = geneUniverse, annotation = chips[i],
> pvalueCutoff = pvalue, testDirection = "over")
> #tryCatch(hgOverGO <- hyperGTest(paramsGO),error = function(e) {print('error GO')})
> tryCatch(hgOverKEGG <- hyperGTest(paramsKEGG),error = function(e)
> {print('error KEGG')})
> }
>
> -------------------------------------
> The output I get is:
>
> [1] "hgug4110b"
> [1] "4644" "55630" "BX647822" "9933" "79016" "5774"
> "7274" "6331" "51249" "55515" "AK096394" "28299" "AF116641"
> [1] "error KEGG"
>
> i.e. for this chip I get the same error ("Error in numW - numWdrawn :
> non-numeric argument to binary operator"). Am I doing something wrong?
No, you are not doing anything wrong; there is a bug. You will need to
either wait for the next release (about 3 weeks) or use the devel
versions of everything.
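
Until then, the workaround suggested earlier in this thread (wrap the call in tryCatch and treat a failure as a p-value of 1) could be sketched roughly as below. This is only an illustration: `runTest` and `safePvalue` are hypothetical names, and `runTest` is a stand-in for the hyperGTest call that simply raises the same kind of error, so the example runs without GOstats. Unlike a handler that only prints 'error KEGG', this one keeps the original message via conditionMessage(). Whether 1 is the right fallback depends on the failure really being the all-zero case.

```r
# Hypothetical stand-in for hyperGTest(params): it fails with the same kind
# of error so the handler can be exercised without GOstats installed.
runTest <- function(fail) {
  if (fail) {
    numW <- "not a number"
    return(numW - 1)   # arithmetic on a character value raises an error
  }
  0.03                 # made-up p-value for the successful case
}

# Wrap the call: on error, keep the message and fall back to p = 1.
safePvalue <- function(fail) {
  tryCatch(
    list(pvalue = runTest(fail), error = NA_character_),
    error = function(e) list(pvalue = 1, error = conditionMessage(e))
  )
}

ok <- safePvalue(FALSE)
bad <- safePvalue(TRUE)

print(ok$pvalue)
print(bad$pvalue)
print(bad$error)
```

Keeping the message alongside the fallback value makes it possible to tell this bug apart from other failures when looking at the results later.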
best wishes
Robert
>
>
> regards.
>
>
>
>
>
> Hi Paul,
> Thanks for the report. Please, if you use sample also set a seed,
> otherwise your example is not reproducible.
>
> The short answer is that you cannot use KEGG with the org.XX packages
> in release. Based on your report I have modified the Category package
> (which is doing most of the work), so that this now should work in the
> devel branch, and that change should propagate in the next day or so to
> the web (version 2.5.9).
>
> best wishes
> Robert
>
>
> Paul Evans wrote:
>
>> > Thanks Robert. I tried the KEGG.db package and tried the
>> > KEGGHyperGParams again. The code I used is:
>> >
>> > -----------------------------------------------------------------------------
>> >
>> > ############ TEST hyperGTest for HOMO SAPIENS ######
>> > library("KEGG.db")
>> > library("GOstats")
>> > library("org.Hs.eg.db")
>> >
>> > x <- org.Hs.egACCNUM
>> > # Get the entrez gene identifiers that are mapped to an ACCNUM
>> > mapped_genes <- mappedkeys(x)
>> > geneUniverse <- mapped_genes[1:1200]
>> >
>> >
>> > ## Create random cluster of 13 genes
>> > geneCluster <- sample(1:1200,13,replace=F)
>> > geneCluster <- unique(unlist(geneUniverse[geneCluster]))
>> >
>> > print(geneCluster)
>> >
>> > paramsGO <- new("GOHyperGParams", geneIds = geneCluster,
>> > universeGeneIds = geneUniverse, annotation = "org.Hs.eg.db",
>> > ontology = "BP",
>> > pvalueCutoff = 1, conditional = FALSE, testDirection = "over")
>> >
>> >
>> > paramsKEGG <- new("KEGGHyperGParams", geneIds = geneCluster,
>> > universeGeneIds = geneUniverse, annotation = "org.Hs.eg.db",
>> > pvalueCutoff = 1, testDirection = "over")
>> >
>> >
>> > tryCatch(hgOverGO <- hyperGTest(paramsGO),error = function(e)
>> > {print('error GO')})
>> > tryCatch(hgOverKEGG <- hyperGTest(paramsKEGG),error = function(e)
>> > {print('error KEGG')})
>> >
>> > -----------------------------------------------------------------------------
>> >
>> >
>> >
>> > The output/error I got now is:
>> >
>> >
>> >
>> > [1] "901" "599" "435" "100" "1525" "25" "204" "1159" "865"
>> > "1195" "1629" "912" "998"
>> >
>> > Error in get(paste(lib, name, sep = "")) :
>> > no function to return from, jumping to top level
>> > [1] "error KEGG"
>> >
>> >
>> >
>> > My sessionInfo() is:
>> >
>> >
>> >
>> > > sessionInfo()
>> > R version 2.6.2 (2008-02-08)
>> > i386-pc-mingw32
>> >
>> > locale:
>> > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> > States.1252;LC_MONETARY=English_United
>> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>> >
>> > attached base packages:
>> > [1] splines tools stats graphics grDevices utils
>> > datasets methods base
>> >
>> > other attached packages:
>> > [1] org.Hs.eg.db_2.0.2 GOstats_2.4.0 Category_2.4.0
>> > genefilter_1.16.0 survival_2.34 RBGL_1.14.0
>> > annotate_1.16.1
>> > [8] xtable_1.5-2 GO.db_2.0.2 graph_1.16.1
>> > KEGG.db_2.0.2 AnnotationDbi_1.0.6 RSQLite_0.6-8
>> > DBI_0.2-4
>> > [15] Biobase_1.16.3
>> >
>> > loaded via a namespace (and not attached):
>> > [1] cluster_1.11.10
>> > >
>> >
>> >
>> >
>> > My apologies if I have missed something elementary!
>> >
>> >
>> >
>> > thanks!
>> >
>> >
>> >
>> >
>> >
>> > ----- Original Message ----
>> > From: Robert Gentleman <rgentlem at fhcrc.org>
>> > To: Paul Evans <p.evans48 at yahoo.com>
>> > Cc: Bioconductor at stat.math.ethz.ch
>> > Sent: Monday, March 31, 2008 3:45:11 PM
>> > Subject: Re: [BioC] GOstats - hyperGTest using "KEGGHyperGParams"
>> >
>> > Hi Paul,
>> > Thanks for the bug report. There seems to be an issue when all
>> > values are zero, which shows up intermittently. You can work around it
>> > by wrapping the call to hyperGTest in try or tryCatch; when it fails
>> > you can simply use a p-value of 1, which is what the result would be.
>> >
>> > You should not be loading the GO package for this (KEGG if anything, and
>> > even then, please use KEGG.db, not KEGG).
>> >
>> > I will fix the bug, but given how close the release is I won't
>> > backport it; the fix will only be available in the devel branch (soon
>> > to be the release branch).
>> >
>> > best wishes
>> > Robert
>> >
>> > Paul Evans wrote:
>> > > Hi all,
>> > >
>> > > I was trying to understand the hyperGTest for KEGG, and used the
>> > following code:
>> > >
>> > >
>> > -----------------------------------------------------------------------------------------------------------
>> > > ## TEST HYPERGTEST FOR KEGG
>> > >
>> > > library("YEAST")
>> > > library("GOstats")
>> > > library("GO")
>> > >
>> > > # Convert to a list
>> > > xx <- as.list(YEASTGENENAME)
>> > > # Remove probes that do not map to any GENENAME
>> > > xx <- xx[!is.na(xx)]
>> > > if(length(xx) > 0){
>> > > # Gets the gene names for the first five probe identifiers
>> > > xx[1:5]
>> > > # Get the first one
>> > > xx[[1]]
>> > > }
>> > >
>> > > ## Create gene universe
>> > > allGenes <- names(xx)
>> > > print(length(allGenes))
>> > > geneUniverse <- allGenes[1:800]
>> > > for(i in 1:20){
>> > > ## Create random cluster of 13 genes
>> > > geneCluster <- sample(1:800,13,replace=F)
>> > > geneCluster <- geneUniverse[geneCluster]
>> > > print(i)
>> > > print(geneCluster)
>> > > params <- new("KEGGHyperGParams", geneIds = geneCluster,
>> > > universeGeneIds = geneUniverse, annotation = "YEAST",
>> > > pvalueCutoff = 0.1, testDirection = "over")
>> > > hgOver <- hyperGTest(params)
>> > > dfrm <- summary(hgOver)
>> > > #print(dfrm)
>> > > }
>> > >
>> > >
>> > --------------------------------------------------------------------------------------------------------
>> > >
>> > > The output/error that I got is:
>> > >
>> > > [1] 1
>> > > [1] "YKR067W" "MOF9" "YDR518W" "YPR074C" "YCL011C" "YCR069W"
>> > "YDL104C" "YGR136W" "YAR003W" "YFR013W" "YOR116C" "YDR507C" "YGR167W"
>> > > [1] 2
>> > > [1] "YJR112W" "CEN8" "YPL005W" "YHR081W" "YLR323C" "YBR131W"
>> > "YLR347C" "YHR098C" "YOR107W" "YCL027W" "YNR012W" "CRL16" "YLR329W"
>> > > [1] 3
>> > > [1] "YNL327W" "YEL056W" "YNL321W" "YDL111C" "YMR284W" "YLR338W"
>> > "YPL008W" "CRL17" "YEL065W" "YFR027W" "YMR269W" "YPL019C" "YML038C"
>> > > Error in numW - numWdrawn : non-numeric argument to binary operator
>> > >
>> > >
>> > > [[elided trailing spam]]
>> > >
>> > > My sessionInfo():
>> > >
>> > >> sessionInfo()
>> > > R version 2.6.2 (2008-02-08)
>> > > i386-pc-mingw32
>> > > locale:
>> > > LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> > States.1252;LC_MONETARY=English_United
>> > States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>> > > attached base packages:
>> > > [1] splines tools stats graphics grDevices utils datasets
>> > methods base
>> > > other attached packages:
>> > > [1] KEGG_2.0.1 GOstats_2.4.0 Category_2.4.0
>> > genefilter_1.16.0 survival_2.34 RBGL_1.14.0 GO.db_2.0.2
>> > > [8] graph_1.16.1 goTools_1.10.0 annotate_1.16.1
>> > xtable_1.5-2 AnnotationDbi_1.0.6 RSQLite_0.6-8 DBI_0.2-4
>> >
>> > > [15] Biobase_1.16.3 GO_2.0.1 hu6800_2.0.1
>> > hgu95a_2.0.1 hgu95av2_2.0.1 hgu133plus2_2.0.1
>> > hgu133b_2.0.1
>> > > [22] hgu133a_2.0.1 som_0.3-4 YEAST_2.0.1
>> > cluster_1.11.10
>> > >
>> > >
>> > > thanks!
>> > >
>> > >
>> > >
>> > >
>> > > _______________________________________________
>> > > Bioconductor mailing list
>> > > Bioconductor at stat.math.ethz.ch
>> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> > >
>> >
>> > --
>> > Robert Gentleman, PhD
>> > Program in Computational Biology
>> > Division of Public Health Sciences
>> > Fred Hutchinson Cancer Research Center
>> > 1100 Fairview Ave. N, M2-B876
>> > PO Box 19024
>> > Seattle, Washington 98109-1024
>> > 206-667-7700
>> > rgentlem at fhcrc.org
>> >
>> >
>
--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org