[BioC] GOstats problem with output
Robert M. Flight
rflight79 at gmail.com
Fri Apr 8 14:27:02 CEST 2011
Hi Assa,
The reason you are getting no genes is that there are no genes
"directly" annotated to this term. I got the same error when I tried
to look up your GO term of interest using org.Mm.egGO or org.Mm.egGO2EG.
In this case you need to use "org.Mm.egGO2ALLEGS" to find the genes that
are indirectly annotated to this term via its more specific (child) terms.
Also keep in mind that AmiGO is updated regularly, whereas the Bioconductor
annotation packages are updated every 6 months. This may lead to some
discrepancy between the results from AmiGO and Bioconductor.
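To illustrate how a term can have no direct annotations yet still collect genes in an "ALL" map, here is a small base-R sketch. It assumes the usual GO "true path" behaviour, where a gene annotated to a term also counts for every ancestor of that term. The terms TERM_A to TERM_D and genes g1 to g4 are hypothetical placeholders, and this is not necessarily how org.Mm.eg.db actually builds org.Mm.egGO2ALLEGS.

```r
# Hypothetical toy DAG: each term maps to its parent terms.
# TERM_A stands in for a broad term such as GO:2000021.
parents <- list(
  TERM_A = character(0),
  TERM_B = "TERM_A",
  TERM_C = "TERM_A",
  TERM_D = c("TERM_B", "TERM_C")
)

# Hypothetical direct annotations (the GO2EG-style view): TERM_A has none.
direct <- list(
  TERM_A = character(0),
  TERM_B = c("g1", "g2"),
  TERM_C = "g3",
  TERM_D = c("g2", "g4")
)

# Collect every ancestor of a term by walking parent edges upward.
ancestors <- function(term) {
  found <- character(0)
  frontier <- parents[[term]]
  while (length(frontier) > 0) {
    found <- union(found, frontier)
    next_up <- unlist(parents[frontier], use.names = FALSE)
    frontier <- setdiff(next_up, found)
  }
  found
}

# Build a GO2ALLEGS-style view: each gene annotated to a term is also
# counted for every ancestor of that term.
propagate <- function(direct) {
  all_map <- setNames(vector("list", length(direct)), names(direct))
  for (term in names(direct)) {
    for (target in c(term, ancestors(term))) {
      all_map[[target]] <- union(all_map[[target]], direct[[term]])
    }
  }
  all_map
}

all_map <- propagate(direct)
length(direct$TERM_A)  # 0: no genes are annotated directly to TERM_A
sort(all_map$TERM_A)   # "g1" "g2" "g3" "g4": all inherited from descendants
```

Here TERM_A plays the role of GO:2000021: a direct lookup finds nothing, while the propagated map returns every gene annotated to its descendants.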
-Robert
On Fri, Apr 8, 2011 at 01:43, Assa Yeroslaviz <frymor at gmail.com> wrote:
> Well well,
> I am ashamed to say that it is now working.
>
> Apparently all I needed to do was to update the packages.
>
> I installed the new versions of GO.db and GOstats
> and it is working now.
>
> However, I am still getting this error when trying to find which genes
> are annotated to it:
>> mget('GO:2000021',org.Mm.egGO)
> Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
> value for "GO:2000021" not found
>> mget('GO:2000021',org.Mm.egGO2EG)
> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
> value for "GO:2000021" not found
>
> So I guess the earlier error message has nothing to do with the fact that
> there are no genes from the mouse genome mapped to this GO category.
>
> When I checked in AmiGO whether there are any mouse genes under this
> category, I found 83 genes.
> Can anyone tell me, then, what this error means?
>
> Is there a way of manually updating the GO data set, so that I can map
> these genes?
>
> Thanks
> Assa
>
>> sessionInfo()
> R version 2.12.2 (2011-02-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines grid stats graphics grDevices utils datasets
> [8] methods base
>
> other attached packages:
> [1] GSEABase_1.12.1 org.Mm.eg.db_2.4.6 biomaRt_2.6.0
> [4] Heatplus_1.20.0 ggplot2_0.8.9 proto_0.3-9.1
> [7] reshape_0.8.4 plyr_1.4 gplots_2.8.0
> [10] caTools_1.11 bitops_1.0-4.1 gdata_2.8.1
> [13] gtools_2.6.2 siggenes_1.24.0 multtest_2.7.1
> [16] Rgraphviz_1.29.0 xtable_1.5-6 annotate_1.28.1
> [19] GO.db_2.4.5 GOstats_2.16.0 RSQLite_0.9-4
> [22] DBI_0.2-5 graph_1.28.0 Category_2.16.0
> [25] AnnotationDbi_1.12.0 Biobase_2.10.0
>
> loaded via a namespace (and not attached):
> [1] genefilter_1.32.0 MASS_7.3-11 RBGL_1.26.0 RCurl_1.5-0
> [5] survival_2.36-5 tools_2.12.2 XML_3.2-0
>
> On Thu, Apr 7, 2011 at 18:49, Robert M. Flight <rflight79 at gmail.com> wrote:
>>
>> Hi Assa,
>>
>> As far as I am aware, if the GO term comes up in your list, then there
>> should be genes annotated to it. I did a simple test to verify that
>> the GO term does exist:
>>
>> > crud <- as.list(GOTERM)
>> > crud$'GO:2000021'
>> GOID: GO:2000021
>> Term: regulation of ion homeostasis
>> Ontology: BP
>> Definition: Any process that modulates the frequency, rate or extent
>> of ion homeostasis.
>> Synonym: regulation of electrolyte homeostasis
>> Synonym: regulation of negative regulation of crystal biosynthesis
>> Synonym: regulation of negative regulation of crystal formation
>>
>> So far so good. Now let's see which genes are annotated to it:
>>
>> > library(org.Mm.eg.db)
>> > mget('GO:2000021',org.Mm.egGO)
>> Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
>> value for "GO:2000021" not found
>>
>> > mget('GO:2000021',org.Mm.egGO2EG)
>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>> value for "GO:2000021" not found
>> > mget('GO:2000021',org.Mm.egGO2ALLEGS)
>> $`GO:2000021`
>>      ISO      ISO      ISO      ISO      IGI      IGI      IMP
>>  "11517"  "11684"  "11998"  "12000"  "12018"  "12028"  "12028"
>>      IGI      ISO      ISO      IMP      ISO      ISO      IDA
>>  "12043"  "12061"  "12257"  "12291"  "12349"  "12372"  "12389"
>>      ISO      ISO      ISO      ISO      ISO      IMP      ISO
>>  "12424"  "12558"  "13167"  "13489"  "13617"  "13666"  "14062"
>>      ISO      IDA      IMP      IMP      IGI      IGI      ISO
>>  "14126"  "14225"  "14225"  "14226"  "14629"  "14630"  "14652"
>>      ISO      IDA      IDA      ISO      IDA      ISO       IC
>>  "15171"  "15978"  "16818"  "16867"  "16963"  "17096"  "17131"
>>      ISO      IMP      IMP      IDA      IMP      ISO      ISO
>>  "18429"  "18439"  "18764"  "19264"  "20190"  "21333"  "21336"
>>      ISO      ISO      IMP      ISO      ISO      TAS      IDA
>>  "21803"  "21808"  "21819"  "21838"  "22041"  "22784"  "23832"
>>      ISO      ISO      ISO      ISO      ISO      ISO      ISO
>>  "24111"  "26361"  "50849"  "54140"  "76055"  "76757" "108837"
>>      ISO      IMP      ISO      ISO      IMP      ISO
>> "217369" "225908" "233081" "238276" "259277" "317757"
>>
>> BTW, this was all using GO.db_2.4.5
>>
>> From this information, there are no genes directly annotated to your
>> GO term, only indirect annotations. I know this doesn't help your
>> current situation, but it at least points towards the problem. I
>> thought, however, that the summary was prepared using the GO2ALLEGS
>> mapping rather than the direct one. Perhaps someone more
>> knowledgeable can figure out where in the code the error is likely to
>> be?
>>
>> -Robert
>>
>> Robert M. Flight, Ph.D.
>> University of Louisville Bioinformatics Laboratory
>> University of Louisville
>> Louisville, KY
>>
>> PH 502-852-1809 (HSC)
>> PH 502-852-0467 (Belknap)
>> EM robert.flight at louisville.edu
>> EM rflight79 at gmail.com
>>
>> Williams and Holland's Law:
>> If enough data is collected, anything may be proven by
>> statistical methods.
>>
>>
>>
>> On Thu, Apr 7, 2011 at 11:22, Assa Yeroslaviz <frymor at gmail.com> wrote:
>> > Hi,
>> >
>> > I am trying to run a hyperGTest with GOstats on mouse Entrez Gene
>> > IDs.
>> >
>> > I imported the IDs from biomaRt:
>> > library(biomaRt); library(GOstats)
>> > entrez_data_1 <- getBM(attributes = c("mgi_id", "entrezgene"),
>> >                        filters = "mgi_id",
>> >                        values = as.character(data_1$MGI), mart = mart)
>> > head(entrez_data_1)
>> > entrezID_Universe <- getBM(attributes = c("mgi_id", "entrezgene"),
>> >                            filters = "mgi_id",
>> >                            values = as.character(MaxQuant18$MGI),
>> >                            mart = mart)
>> > entrezID_Universe
>> > paramsBP <- new("GOHyperGParams",
>> >                 geneIds = as.character(entrez_data_1[, 2]),
>> >                 universeGeneIds = as.character(entrezID_Universe[, 2]),
>> >                 annotation = "org.Mm.eg.db", ontology = "BP",
>> >                 pvalueCutoff = 0.05, conditional = FALSE,
>> >                 testDirection = "over")
>> > I then ran the hyperGTest command successfully:
>> > MmOverBP <- hyperGTest(paramsBP)
>> > MmOverBP
>> > Gene to GO BP test for over-representation
>> > 3146 GO BP ids tested (118 have p < 0.05)
>> > Selected gene set size: 1006
>> > Gene universe size: 2935
>> > Annotation package: org.Mm.eg
>> > but then:
>> >> summary(MmOverBP)
>> > Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
>> > value for "GO:2000021" not found
>> >
>> > As far as I know, I have the latest version of both packages. I checked
>> > in AmiGO whether this GO ID exists, and it does:
>> > Accession: GO:2000021
>> > Ontology: Biological Process
>> > Synonyms (related): regulation of electrolyte homeostasis;
>> > regulation of negative regulation of crystal biosynthesis;
>> > regulation of negative regulation of crystal formation
>> > Is there a way of adding/annotating this specific item manually,
>> > so that I can see it?
>> > If not,
>> > is there a way of taking this GO ID out of the list of GO categories,
>> > so that I can see the results?
>> >
>> > Thanks a lot
>> > Assa
>> >
>> >
>> >> sessionInfo()
>> > R version 2.12.2 (2011-02-25)
>> > Platform: x86_64-pc-linux-gnu (64-bit)
>> >
>> > locale:
>> > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> > [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>> > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> > [9] LC_ADDRESS=C LC_TELEPHONE=C
>> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> >
>> > attached base packages:
>> > [1] splines grid stats graphics grDevices utils datasets
>> > [8] methods base
>> >
>> > other attached packages:
>> > [1] GO.db_2.4.1 org.Mm.eg.db_2.4.6 biomaRt_2.6.0
>> > [4] Heatplus_1.20.0 gplots_2.8.0 caTools_1.11
>> > [7] bitops_1.0-4.1 gdata_2.8.1 gtools_2.6.2
>> > [10] siggenes_1.24.0 multtest_2.7.1 Rgraphviz_1.29.0
>> > [13] xtable_1.5-6 annotate_1.28.1 GOstats_2.16.0
>> > [16] RSQLite_0.9-4 DBI_0.2-5 graph_1.28.0
>> > [19] Category_2.16.0 AnnotationDbi_1.12.0 Biobase_2.10.0
>> >
>> > loaded via a namespace (and not attached):
>> > [1] genefilter_1.32.0 GSEABase_1.12.1 MASS_7.3-11 RBGL_1.26.0
>> > [5] RCurl_1.5-0 survival_2.36-5 tcltk_2.12.2 tools_2.12.2
>> > [9] XML_3.2-0
>> >
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>
>
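The 'value for "GO:2000021" not found' errors quoted in this thread arise when mget() is asked for a key the map does not contain. The base-R sketch below reproduces that behaviour with a plain environment standing in for an annotation map (the key "hypothetical_term" and its gene IDs are made up) and shows that ifnotfound = NA returns NA for missing keys instead of stopping. As far as I know, mget() on AnnotationDbi maps accepts the same argument, e.g. mget("GO:2000021", org.Mm.egGO2EG, ifnotfound = NA); this only avoids the error, it does not add direct annotations to the term.

```r
# A plain environment stands in for an annotation map such as
# org.Mm.egGO2EG; "hypothetical_term" and its gene IDs are made up.
toy_map <- new.env()
assign("hypothetical_term", c("11111", "22222"), envir = toy_map)

# Asking for an absent key stops with a "not found" error,
# much like the direct lookups of GO:2000021 in this thread.
err <- tryCatch(
  mget("GO:2000021", envir = toy_map),
  error = function(e) conditionMessage(e)
)
print(err)

# With ifnotfound = NA, missing keys come back as NA instead of aborting.
lookup <- mget(c("hypothetical_term", "GO:2000021"),
               envir = toy_map, ifnotfound = NA)
str(lookup)

# Keep only the keys that actually returned annotations.
has_genes <- vapply(lookup, function(v) !all(is.na(v)), logical(1))
names(lookup)[has_genes]
```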