[BioC] GOstats minus IEA?
Loren Engrav
engrav at u.washington.edu
Tue Apr 27 18:15:28 CEST 2010
Thank you.
I gave GOstats the Entrez IDs directly, and that solved the NA problem.
Somehow extracting them from hgu133plus2ENTREZID was problematic and produced odd
NAs.
Then I used your code to fish out the non-IEA annotations.
So it works, thank you.
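
For readers following along, here is a minimal sketch of the non-IEA filtering step on toy data. It does not call the annotate package. The helper `drop_evidence` is a hypothetical stand-in written to behave like dropECode with its default code of "IEA", and the gene IDs and annotations are placeholders, not real results from org.Hs.egGO.

```r
# Toy stand-in for mget(ids, org.Hs.egGO): one list per gene,
# one entry per GO annotation. All IDs and codes are placeholders.
gids <- list(
  "111" = list(
    list(GOID = "GO:0000001", Evidence = "IEA", Ontology = "BP"),
    list(GOID = "GO:0000002", Evidence = "TAS", Ontology = "BP")
  ),
  "222" = list(
    list(GOID = "GO:0000003", Evidence = "IEA", Ontology = "BP")
  )
)

# Hypothetical helper (an assumption, not the annotate implementation):
# keep only annotations whose evidence code is not listed in `code`.
drop_evidence <- function(annots, code = "IEA") {
  keep <- vapply(annots, function(a) !(a$Evidence %in% code), logical(1))
  annots[keep]
}

dgids <- lapply(gids, drop_evidence)

# Genes that still carry at least one non-IEA annotation.
non_iea_ids <- names(dgids)[sapply(dgids, length) > 0]

print(sapply(gids, length))   # annotations per gene before filtering
print(sapply(dgids, length))  # annotations per gene after filtering
print(non_iea_ids)
```

With real data, the surviving IDs could then be used as geneIds (and the same filter applied to the universe) when building the GOHyperGParams object.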
From: Vincent Carey <stvjc at channing.harvard.edu>
Date: Mon, 26 Apr 2010 21:55:12 -0400
To: Loren Engrav <engrav at u.washington.edu>
Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] GOstats minus IEA?
On Mon, Apr 26, 2010 at 9:26 PM, Loren Engrav <engrav at u.washington.edu>
wrote:
> Thank you, looks clever
> Am working thru it but am stuck
>
>> GOstatsentrezUniverse <- unlist(mget(featureNames(GOstats1842v2exprs),
> hgu133plus2ENTREZID))
>> GOstatsentrezSelected <- unlist(mget(featureNames(GOstats153v2exprs),
> hgu133plus2ENTREZID))
>> GOstats_params_BP.001over <- new("GOHyperGParams", geneIds =
> GOstatsentrezSelected, universeGeneIds = GOstatsentrezUniverse, annotation =
> "hgu133plus2.db", ontology = "BP", pvalueCutoff = .001, conditional = FALSE,
> testDirection = "over")
> Warning messages:
> 1: In makeValidParams(.Object) : removing duplicate IDs in geneIds
> 2: In makeValidParams(.Object) : removing duplicate IDs in universeGeneIds
>> ids <- GOstats_params_BP.001over@geneIds
>> gids = mget(ids, org.Hs.egGO)
> Error in .checkKeysAreWellFormed(keys) :
> keys must be supplied in a character vector with no NAs
>
Try any(is.na(ids)). If this is TRUE you will need to do
something like
mget(na.omit(ids), ...)
If it is not TRUE then you will have to send some exemplars from gids for
diagnosis.
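
A minimal sketch of that check in base R, assuming a made-up ID vector in place of the real output of unlist(mget(..., hgu133plus2ENTREZID)). The values in `ids` are hypothetical; with the real data the cleaned vector would be what gets passed to mget(..., org.Hs.egGO).

```r
# Hypothetical Entrez IDs; probes without an Entrez mapping appear as NA,
# and several probes can map to the same gene (duplicates).
ids <- c("1000", NA, "10265", "1000", NA, "7157")

# Are NAs a plausible cause of the "no NAs" key error?
has_na <- any(is.na(ids))

# Drop NAs, then duplicates, leaving a plain character vector of keys.
clean_ids <- unique(as.character(na.omit(ids)))

print(has_na)
print(clean_ids)
```

Removing duplicates up front also avoids the "removing duplicate IDs" warnings shown earlier in the thread.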
> How do I unstick gids?
>
> =========================================
>
>
>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] grid tools stats graphics grDevices utils datasets
> methods base
>
> other attached packages:
> [1] codetools_0.2-2 genefilter_1.30.0 RColorBrewer_1.0-2
> xtable_1.5-6 Rgraphviz_1.26.0
> [6] GO.db_2.4.1 hgu133plus2.db_2.4.1 org.Hs.eg.db_2.4.1
> annotate_1.26.0 GOstats_2.14.0
> [11] RSQLite_0.8-4 DBI_0.2-5 graph_1.26.0
> Category_2.14.0 AnnotationDbi_1.10.0
> [16] Biobase_2.8.0
>
> loaded via a namespace (and not attached):
> [1] GSEABase_1.10.0 RBGL_1.24.0 splines_2.11.0 survival_2.35-8
> XML_2.8-1
>
>
>
> From: Vincent Carey <stvjc at channing.harvard.edu>
> Date: Mon, 26 Apr 2010 11:52:00 -0400
> To: Loren Engrav <engrav at u.washington.edu>
> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] GOstats minus IEA?
>
> There does not seem to be a direct way within the GOstats tools to perform
> this kind of filtering. However, a help.search("evidence") can find a
> function called dropECode that addresses this concern if you have the
> annotate package installed.
>
> You would need to use it as you define your gene list and universe to
> exclude genes that have undesirable evidence profiles. For example, if you
> run the vignette GOstatsHyperG.Rnw, an object called params will be
> created. This includes examples of geneIds and universe vectors that are in
> fact entrez gene IDs.
>
> Briefly, to see how dropECode can be used, consider
>
>> Sweave("GOstatsHyperG.Rnw")
>> ids = params@geneIds
>> gids = mget(ids, org.Hs.egGO)
>> dgids = lapply(gids, dropECode)
>> table(sapply(gids,length))
>
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
> 12 18 28 17 33 51 59 63 56 44 39 34 24 22 25 19 13  7  7 12  7  6  9  6  5  2
>
> 27 28 29 30 31 32 33 34 35 36 37 39 40 41 42 43 44 45 50 54 64 65 72 79
>  2  1  3  5  3  1  5  2  1  2  1  1  1  4  1  1  1  2  1  1  1  1  1  1
>
>> table(sapply(dgids,length))
>  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 25 26 27
> 91 58 77 85 89 53 41 29 20 25 12 15  7 10  1  4  8  4  2  4  3  1  1  1  6  2
>
> 30 32 35 36 40 47
>  2  3  1  3  2  1
>
> This shows that prior to dropECode (which by default drops terms annotated
> via IEA) there were 12 genes with a single association; subsequent to
> dropECode, 91 genes had none and 58 had only one. Further exploration
> indicates that gene 10265 is one that has 11 associations, all of them coded
> IEA.
>
>> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-04-16 r51754)
> x86_64-apple-darwin10.3.0
>
> locale:
> [1] C
>
> attached base packages:
> [1] grid stats graphics grDevices datasets tools utils
> [8] methods base
>
> other attached packages:
> [1] Rgraphviz_1.27.0 xtable_1.5-5 RColorBrewer_1.0-2
> [4] GOstats_2.13.0 graph_1.25.1 Category_2.13.3
> [7] genefilter_1.29.2 annotate_1.25.0 GO.db_2.4.0
> [10] hgu95av2.db_2.4.0 org.Hs.eg.db_2.4.0 RSQLite_0.8-4
> [13] DBI_0.2-5 AnnotationDbi_1.9.8 ALL_1.4.7
> [16] Biobase_2.7.6 weaver_1.13.0 codetools_0.2-2
> [19] digest_0.4.1
>
> loaded via a namespace (and not attached):
> [1] GSEABase_1.9.0 RBGL_1.23.0 XML_2.6-0 splines_2.12.0
> [5] survival_2.35-8
>
>
> On Mon, Apr 26, 2010 at 10:54 AM, Loren Engrav <engrav at u.washington.edu>
> wrote:
>> GO.db and org.Hs.egGO2EG and manipulating content thereof was discussed a
>> month or so ago and is kind of complicated.
>>
>> Is it possible to run GOstats and exclude IEA evidence without serious
>> custom work?
>>
>> I searched gmane.science.biology.informatics.conductor and the 4 GOstats
>> pdfs and did not hit upon anything.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>