[BioC] topGO and Arabidopsis data
Adrian Alexa
adrian.alexa at gmail.com
Tue Dec 7 18:16:49 CET 2010
Hi Johannes,
I apologise for the late reply. I had a chance to look into the issue
you reported and apparently the problem is with the
org.At.tair.db_2.4.6. Namely the GO id reported in your error belongs
to the BPontology and to the MF ontology, at least based on
GO.db_2.4.5 and some other only resources. For example, if you run:
library(ath1121501.db)
goID <- "GO:0010241"
.sql <- "SELECT gene_id, go_id FROM genes INNER JOIN go_mf USING('_id')"
retVal <- dbGetQuery(org.At.tair_dbconn(), .sql)
go2prob <- split(retVal[["gene_id"]], retVal[["go_id"]])
go2prob[goID]
you will get:
$`GO:0010241`
[1] "AT5G25900"
But if you search this GO in the GO.db database you will find that it
belongs to the BP ontology.
GOTERM[[goID]
I hope this bug will be solved in the org.At.tair.db_2.4.6 package.
Until then I did a quick fix such that you can further use the topGO
package. The annFUN.db2 will keep only the GO terms available in the
GO.db specific for the chosen ontology:
annFUN.db2 <- function(whichOnto, feasibleGenes = NULL, affyLib) {
## we add the .db ending if needed
affyLib <- paste(sub(".db$", "", affyLib), ".db", sep = "")
require(affyLib, character.only = TRUE) || stop(paste("package",
affyLib, "is required", sep = " "))
affyLib <- sub(".db$", "", affyLib)
orgFile <- get(paste(get(paste(affyLib, "ORGPKG", sep = "")),
"_dbfile", sep = ""))
try(dbGetQuery(get(paste(affyLib, "dbconn", sep = "_"))(),
paste("ATTACH '", orgFile(), "' as org;", sep ="")),
silent = TRUE)
.sql <- paste("SELECT DISTINCT probe_id, go_id FROM probes INNER JOIN ",
"(SELECT * FROM org.genes INNER JOIN org.go_",
tolower(whichOnto)," USING('_id')) USING('gene_id');", sep = "")
retVal <- dbGetQuery(get(paste(affyLib, "dbconn", sep = "_"))(), .sql)
## restric to the set of feasibleGenes
if(!is.null(feasibleGenes))
retVal <- retVal[retVal[["probe_id"]] %in% feasibleGenes, ]
## split the table into a named list of GOs
retVal <- split(retVal[["probe_id"]], retVal[["go_id"]])
## return only the GOs mapped in GO.db
return(retVal[names(retVal) %in% ls(get(paste("GO",
toupper(whichOnto), "Term", sep = "")))])
}
Using this function you can build a topGOdata object as before (just
replace annFUN.db with annFUN.db2):
library(topGO)
allProbes <- ls(ath1121501GO)
## generate a random set of interesting pro
myInterestingGenes <- sample(allProbes, 100)
geneList <- factor(as.integer(allProbes %in% myInterestingGenes))
names(geneList) <- allProbes
## build a topGOdata object
sampleGOdata <- new("topGOdata", description = "Simple session",
ontology = "MF", allGenes = geneList,
nodeSize = 10, annot = annFUN.db2,
affyLib = "ath1121501.db")
sampleGOdata
This works without a problem on the latest stable R/Bioconductor version.
Let me know if you have further questions.
Best regards,
Adrian
> sessionInfo()
R version 2.12.1 beta (2010-12-06 r53802)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US
[4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=en_US
[7] LC_PAPER=en_US LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ath1121501.db_2.4.5 org.At.tair.db_2.4.6 topGO_2.2.0
[4] SparseM_0.86 GO.db_2.4.5 RSQLite_0.9-4
[7] DBI_0.2-5 AnnotationDbi_1.12.0 Biobase_2.10.0
[10] graph_1.28.0
loaded via a namespace (and not attached):
[1] grid_2.12.0 lattice_0.19-13 tools_2.12.0
On Thu, Dec 2, 2010 at 5:44 PM, Johannes Hanson <s.j.hanson at uu.nl> wrote:
> Dear All,
>
> I am analyzing affymetrix expression data using the topGO package.
> I basically follow the script in the topGO vingette. It works fine for the CC and BP ontologies but for MF i get the following error:
>
>> sampleGOdata <- new("topGOdata", description = "Simple session", ontology = "MF", allGenes = geneList, geneSel = topDiffGenes, nodeSize = 10, annot = annFUN.db,affyLib = affyLib)
>
> Building most specific GOs ..... ( 1348 GO terms found. )
>
> Build GO DAG topology ..........
> There are no adj nodes for node: GO:0010241
> Error in switch(type, isa = 0, partof = 1, -1) :
> EXPR must be a length 1 vector
>
> I guess that there is something wrong in the annotation packages but honestly, I might well have misunderstood the error message. Using the examples in the topGO vingette and human annotation no errors are given.
>
> Thanks in advance,
> Johannes
>
>
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] topGO_2.2.0 SparseM_0.86 GO.db_2.4.5 graph_1.28.0 ath1121501.db_2.4.5 org.At.tair.db_2.4.6 RSQLite_0.9-3
> [8] DBI_0.2-5 AnnotationDbi_1.12.0 ath1121501cdf_2.7.0 affy_1.28.0 Biobase_2.10.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.18.0 grid_2.12.0 lattice_0.19-13 preprocessCore_1.12.0 tools_2.12.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list