[BioC] GO-Analysis using TopGO for Yeast2 data
Adrian Alexa
adrian.alexa at gmail.com
Tue Mar 25 16:53:45 CET 2008
Hi Tobias,
you are right about the issues with the topGO. The printGenes()
function won't work with the yeast2 package. Bellow is a simple
session in which a performed a dummy enrichment analysis for the using
the yeas2 annotations. You will find a copy of the printGenes()
function which will work with the yeast2 library. You can also add
more information into the table if you want with minimal code chnages.
I will try to document better this function (and many others) in the
next release of the Bioconductor. As well, there will be a more
comprehensive vignette. It will be hard to have a function with the
same name that will work on yeast2 and other Affy libraries, given the
structural difference. However I put it on my TODO list and I'll see
what I can come up with. Thanks for bringing this up.
Let me know if you have more questions,
Adrian
----------------------------------------------------------
library(topGO)
## get the library
affyLib <- "yeast2"
library(package = affyLib, character.only = TRUE)
geneNames <- ls(yeast2GO)
myInterestedGenes <- sample(geneNames, 500)
geneList <- factor(as.integer(geneNames %in% myInterestedGenes))
names(geneList) <- geneNames
## build the topGOdata class
GOyeast2 <- new("topGOdata", ontology = "BP", allGenes = geneList,
description = "stability of GO - ALL dataset.",
nodeSize = 5, annot = annFUN.hgu, affyLib = affyLib)
result <- runTest(GOyeast2, "classic", "fisher")
tableFis <- GenTable(GOyeast2, pValue = result, topNodes = 10, numChar = 40)
printGenesY <- function(object, whichTerms, affyLib, numChar = 100,
simplify = TRUE,
geneCufOff = 50, pvalCutOff) {
term.genes <- genesInTerm(object, whichTerms)
all.genes <- character()
lapply(term.genes, function(x) all.genes <<- union(x, all.genes))
ORF.lib <- get(paste(affyLib, "ORF", sep = ""))
CHR.lib <- get(paste(affyLib, "CHR", sep = ""))
GNAME.lib <- get(paste(affyLib, "GENENAME", sep = ""))
probeMapping <- data.frame(ORF.id = unlist(mget(all.genes,
envir = ORF.lib, ifnotfound = NA)),
CHR = as.integer(unlist(mget(all.genes,
envir = CHR.lib, ifnotfound = NA))))
retList <- vector("list", length(whichTerms))
names(retList) <- whichTerms
for(gt in whichTerms) {
affID <- term.genes[[gt]]
pval <- sort(geneScore(object, affID, use.names = TRUE))
if(missing(pvalCutOff))
affID <- names(pval)
else {
pval <- pval[pval <= pvalCutOff]
affID <- names(pval)
}
## we restrict the output to the number of genes
length(affID) <- min(length(affID), geneCufOff)
## if there are no genes, there is nothing to print
if(length(affID) == 0) {
cat("\n\t No genes over the cutOff for:", gt, "\n")
next
}
genesNames <- sapply(mget(affID, envir = GNAME.lib),
function(x) return(x[1]))
genesNames <- paste(substr(genesNames, 1, numChar),
ifelse(nchar(genesNames) > numChar, "...",
""), sep = "")
retList[[gt]] <- cbind("Affy ID" = affID, probeMapping[affID, ],
"Gene name" = genesNames,
"raw p-value" = format.pval(pval[affID],
dig = 3, eps = 1e-30))
}
## if we have only one GO term we return just the data.frame
if(simplify && length(whichTerms) == 1)
return(retList[[1]])
return(retList)
}
selGOterms <- tableFis$GO.ID
gg <- printGenesY(GOyeast2, whichTerms = selGOterms, affyLib =
affyLib, numChar = 50)
library(xtable)
index <- order(sapply(gg, nrow))
for(g in names(gg)[index]) {
curTab <- apply(gg[[g]], 2, function(x) {return(gsub("_", ".",
as.character(x)))})
cat("\\begin{table}[bth]\n \\centering\\resizebox{.99\\linewidth}{!}{%\n")
print(xtable(curTab), floating = FALSE)
cat("}\\caption{", g, "}\n\\end{table}\n\n")
}
-------------------------------------------------------------------------------------------------------------------------------------------
On Tue, Mar 25, 2008 at 8:32 AM, Tobias Koschubs
<biotekk2000 at compuserve.de> wrote:
> Hi!
>
> I tried to do a GO-Analysis using the TopGO package with Affymetrix
> Yeast2 Expression data but encountered a couple of problems. As I'm new
> to Bioconductor, I would be very grateful to get some help.
>
> The Yeast2-Package related problems are the following: Very recently,
> the GO-Annotations have been updated and now include also the
> annotations from computationally predicted methods (available from
> yeastgenome.org and geneontology.org). Of course, I would like to use
> the newest annotations and I tried to use the AnnBuilder
> yeastPkgBuilder-method. However, I didn't succeed to bring the
> gene_association.sgd file together with the Probe mappings on the chip.
> I guess one would need to use the ORFs as identifiers instead of the
> SGD-identifiers from the gene_association file. Moreover, I didn't
> figure out how to set the path for the annotation file as local (it was
> suggested to unpack it in advance). Is there someone from the Biocore
> Data Team who has the proper scripts to do it?
>
> With the TopGO package, I did the GO Analysis using the current yeast2
> annotations. Unfortunately, it's documentation lacks some details and I
> didn't figure out, how to print out the genes (or at least the probes)
> according to the top-GO-terms listed by the genTable-Method. From the
> source-code, I learned that there's a printGenes-method which possible
> should do this. However, the yeast2 chip is a bit different from others
> as it doesn't use ENTREZIDs or SYMBOLS and I also didn't figure out what
> are the parameters of this method.
>
> Best and thanks in advance,
> Tobias Koschubs
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
More information about the Bioconductor
mailing list