[BioC] GO-Analysis using TopGO for Yeast2 data

Adrian Alexa adrian.alexa at gmail.com
Tue Mar 25 16:53:45 CET 2008


Hi Tobias,

you are right about the issues with the topGO. The printGenes()
function won't work with the yeast2 package. Bellow is a simple
session in which a performed a dummy enrichment analysis for the using
the yeas2 annotations. You will find a copy of the printGenes()
function which will work with the yeast2 library. You can also add
more information into the table if you want with minimal code chnages.

I will try to document better this function (and many others) in the
next release of the Bioconductor. As well, there will be a more
comprehensive vignette. It will be hard to have a function with the
same name that will work on yeast2 and other Affy libraries, given the
structural difference. However I put it on my TODO list and I'll see
what I can come up with. Thanks for bringing this up.

Let me know if you have more questions,
Adrian

----------------------------------------------------------
library(topGO)

## get the library
affyLib <- "yeast2"
library(package = affyLib, character.only = TRUE)

geneNames <- ls(yeast2GO)

myInterestedGenes <- sample(geneNames, 500)
geneList <- factor(as.integer(geneNames %in% myInterestedGenes))
names(geneList) <- geneNames

## build the topGOdata class
GOyeast2 <- new("topGOdata", ontology = "BP", allGenes = geneList,
                description = "stability of GO - ALL dataset.",
                nodeSize = 5, annot = annFUN.hgu, affyLib = affyLib)

result <- runTest(GOyeast2, "classic", "fisher")

tableFis <- GenTable(GOyeast2, pValue = result, topNodes = 10, numChar = 40)

printGenesY <- function(object, whichTerms, affyLib, numChar = 100,
simplify = TRUE,
                        geneCufOff = 50, pvalCutOff) {

  term.genes <- genesInTerm(object, whichTerms)
  all.genes <- character()
  lapply(term.genes, function(x) all.genes <<- union(x, all.genes))

  ORF.lib <- get(paste(affyLib, "ORF", sep = ""))
  CHR.lib <- get(paste(affyLib, "CHR", sep = ""))
  GNAME.lib <- get(paste(affyLib, "GENENAME", sep = ""))

  probeMapping <- data.frame(ORF.id = unlist(mget(all.genes,
                               envir = ORF.lib, ifnotfound = NA)),
                             CHR = as.integer(unlist(mget(all.genes,
                               envir = CHR.lib, ifnotfound = NA))))

  retList <- vector("list", length(whichTerms))
  names(retList) <- whichTerms

  for(gt in whichTerms) {
    affID <- term.genes[[gt]]

    pval <- sort(geneScore(object, affID, use.names = TRUE))
    if(missing(pvalCutOff))
      affID <- names(pval)
    else {
      pval <- pval[pval <= pvalCutOff]
      affID <- names(pval)
    }

    ## we restrict the output to the number of genes
    length(affID) <- min(length(affID), geneCufOff)

    ## if there are no genes, there is nothing to print
    if(length(affID) == 0) {
      cat("\n\t No genes over the cutOff for:", gt, "\n")
      next
    }

    genesNames <- sapply(mget(affID, envir = GNAME.lib),
                         function(x) return(x[1]))
    genesNames <- paste(substr(genesNames, 1, numChar),
                        ifelse(nchar(genesNames) > numChar, "...",
""), sep = "")

    retList[[gt]] <- cbind("Affy ID" = affID, probeMapping[affID, ],
"Gene name" = genesNames,
                           "raw p-value" = format.pval(pval[affID],
dig = 3, eps = 1e-30))
  }

  ## if we have only one GO term we return just the data.frame
  if(simplify && length(whichTerms) == 1)
    return(retList[[1]])

  return(retList)
}




selGOterms <- tableFis$GO.ID
gg <- printGenesY(GOyeast2, whichTerms = selGOterms, affyLib =
affyLib, numChar = 50)


library(xtable)
index <- order(sapply(gg, nrow))
for(g in names(gg)[index]) {
  curTab <- apply(gg[[g]], 2, function(x) {return(gsub("_", ".",
as.character(x)))})
  cat("\\begin{table}[bth]\n \\centering\\resizebox{.99\\linewidth}{!}{%\n")
  print(xtable(curTab), floating = FALSE)
  cat("}\\caption{", g, "}\n\\end{table}\n\n")
}



-------------------------------------------------------------------------------------------------------------------------------------------















On Tue, Mar 25, 2008 at 8:32 AM, Tobias Koschubs
<biotekk2000 at compuserve.de> wrote:
> Hi!
>
>  I tried to do a GO-Analysis using the TopGO package with Affymetrix
>  Yeast2 Expression data but encountered a couple of problems. As I'm new
>  to Bioconductor, I would be very grateful to get some help.
>
>  The Yeast2-Package related problems are the following: Very recently,
>  the GO-Annotations have been updated and now include also the
>  annotations from computationally predicted methods (available from
>  yeastgenome.org and geneontology.org). Of course, I would like to use
>  the newest annotations and I tried to use the AnnBuilder
>  yeastPkgBuilder-method. However, I didn't succeed to bring the
>  gene_association.sgd file together with the Probe mappings on the chip.
>  I guess one would need to use the ORFs as identifiers instead of the
>  SGD-identifiers from the gene_association file. Moreover, I didn't
>  figure out how to set the path for the annotation file as local (it was
>  suggested to unpack it in advance). Is there someone from the Biocore
>  Data Team who has the proper scripts to do it?
>
>  With the TopGO package, I did the GO Analysis using the current yeast2
>  annotations. Unfortunately, it's documentation lacks some details and I
>  didn't figure out, how to print out the genes (or at least the probes)
>  according to the top-GO-terms listed by the genTable-Method. From the
>  source-code, I learned that there's a printGenes-method which possible
>  should do this. However, the yeast2 chip is a bit different from others
>  as it doesn't use ENTREZIDs or SYMBOLS and I also didn't figure out what
>  are the parameters of this method.
>
>  Best and thanks in advance,
>  Tobias Koschubs
>
>  _______________________________________________
>  Bioconductor mailing list
>  Bioconductor at stat.math.ethz.ch
>  https://stat.ethz.ch/mailman/listinfo/bioconductor
>  Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list