[BioC] AnnBuilder with custmerized GO annotation

Tue May 1 19:04:25 CEST 2007

Thanks. I had thought of that, but I could not find the 'E' code from
this example in the list on the GO website:

http://www.geneontology.org/GO.evidence.shtml

Another question, about GOstats: are the evidence codes taken into
account in the enrichment analysis?

=======================
Here I repeat part of the annotation that I got:

      ENTREZID PROBE        ACCNUM          UNIGENE   
 [1,] "10"     "36512_at"   "L32179"        "Hs.2"    
 [2,] "10"     "38912_at"   "D90042"        "Hs.2"    
 [3,] "1084"   "32468_f_at" "D90278;M16652" "NA"      
 [4,] "125"    "35730_at"   "X03350"        "NA"      
 [5,] "2"      "32469_at"   "L00693"        "Hs.74561"
 [6,] "63036"  "38936_at"   "M16652"        "NA"      
 [7,] "7051"   "32481_at"   "AL031663"      "NA"      
 [8,] "9"      "33825_at"   "X68733"        "NA"      
 [9,] "NA"     "39368_at"   "AL031668"      "NA"  

      GO                                          OMIM
 [1,] "GO:0004060@E"                              "NA"
 [2,] "GO:0004060@E"                              "NA"
 [3,] "NA"                                        "NA"
 [4,] "NA"                                        "NA"
 [5,] "GO:0008320@NR;GO:0004866@NR;GO:0006886@NR" "NA"
 [6,] "NA"                                        "NA"
 [7,] "NA"                                        "NA"
 [8,] "GO:0004060@E"                              "NA"
 [9,] "NA"                                        "NA"
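
For anyone who wants to look at the evidence codes separately: each GO cell appears to pack its terms as a GO ID and an evidence code joined by '@' (the '@E' and '@NR' suffixes asked about in this thread), with ';' between terms. A minimal sketch of splitting one such cell in base R follows. The helper name split_go is hypothetical and is not part of AnnBuilder; the input string is copied from row 5 above.

```r
# One GO cell as it appears in the annotation matrix (row 5)
go_cell <- "GO:0008320@NR;GO:0004866@NR;GO:0006886@NR"

# Hypothetical helper (not an AnnBuilder function): split one cell into
# a data frame with one row per GO term
split_go <- function(cell) {
    # cells without GO annotation hold the string "NA"
    if (is.na(cell) || cell == "NA") {
        return(data.frame(GO = character(0), EVIDENCE = character(0),
                          stringsAsFactors = FALSE))
    }
    terms <- strsplit(cell, ";", fixed = TRUE)[[1]]   # one entry per term
    parts <- strsplit(terms, "@", fixed = TRUE)       # c(id, evidence)
    data.frame(GO       = vapply(parts, function(p) p[1], character(1)),
               EVIDENCE = vapply(parts, function(p) p[2], character(1)),
               stringsAsFactors = FALSE)
}

parsed <- split_go(go_cell)
parsed
```

With the terms in this form it is straightforward to subset on the EVIDENCE column, for example to keep or drop particular codes before further analysis.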

=======================
Here is the script that I used while following the example:

library(AnnBuilder);
pkgpath <- .find.package("AnnBuilder");

# test dataset
pkgdir <- "/nethome/xpeng/linux/analysis/array/scripts/pkgs";
setwd(pkgdir);
geneNMap <- matrix(c("32468_f_at", "D90278;M16652",
                     "32469_at",   "L00693",
                     "32481_at",   "AL031663",
                     "33825_at",   "X68733",
                     "35730_at",   "X03350",
                     "36512_at",   "L32179",
                     "38912_at",   "D90042",
                     "38936_at",   "M16652",
                     "39368_at",   "AL031668"), ncol = 2, byrow = TRUE)

write.table(geneNMap, file = "geneNMap", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = FALSE)

# get annotation info.
makeSrcInfo()
srcObjs <- list()
egUrl <- "http://www.bioconductor.org/datafiles/wwwsources"
ugUrl <- "http://www.bioconductor.org/datafiles/wwwsources/Ths.data.gz"
eg <- EG(srcUrl = egUrl, parser = file.path(pkgpath, "scripts",
    "gbLLParser"), baseFile = "geneNMap", accession = "Tll_tmpl.gz",
    built = "N/A", fromWeb = TRUE)
ug <- UG(srcUrl = ugUrl, parser = file.path(pkgpath,
    "scripts", "gbUGParser"), baseFile = "geneNMap",
    organism = "Homo sapiens", built = "N/A", fromWeb = TRUE)
srcObjs[["eg"]] <- eg
srcObjs[["ug"]] <- ug

if(.Platform$OS.type != "windows"){
    llMapping <- parseData(eg, eg@accession)
    colnames(llMapping) <- c("PROBE", "EG")
    ugMapping <- parseData(ug)
    colnames(ugMapping) <- c("PROBE", "UG")
}

# This portion only runs if the previous block has been executed
# (that is, not under Windows)
if(.Platform$OS.type != "windows"){
    llMapping
    ugMapping
}
# This portion only runs interactively under Windows (copy/paste)
base <- matrix(scan("geneNMap", what = "", sep = "\t", quote = "",
    quiet = TRUE), ncol = 2, byrow = TRUE)
colnames(base) <- c("PROBE", "ACC")
merged <- merge(base, llMapping, by = "PROBE", all.x = TRUE)
merged <- merge(merged, ugMapping, by = "PROBE", all.x = TRUE)
unified <- AnnBuilder:::resolveMaps(merged, trusted = c("EG", "UG"),
srcs = c("EG", "UG"))
unified
read.table(unified, sep = "\t", header = FALSE)

if(.Platform$OS.type != "windows"){
#   these two do not work for me
#   parser(eg) <- file.path(.path.package("AnnBuilder"), "scripts",
#       "llParser")
#   baseFile(eg) <- unified
    attr(eg, "parser") <- file.path(.path.package("AnnBuilder"),
        "scripts", "llParser")
    attr(eg, "baseFile") <- unified
    annotation <- parseData(eg, eg@accession, ncol = 14)
    colnames(annotation) <- c("PROBE", "ACCNUM", "ENTREZID", "UNIGENE",
        "GENENAME", "SYMBOL", "CHR", "MAP", "PMID", "GRIF", "SUMFUNC",
        "GO", "OMIM", "REFSEQ")
}

annotation

gpUrl <- "http://www.bioconductor.org/datafiles/wwwsources/"
goUrl <- "http://www.bioconductor.org/datafiles/wwwsources/Tgo.xml"
gp <- GP(srcUrl = gpUrl, organism = "Homo sapiens", fromWeb = TRUE)
go <- GO(srcUrl = goUrl, fromWeb = TRUE)

strand <- AnnBuilder:::getChroLocation(srcUrl(gp),
AnnBuilder:::gpLinkNGene(TRUE));
strand
annotation <- merge(annotation, strand, by = "ENTREZID", all.x = TRUE);

pkgName <- "myTestPkg"
pkgPath <- getwd()
createEmptyDPkg("myTestPkg", getwd(), force = TRUE)
annotation <- as.matrix(annotation)
annotation
AnnBuilder:::writeAnnData2Pkg(annotation, pkgName, pkgPath)

list.files(file.path(getwd(), "myTestPkg"))

repList <- AnnBuilder:::getRepList("all", srcObjs)
repList[["PKGNAME"]] <- pkgName
AnnBuilder:::writeOrganism(pkgName, pkgPath, "Homo sapiens")
AnnBuilder:::writeDocs("geneNMap", pkgName, pkgPath, "1.1.0",
    list(author = "annonymous",
         maintainer = "annonymous <annonymous@net.com>"),
    repList, "PKGNAME")

# clean up
#unlink(c(unified, XMLOut, "geneNMap", "test.xml", "testByNum.xml"))
#unlink(file.path(getwd(), "test"), TRUE)

sessionInfo()

=======================
This is the session info:

R version 2.5.0 (2007-04-23) 
x86_64-unknown-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "methods"   "base"

other attached packages:
AnnBuilder   annotate        XML    Biobase 
  "1.14.0"   "1.14.1"    "1.7-3"   "1.14.0" 

Best,
Xinxia

-----Original Message-----
From: John Zhang [mailto:jzhang at jimmy.harvard.edu] 
Sent: Tuesday, May 01, 2007 5:51 AM
To: bioconductor at stat.math.ethz.ch; Xinxia Peng
Subject: Re: [BioC] AnnBuilder with custmerized GO annotation

>
>What do these after a GO term mean, '@E' or '@NR'?

They are evidence codes defined by GO. Please read the description
available on the GO web site for details.

>
>What I am trying to do is to build an annotation package for GO
>enrichment analysis using GOstats. The GO annotation is from
>InterProScan. I plan to create a data frame with three columns:
>probeid, geneid and GO, then build the annotation package. Any
>suggestions?
>
>Thanks,
>Xinxia
>
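
One possible way to prepare the three-column table described in the question is sketched below in base R. It collapses a long table (one row per probe/GO pair) into one row per probe, with the GO terms written in the 'GOID@EVIDENCE' form separated by ';'. Everything specific here is an assumption for illustration: the probe, gene and GO values are made up, collapse_go is a hypothetical helper, and tagging every term with IEA (the GO code for "inferred from electronic annotation") is only a guess at what suits InterProScan-derived terms. Whether AnnBuilder accepts a table built this way has not been checked.

```r
# Hypothetical InterProScan-derived input: one row per probe/GO pair.
# All ids below are invented for illustration.
ipr <- data.frame(
    probeid = c("p1", "p1", "p2", "p3"),
    geneid  = c("g1", "g1", "g2", "g3"),
    GO      = c("GO:0004060", "GO:0006886", "GO:0008320", NA),
    stringsAsFactors = FALSE
)

# Assumption: every term gets the same evidence code
evidence <- "IEA"

# Hypothetical helper: join the GO ids of one probe as "ID@CODE;ID@CODE"
collapse_go <- function(ids) {
    ids <- unique(ids[!is.na(ids)])
    if (length(ids) == 0) return("NA")   # probe without any GO term
    paste(paste(ids, evidence, sep = "@"), collapse = ";")
}

# Named character vector: probe id -> collapsed GO string
go_by_probe <- vapply(split(ipr$GO, ipr$probeid), collapse_go,
                      character(1))

# One row per probe, with its gene id and collapsed GO string
probes <- unique(ipr[, c("probeid", "geneid")])
annotation <- data.frame(
    probeid = probes$probeid,
    geneid  = probes$geneid,
    GO      = unname(go_by_probe[probes$probeid]),
    stringsAsFactors = FALSE
)
annotation
```

Keeping the string "NA" for probes without GO terms mirrors how the GO column looks in the AnnBuilder output shown earlier in the thread.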

Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084