[BioC] AnnBuilder with custmerized GO annotation
Xinxia Peng
xinxia.peng at sbri.org
Tue May 1 19:04:25 CEST 2007
Thanks. I had considered that, but I could not find the 'E' code used in
this example on the GO website:
http://www.geneontology.org/GO.evidence.shtml
Another question, about GOstats: is the evidence code taken into account
in the enrichment analysis?
=======================
Here I repeat part of the annotation that I got:
     ENTREZID PROBE        ACCNUM          UNIGENE
[1,] "10"     "36512_at"   "L32179"        "Hs.2"
[2,] "10"     "38912_at"   "D90042"        "Hs.2"
[3,] "1084"   "32468_f_at" "D90278;M16652" "NA"
[4,] "125"    "35730_at"   "X03350"        "NA"
[5,] "2"      "32469_at"   "L00693"        "Hs.74561"
[6,] "63036"  "38936_at"   "M16652"        "NA"
[7,] "7051"   "32481_at"   "AL031663"      "NA"
[8,] "9"      "33825_at"   "X68733"        "NA"
[9,] "NA"     "39368_at"   "AL031668"      "NA"
     GO                                          OMIM
[1,] "GO:0004060@E"                              "NA"
[2,] "GO:0004060@E"                              "NA"
[3,] "NA"                                        "NA"
[4,] "NA"                                        "NA"
[5,] "GO:0008320@NR;GO:0004866@NR;GO:0006886@NR" "NA"
[6,] "NA"                                        "NA"
[7,] "NA"                                        "NA"
[8,] "GO:0004060@E"                              "NA"
[9,] "NA"                                        "NA"
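
As a rough illustration of how the GO column is structured: each entry
pairs a GO ID with an evidence code, separated by '@' (the list archive may
render '@' as ' at '), and multiple entries are joined with ';'. The sketch
below splits one such string in base R. split_go() is a hypothetical helper
written for this illustration, not an AnnBuilder function.

```r
# Hypothetical helper (not part of AnnBuilder): split a GO annotation
# string such as "GO:0008320@NR;GO:0004866@NR" into IDs and evidence codes.
split_go <- function(x) {
  # Probes without GO annotation carry the string "NA" in the matrix.
  if (is.na(x) || x == "NA") {
    return(data.frame(GO = character(0), EVIDENCE = character(0),
                      stringsAsFactors = FALSE))
  }
  # Undo the archive's " at " rendering of "@", if present.
  x <- gsub(" at ", "@", x, fixed = TRUE)
  entries <- strsplit(x, ";", fixed = TRUE)[[1]]
  parts <- strsplit(entries, "@", fixed = TRUE)
  data.frame(GO = vapply(parts, `[`, character(1), 1),
             EVIDENCE = vapply(parts, `[`, character(1), 2),
             stringsAsFactors = FALSE)
}

# Row 5 of the annotation matrix above.
go5 <- split_go("GO:0008320@NR;GO:0004866@NR;GO:0006886@NR")
go5
```

An entry without an evidence code would get NA in the EVIDENCE column.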
=======================
Here is the script that I used while following the example:
library(AnnBuilder);
pkgpath <- .find.package("AnnBuilder");
# test dataset
pkgdir <- "/nethome/xpeng/linux/analysis/array/scripts/pkgs";
setwd(pkgdir);
geneNMap <- matrix(c("32468_f_at", "D90278;M16652",
                     "32469_at", "L00693",
                     "32481_at", "AL031663", "33825_at", "X68733",
                     "35730_at", "X03350", "36512_at", "L32179",
                     "38912_at", "D90042", "38936_at", "M16652",
                     "39368_at", "AL031668"), ncol = 2, byrow = TRUE)
write.table(geneNMap, file = "geneNMap", sep = "\t", quote = FALSE,
row.names = FALSE, col.names = FALSE)
# get annotation info.
makeSrcInfo()
srcObjs <- list()
egUrl <- "http://www.bioconductor.org/datafiles/wwwsources"
ugUrl <- "http://www.bioconductor.org/datafiles/wwwsources/Ths.data.gz"
eg <- EG(srcUrl = egUrl, parser = file.path(pkgpath, "scripts",
"gbLLParser"), baseFile = "geneNMap", accession = "Tll_tmpl.gz",
built = "N/A", fromWeb = TRUE)
ug <- UG(srcUrl = ugUrl, parser = file.path(pkgpath,
"scripts", "gbUGParser"), baseFile = "geneNMap",
organism = "Homo sapiens", built = "N/A", fromWeb = TRUE)
srcObjs[["eg"]] <- eg
srcObjs[["ug"]] <- ug
if(.Platform$OS.type != "windows"){
llMapping <- parseData(eg, eg@accession)
colnames(llMapping) <- c("PROBE", "EG")
ugMapping <- parseData(ug)
colnames(ugMapping) <- c("PROBE", "UG")
}
# This portion only runs after the previous code has been
# executed under windows
if(.Platform$OS.type != "windows"){
llMapping
ugMapping
}
# This portion only runs interactively under Windows (copy/paste)
base <- matrix(scan("geneNMap", what = "", sep = "\t", quote = "",
                    quiet = TRUE), ncol = 2, byrow = TRUE)
colnames(base) <- c("PROBE", "ACC")
merged <- merge(base, llMapping, by = "PROBE", all.x = TRUE)
merged <- merge(merged, ugMapping, by = "PROBE", all.x = TRUE)
unified <- AnnBuilder:::resolveMaps(merged, trusted = c("EG", "UG"),
srcs = c("EG", "UG"))
unified
read.table(unified, sep = "\t", header = FALSE)
if(.Platform$OS.type != "windows"){
# these two do not work for me
# parser(eg) <- file.path(.path.package("AnnBuilder"), "scripts",
#                         "llParser")
# baseFile(eg) <- unified
attr(eg, "parser") <- file.path(.path.package("AnnBuilder"),
"scripts", "llParser")
attr(eg, "baseFile") <- unified
annotation <- parseData(eg, eg@accession, ncol = 14)
colnames(annotation) <- c("PROBE", "ACCNUM", "ENTREZID", "UNIGENE",
"GENENAME", "SYMBOL","CHR", "MAP", "PMID", "GRIF", "SUMFUNC",
"GO",
"OMIM", "REFSEQ")
}
annotation
gpUrl <- "http://www.bioconductor.org/datafiles/wwwsources/"
goUrl <- "http://www.bioconductor.org/datafiles/wwwsources/Tgo.xml"
gp <- GP(srcUrl = gpUrl, organism = "Homo sapiens", fromWeb = TRUE)
go <- GO(srcUrl = goUrl, fromWeb = TRUE)
strand <- AnnBuilder:::getChroLocation(srcUrl(gp),
AnnBuilder:::gpLinkNGene(TRUE));
strand
annotation <- merge(annotation, strand, by = "ENTREZID", all.x = TRUE);
pkgName <- "myTestPkg"
pkgPath <- getwd()
createEmptyDPkg("myTestPkg", getwd(), force = TRUE)
annotation <- as.matrix(annotation)
annotation
AnnBuilder:::writeAnnData2Pkg(annotation, pkgName, pkgPath)
list.files(file.path(getwd(), "myTestPkg"))
repList <- AnnBuilder:::getRepList("all", srcObjs)
repList[["PKGNAME"]] <- pkgName
AnnBuilder:::writeOrganism(pkgName, pkgPath, "Homo sapiens")
AnnBuilder:::writeDocs("geneNMap", pkgName, pkgPath, "1.1.0",
    list(author = "annonymous",
         maintainer = "annonymous <annonymous@net.com>"),
    repList, "PKGNAME")
# clean up
#unlink(c(unified, XMLOut, "geneNMap", "test.xml", "testByNum.xml"))
#unlink(file.path(getwd(), "test"), TRUE)
sessionInfo()
=======================
This is the session info:
R version 2.5.0 (2007-04-23)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "methods"   "base"
other attached packages:
AnnBuilder annotate XML Biobase
"1.14.0" "1.14.1" "1.7-3" "1.14.0"
Best,
Xinxia
-----Original Message-----
From: John Zhang [mailto:jzhang at jimmy.harvard.edu]
Sent: Tuesday, May 01, 2007 5:51 AM
To: bioconductor at stat.math.ethz.ch; Xinxia Peng
Subject: Re: [BioC] AnnBuilder with custmerized GO annotation
>
>What do these after a GO term mean, '@E' or '@NR'?
They are evidence code by GO. Please read the description available from
GO web site for details.
>
>What I am trying to do is to build an annotation package for GO
>enrichment analysis using GOstats. The GO annotation is from
>InterProScan. I plan to create a data frame with three columns:
>probeid, geneid and GO, then build the annotation package. Any
>suggestions?
>
>Thanks,
>Xinxia
>
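
The three-column table described in the quoted plan (probeid, geneid, GO)
could be sketched in base R roughly as follows. This is only an
illustration under stated assumptions: the probe and gene IDs are reused
from the example output in this thread, the evidence code "IEA" is assumed
for computational predictions such as those from InterProScan, and the
"GO@evidence" strings joined by ';' simply mirror the GO column shown in
this thread. Whether AnnBuilder accepts such a table directly has not been
verified here.

```r
# Hypothetical long-format table: one row per probe/GO pair.
ann <- data.frame(
  probeid = c("36512_at", "32469_at", "32469_at", "32469_at"),
  geneid  = c("10", "2", "2", "2"),
  GO      = c("GO:0004060", "GO:0008320", "GO:0004866", "GO:0006886"),
  stringsAsFactors = FALSE
)

# Attach the assumed evidence code in "GO:xxxxxxx@CODE" form.
ann$GO_EVID <- paste(ann$GO, "IEA", sep = "@")

# Collapse to one row per probe, joining multiple GO entries with ";".
go_by_probe <- tapply(ann$GO_EVID, ann$probeid, paste, collapse = ";")
collapsed <- unique(ann[, c("probeid", "geneid")])
collapsed$GO <- as.vector(go_by_probe[collapsed$probeid])
collapsed
```

Keeping the long format as the working copy makes it easy to filter by
evidence code before collapsing.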
Jianhua Zhang
Department of Medical Oncology
Dana-Farber Cancer Institute
44 Binney Street
Boston, MA 02115-6084