[BioC] AnnBuilder package: problem with gbNRef
Nianhua Li
nli at fhcrc.org
Wed Jul 26 22:05:46 CEST 2006
Hi, Cameron,
You may want to try baseMapType="refseq". I used the sample base file from
your email with this script:
==================================================================
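library(AnnBuilder)
# Default source URLs for Homo sapiens annotation data
mySrcUrls <- getSrcUrl("all", "Homo sapiens")
# Point the 7th source at a local directory instead of its default URL
mySrcUrls[[7]] <- "file:///home/cameron/microarray_data/annotate"
# Wrapper that builds the annotation package "mypkg" from mybase.txt.
# (Not named mypkg, so it does not mask the package's own mypkg() later.)
buildMyPkg <- function(pkgPath, version) {
    ABPkgBuilder(baseName = "mybase.txt",
                 baseMapType = "refseq",   # map input IDs via RefSeq
                 srcUrls = mySrcUrls,
                 pkgName = "mypkg",
                 pkgPath = pkgPath,
                 organism = "Homo sapiens",
                 version = version,
                 author = list(
                     authors = "R. Cameron Craddock",
                     maintainer = "R. Cameron Craddock <email at email.email>"
                 ))
}
# Build the package in the current working directory
buildMyPkg(getwd(), "1.0.0")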
==================================================================
And here is the result:
==================================================================
> library(mypkg)
> mypkg()
Quality control information for mypkg
Date built: Created: Wed Jul 26 12:18:11 2006
Number of probes: 22
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
mypkgACCNUM found 21 of 22
mypkgCHRLOC found 20 of 22
mypkgCHR found 20 of 22
mypkgENZYME found 0 of 22
mypkgGENENAME found 20 of 22
mypkgGO found 17 of 22
mypkgLOCUSID found 20 of 22
mypkgMAP found 19 of 22
mypkgOMIM found 18 of 22
mypkgPATH found 5 of 22
mypkgPMID found 20 of 22
mypkgREFSEQ found 20 of 22
mypkgSUMFUNC found 0 of 22
mypkgSYMBOL found 20 of 22
mypkgUNIGENE found 20 of 22
Mappings found for non-probe based rda files:
mypkgCHRLENGTHS found 25
mypkgGO2ALLPROBES found 269
mypkgGO2PROBE found 73
mypkgORGANISM found 1
mypkgPATH2PROBE found 17
mypkgPFAM found 15
mypkgPMID2PROBE found 595
mypkgPROSITE found 13
========================================================
What AnnBuilder does with your inputs is:
(1) use your "mixture of GenBank Accession and Ref Seq" IDs to find the
Entrez Gene IDs;
(2) use the Entrez Gene IDs to find the other annotations.
If your baseMapType is "gbNRef", it uses
ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz for the GenBank-to-Entrez
Gene mapping. If your baseMapType is "refseq", it uses
ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz for the mapping instead. You
may want to check those files manually to see whether all of your input IDs
are included. If your input mixes ID types, you will have to get the Entrez
Gene IDs manually.
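If you do end up mapping a mixed ID list yourself, a minimal R sketch of the lookup could look like this. It is a hypothetical helper, not part of AnnBuilder. It assumes the NCBI files have been read into data frames that have at least the GeneID and RNA_nucleotide_accession.version columns. It also assumes that any ID with a two-letter prefix plus underscore (NM_, XM_, ...) is RefSeq and that everything else is a GenBank accession. The toy tables use made-up accessions and GeneIDs, purely for illustration.

```r
# Hypothetical helper (not part of AnnBuilder): map a mixed vector of
# GenBank and RefSeq accessions to Entrez Gene IDs. Assumes lookup tables
# with GeneID and RNA_nucleotide_accession.version columns, e.g. read with
#   gb2eg <- read.delim(gzfile("gene2accession.gz"), colClasses = "character")
#   rs2eg <- read.delim(gzfile("gene2refseq.gz"), colClasses = "character")

# Drop a trailing version suffix: "NM_000002.4" -> "NM_000002"
stripVersion <- function(acc) sub("\\.[0-9]+$", "", acc)

# Assumption: RefSeq IDs carry a two-letter prefix plus underscore (NM_, XM_, ...)
isRefSeq <- function(acc) grepl("^[A-Z]{2}_", acc)

# Named character vector: unversioned accession -> GeneID
buildLookup <- function(tab) {
  setNames(as.character(tab$GeneID),
           stripVersion(tab$RNA_nucleotide_accession.version))
}

mapToEntrez <- function(ids, gb2eg, rs2eg) {
  ids <- stripVersion(ids)
  gbMap <- buildLookup(gb2eg)
  rsMap <- buildLookup(rs2eg)
  # Look each ID up in the table matching its type; unmatched IDs give NA.
  # If an accession appears more than once, the first GeneID wins.
  out <- ifelse(isRefSeq(ids), rsMap[ids], gbMap[ids])
  names(out) <- ids
  out
}

# Toy tables with made-up accessions and GeneIDs, for illustration only
gb2eg <- data.frame(GeneID = c("101", "102"),
                    RNA_nucleotide_accession.version = c("AB000001.1", "AB000002.1"),
                    stringsAsFactors = FALSE)
rs2eg <- data.frame(GeneID = c("201", "202"),
                    RNA_nucleotide_accession.version = c("NM_000001.1", "NM_000002.4"),
                    stringsAsFactors = FALSE)

res <- mapToEntrez(c("AB000001", "NM_000002.3", "XM_999999"), gb2eg, rs2eg)
print(res)
# IDs with no Entrez Gene match, i.e. the ones to check by hand
print(names(res)[is.na(res)])
```

Version suffixes are stripped on both sides, so "NM_000002.3" still matches a table entry of "NM_000002.4". The IDs that come back NA are the ones missing from the relevant NCBI file, which is the manual check suggested above.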
hope it helps
nianhua