[BioC] AnnBuilder package: problem with gbNRef

Nianhua Li nli at fhcrc.org
Wed Jul 26 22:05:46 CEST 2006

Hi, Cameron,

Maybe you want to try baseType="refseq". I used the sample baseFile from
your email with this script:
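mySrcUrls <- getSrcUrl("all", "Homo sapiens")
mySrcUrls[[7]] <- "file:///home/cameron/microarray_data/annotate"
mypkg <- function(pkgPath, version) {
    # The opening of this call was lost in the archived message. ABPkgBuilder,
    # its first arguments, and the author list wrapper are a reconstruction;
    # "baseFile" is a placeholder for the path to the sample base file.
    ABPkgBuilder(baseName = "baseFile", srcUrls = mySrcUrls,
                 baseMapType = "refseq", pkgName = "mypkg", pkgPath = pkgPath,
                 organism = "Homo sapiens", version = version,
                 author = list(authors = "R. Cameron Craddock",
                     maintainer = "R. Cameron Craddock <email at email.email>"))
}
mypkg(getwd(), "1.0.0")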

And here is the result:

Quality control information for  mypkg
Date built: Created: Wed Jul 26 12:18:11 2006

Number of probes: 22
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
         mypkgACCNUM found 21 of 22
         mypkgCHRLOC found 20 of 22
         mypkgCHR found 20 of 22
         mypkgENZYME found 0 of 22
         mypkgGENENAME found 20 of 22
         mypkgGO found 17 of 22
         mypkgLOCUSID found 20 of 22
         mypkgMAP found 19 of 22
         mypkgOMIM found 18 of 22
         mypkgPATH found 5 of 22
         mypkgPMID found 20 of 22
         mypkgREFSEQ found 20 of 22
         mypkgSUMFUNC found 0 of 22
         mypkgSYMBOL found 20 of 22
         mypkgUNIGENE found 20 of 22
Mappings found for non-probe based rda files:
         mypkgCHRLENGTHS found 25
         mypkgGO2ALLPROBES found 269
         mypkgGO2PROBE found 73
         mypkgORGANISM found 1
         mypkgPATH2PROBE found 17
         mypkgPFAM found 15
         mypkgPMID2PROBE found 595
         mypkgPROSITE found 13

What AnnBuilder does for your inputs is:
(1) Use your "mixture of GenBank Accession and Ref Seq" to find the 
Entrez Gene ID
(2) Use the Entrez Gene ID to find other annotations.
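
To make the two steps concrete, here is a minimal sketch in plain R. It is
not AnnBuilder's actual code: the lookup tables acc2gene and gene2symbol, the
probe names, and all values are toy stand-ins chosen for illustration. It
shows why a probe whose ID fails step (1) ends up with no annotation at all.

```r
# Toy lookup tables (hypothetical names, illustrative values only).
acc2gene <- c(NM_000014 = "2", NM_000662 = "9")   # step 1: input ID -> Entrez Gene ID
gene2symbol <- c("2" = "A2M", "9" = "NAT1")       # step 2: Entrez Gene ID -> annotation

probes <- data.frame(probe = c("p1", "p2", "p3"),
                     id = c("NM_000014", "NM_000662", "AB000001"),
                     stringsAsFactors = FALSE)

# Step 1: look up the Entrez Gene ID; IDs missing from the table give NA.
probes$gene_id <- unname(acc2gene[probes$id])

# Step 2: look up the annotation by Entrez Gene ID; an NA from step 1
# carries through, so the unmapped probe gets no annotation.
probes$symbol <- unname(gene2symbol[probes$gene_id])

print(probes)
```

In this toy run the third probe is not in the step (1) table, so every later
annotation for it is NA.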

If your base type is "gbNRef", it uses
ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz for the GenBank to Entrez
Gene mapping. If your base type is "refseq", it uses
ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz instead. You may want to check
those files manually to see whether all your input IDs are included. If your
input has mixed ID types, then you have to get the Entrez Gene IDs manually.
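
One possible way to do that check in R is sketched below. The function name
find_missing_ids is hypothetical, and the column layout is an assumption
(tab-delimited, Entrez Gene ID in column 2, RNA accession with a ".version"
suffix in column 4), so please check it against the header line of the file
you download. The toy rows only stand in for the real file.

```r
# Sketch: report which input IDs are absent from a gene2refseq-style table.
# Assumed layout (verify against the real file): tab-delimited, header line
# starting with "#", Entrez Gene ID in column 2, RNA accession in column 4.
find_missing_ids <- function(ids, mapping_file, gene_col = 2, acc_col = 4) {
    tab <- read.delim(mapping_file, header = FALSE, comment.char = "#",
                      stringsAsFactors = FALSE, quote = "")
    # Strip the version suffix so "NM_000014.4" matches "NM_000014".
    acc <- sub("\\.[0-9]+$", "", tab[[acc_col]])
    ids <- sub("\\.[0-9]+$", "", ids)
    found <- ids %in% acc
    list(missing = ids[!found],
         gene_ids = tab[[gene_col]][match(ids[found], acc)])
}

# Toy file standing in for the downloaded, decompressed table.
toy <- tempfile(fileext = ".txt")
writeLines(c("#tax_id\tGeneID\tstatus\tRNA_nucleotide_accession.version",
             "9606\t2\tREVIEWED\tNM_000014.4",
             "9606\t9\tREVIEWED\tNM_000662.5"), toy)

res <- find_missing_ids(c("NM_000014", "NM_000662", "AB000001"), toy)
print(res$missing)
```

The real files are large, so reading one in full may take a while and a lot
of memory. Any ID listed in res$missing would need its Entrez Gene ID looked
up by hand.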

Hope it helps.

