[BioC] ACCNUM match zero using AnnotationDBI package

Wed Oct 15 19:01:14 CEST 2008

Hi Shiliang,

SQLForge matches probes to genes using several different types of gene
IDs, so depending on what you feed in, a package can have no ACCNUMs and
still be fine.  The ACCNUM mapping is a special case: it is meant to
store the accessions that were used to design the probes in the package,
not to list every GenBank accession that maps to a particular gene.  If
you don't give SQLForge any accessions when you make the package, it
won't put any in, because we don't want to make assumptions when
creating these packages.  For the package you made, you supplied only an
Entrez Gene mapping and no GenBank accessions, so SQLForge has no way to
know which accessions should be associated with your probes.
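
As a rough sketch of what feeding in accessions could look like, assuming
you have a tab-delimited, header-less file of probe IDs and the GenBank
accessions the probes were designed from: the file name
Rosettabasefile_gb.txt, the placeholder accessions, the version string
and the helper build_with_accessions() are all made up for illustration.
The only real change from your call is baseMapType="gb" (GenBank) in
place of "eg".

```r
# Hypothetical base file: probe ID <TAB> GenBank accession, no header.
# Probe IDs are taken from the example quoted below; the accessions are
# placeholders and must be replaced with the real design accessions.
base <- data.frame(
  probe     = c("10024412833", "10024395853", "10024401691"),
  accession = c("ACC_PLACEHOLDER_1", "ACC_PLACEHOLDER_2", "ACC_PLACEHOLDER_3"),
  stringsAsFactors = FALSE
)
gb_file <- file.path(tempdir(), "Rosettabasefile_gb.txt")
write.table(base, gb_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Hypothetical wrapper: rebuild the package using the accessions as the
# base mapping. Needs AnnotationDbi and mouse.db0; nothing runs until
# the function is called.
build_with_accessions <- function(base_file, out_dir = getwd()) {
  library(AnnotationDbi)
  makeMOUSECHIP_DB(affy = FALSE,
                   prefix = "Rosetta",
                   fileName = base_file,
                   baseMapType = "gb",   # GenBank accessions instead of "eg"
                   outputDir = out_dir,
                   version = "3.0.1",    # arbitrary new version number
                   manufacturer = "Rosetta",
                   chipName = "Mouse custom Array",
                   manufacturerUrl = "http://www.rii.com/")
}
```

Calling build_with_accessions(gb_file) and installing the result should
then give RosettaACCNUM a non-zero count in the Rosetta() summary,
provided the accessions are real ones that the mouse.db0 data can match.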

  Marc

swang wrote:
> Hi Nianhua,
>
> I am using AnnotationDbi and it is much better than AnnBuilder (which I
> used to use).
> I found that the package I built recently with AnnotationDbi has zero
> ACCNUM matches. Here is what I did:
>
> 1. My code:
>   source("http://bioconductor.org/biocLite.R")
>   biocLite("mouse.db0")
>
>
>   makeMOUSECHIP_DB(affy=FALSE,
>   prefix="Rosetta",
>   fileName='Rosettabasefile.txt',
>   baseMapType="eg",
>   outputDir = getwd(),
>   version="3.0.0",
>   manufacturer = "Rosetta",
>   chipName = "Mouse custom Array",
>   manufacturerUrl = "http://www.rii.com/")
>
> 2. My base file (example):
>    10024408304    NA
>    10024412833    78124
>    10024395853    50766
>    10024401691    327766
>    10024407521    NA
>    10024397162    192651
>    10024402992    216395
>    10024414142    69334
>    10024410918    105203
>    10024410918    105203
>    10024416230    19159
>    10024416583    22312
>
> I noticed that I have duplicates in both columns.
>
> 3. The information I got is shown below.
>
> 4. As you can see, RosettaACCNUM has zero mapped keys. How does this
> happen?
>
> Thanks
>
> Shiliang
>
>
>   
>> library(Rosetta.db)
>>     
> Loading required package: AnnotationDbi
> Loading required package: Biobase
> Loading required package: tools
>
> Welcome to Bioconductor
>
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> Loading required package: DBI
> Loading required package: RSQLite
>   
>> Rosetta()
>>     
> Quality control information for Rosetta:
>
>
> This package has the following mappings:
>
> RosettaACCNUM has 0 mapped keys (of 23574 keys)
> RosettaALIAS2PROBE has 65400 mapped keys (of 65400 keys)
> RosettaCHR has 19189 mapped keys (of 23574 keys)
> RosettaCHRLENGTHS has 21 mapped keys (of 21 keys)
> RosettaCHRLOC has 17558 mapped keys (of 23574 keys)
> RosettaENSEMBL has 18414 mapped keys (of 23574 keys)
> RosettaENSEMBL2PROBE has 17096 mapped keys (of 17096 keys)
> RosettaENTREZID has 19207 mapped keys (of 23574 keys)
> RosettaENZYME has 1922 mapped keys (of 23574 keys)
> RosettaENZYME2PROBE has 791 mapped keys (of 791 keys)
> RosettaGENENAME has 19207 mapped keys (of 23574 keys)
> RosettaGO has 16380 mapped keys (of 23574 keys)
> RosettaGO2ALLPROBES has 8447 mapped keys (of 8447 keys)
> RosettaGO2PROBE has 6152 mapped keys (of 6152 keys)
> RosettaMAP has 18233 mapped keys (of 23574 keys)
> RosettaMGI has 19053 mapped keys (of 23574 keys)
> RosettaMGI2PROBE has 17459 mapped keys (of 17459 keys)
> RosettaPATH has 4072 mapped keys (of 23574 keys)
> RosettaPATH2PROBE has 195 mapped keys (of 195 keys)
> RosettaPFAM has 18761 mapped keys (of 23574 keys)
> RosettaPMID has 19063 mapped keys (of 23574 keys)
> RosettaPMID2PROBE has 114988 mapped keys (of 114988 keys)
> RosettaPROSITE has 18761 mapped keys (of 23574 keys)
> RosettaREFSEQ has 18781 mapped keys (of 23574 keys)
> RosettaSYMBOL has 19207 mapped keys (of 23574 keys)
> RosettaUNIGENE has 18950 mapped keys (of 23574 keys)
>
>
> Additional Information about this package:
>
> DB schema: MOUSECHIP_DB
> DB schema version: 1.0
> Organism: Mus musculus
> Date for NCBI data: 2008-Apr2
> Date for GO data: 200803
> Date for KEGG data: 2008-Apr1
> Date for Golden Path data: 2007-Aug22
> Date for IPI data: 2008-Mar19
>   
>> sessionInfo()
>>     
> R version 2.7.2 (2008-08-25)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Rosetta.db_3.0.0    AnnotationDbi_1.2.2 RSQLite_0.7-0       DBI_0.2-4           Biobase_2.0.1
>   