[BioC] problem with makeOrgPackageFromNCBI (for Chinese hamster)
Marc Carlson
mcarlson at fhcrc.org
Fri Aug 23 02:32:06 CEST 2013
Hi Guido,

I have (so far) been unable to reproduce your initial issue here; I can generate this package without problems with either release or devel. Even though I can't examine your package directly myself, I am almost certain that it is actually just fine, and that the only reason the call printed FALSE is the 2nd warning (R returns FALSE when you call file.remove() and it can't actually remove the file). The 1st warning just means that you don't have any UniGene data, and that's actually correct in this case, since there are no UniGene entries for this critter. The 2nd warning means that R was not permitted to remove the generated .sqlite file after copying it into the new package directory. I don't know why that 2nd warning happens on Windows and I plan to investigate it, but the crucial thing is that it happens AFTER the package has already been generated.

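A minimal base-R sketch of the behaviour described above: file.remove() reports a failed removal by returning FALSE and emitting a warning, not by stopping with an error. It uses a throwaway temporary file and provokes the failure by removing the file twice, so the reason given here is 'No such file or directory' rather than the 'Permission denied' seen on Windows. The variable names are only for illustration.

```r
# Base R only: a failed file.remove() returns FALSE and warns; it does not stop.
dbfile <- tempfile(fileext = ".sqlite")   # throwaway stand-in for the .sqlite file
invisible(file.create(dbfile))

# First removal: the file exists, so this returns TRUE.
first <- file.remove(dbfile)

# Second removal cannot succeed. Capture the warning text ...
warn_text <- tryCatch(file.remove(dbfile),
                      warning = function(w) conditionMessage(w))

# ... and, with the warning silenced, the return value itself.
second <- suppressWarnings(file.remove(dbfile))

cat("first removal: ", first, "\n")
cat("second removal:", second, "\n")
cat("warning text:  ", warn_text, "\n")
```

If file.remove() is the last expression evaluated in a function, that FALSE becomes the function's printed value, even though all the earlier work completed.
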
Looking a bit further down, you did find a problem with the org.Cgriseus.eg() function. I think that is a real bug (not a serious one, but one I intend to look into shortly). Basically, your package does not have (and should not have) an org.Cgriseus.egREFSEQ2EG mapping, and yet this silly function is trying to ask about it. But that problem does not exist within your package, since the offending code actually lives in AnnotationDbi.

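To illustrate that failure mode, here is a hypothetical base-R sketch, not the actual AnnotationDbi code: a report loop that looks mapping objects up by name with get() stops at the first name that does not exist, whereas checking exists() first lets it skip absent maps. The environment `maps` and the names `egSYMBOL`, `egREFSEQ` and `egREFSEQ2EG` are stand-ins invented for this example; the IDs and values are taken from the select() output quoted below.

```r
# Hypothetical stand-ins for a package's mapping objects (toy named lists).
maps <- new.env()
assign("egSYMBOL", list("100682525" = "P53", "100682526" = "Tuba1c"), envir = maps)
assign("egREFSEQ", list("100682525" = c("NM_001243976", "NP_001230905"),
                        "100682526" = c("NM_001243977", "NP_001230906")),
       envir = maps)

# The report asks for three maps; the last one was never created.
map_names <- c("egSYMBOL", "egREFSEQ", "egREFSEQ2EG")

# A bare get() on the missing name is an error, which would abort the report.
err <- tryCatch(get("egREFSEQ2EG", envir = maps, inherits = FALSE),
                error = function(e) conditionMessage(e))
cat("get() error:", err, "\n")

# Checking exists() first lets the loop skip absent maps and carry on.
mapped <- integer(0)
for (nm in map_names) {
  if (!exists(nm, envir = maps, inherits = FALSE)) {
    cat(nm, "is not present; skipping\n")
    next
  }
  mapped[nm] <- length(get(nm, envir = maps, inherits = FALSE))
  cat(nm, "has", mapped[[nm]], "mapped keys\n")
}
```
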
You're correct that your package does have the data that could be used for an org.Cgriseus.egREFSEQ2EG mapping, and that this data is exposed via the select() method. It is also available via the org.Cgriseus.egREFSEQ mapping. But the package is still not supposed to have that specific reverse mapping, and it does not need it either, since you can call revmap() on org.Cgriseus.egREFSEQ. In fact, none of the old mappings are really needed for anything; we just generated a few of them to maintain some backwards compatibility.

To answer your other question (how to get a *.db package from the *.sqlite file): the package is actually "made" by just putting the database into the inst/extdata directory of a very minimalist package template found in AnnotationForge (see inst/AnnDbPkg-templates/ORGANISM.DB/ if you want to look at it). The template is altered slightly, based on some inputs generated from your initial arguments, so that the manual pages etc. all match the source material. So really, the most complicated thing that happens after the database is made is just generating the manual pages.

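As a conceptual illustration of why a stored REFSEQ2EG map is redundant: a reverse map carries no information beyond the forward map and can be computed by inverting it. The base-R sketch below does that for a plain named list, using values from the select() output quoted below. It is not how AnnotationDbi implements revmap(), which operates on its own mapping objects (e.g. revmap(org.Cgriseus.egREFSEQ)); `reverse_map` is a hypothetical helper written only for this example.

```r
# Forward map: Entrez Gene ID -> RefSeq accessions (a plain named list).
eg2refseq <- list(
  "100682525" = c("NM_001243976", "NP_001230905"),
  "100682526" = c("NM_001243977", "NP_001230906")
)

# Invert a one-to-many named list: pair each value with the name it came
# from, then group those names by value.
reverse_map <- function(fwd) {
  ids  <- rep(names(fwd), lengths(fwd))
  vals <- unlist(fwd, use.names = FALSE)
  split(ids, vals)
}

refseq2eg <- reverse_map(eg2refseq)
print(refseq2eg[["NM_001243977"]])   # "100682526"

# Reversing again recovers the original gene-to-accession pairs.
back <- reverse_map(refseq2eg)
```

Because the round trip recovers the same pairs, the reverse direction can always be derived on demand rather than stored.
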
If you could send me a tarball of the package you generated, I would like to look at it and verify that it has no peculiarities compared to the one I made here.

Marc
On 08/22/2013 12:33 PM, Hooiveld, Guido wrote:
> Hi Marc and others,
>
> I am using makeOrgPackageFromNCBI() to create an annotation package for Chinese hamster (Cricetulus griseus), but I am experiencing some problems during this process. Please see the code below for details. It could very well be that I am missing something obvious, so any suggestion as to what may cause this would be appreciated!
>
> Thanks,
> Guido
>
>
> 1) I am using R on Win7, have admin rights, and also start R through 'Run as administrator'. So why can the file 'org.Cgriseus.eg.sqlite' not be removed (reason: 'Permission denied')? Note: I understand this is just a warning, but it may be relevant.
>
> 2a) Although no *.db package was produced, I still tried to install the database from the directory in which the files were generated (i.e. D:\\org.Cgriseus.eg.db). This *seemed* to go OK, but when I checked the number of mapped egids it failed at the org.Cgriseus.egREFSEQ mapping...
> 2b) Interestingly, when I manually load the sqlite database (that could not be removed) these org.Cgriseus.egREFSEQ mappings are present! See code at bottom.
> 2c) --> How to make a *.db from an *.sqlite?
>
>
> # Create db0 for Chinese hamster using makeOrgPackageFromNCBI()
>> library(AnnotationForge)
>> makeOrgPackageFromNCBI(
> + version="0.1",
> + maintainer="Guido Hooiveld <guido.hooiveld at wur.nl>",
> + author="Guido Hooiveld <guido.hooiveld at wur.nl>",
> + outputDir=".",
> + tax_id=10029,
> + genus="Cricetulus",
> + species="griseus")
> Loading required package: GO.db
>
> Getting data for gene2pubmed.gz
> Loading required package: RCurl
> Loading required package: bitops
> discarding data from other organisms
> Populating gene2pubmed table:
> table gene2pubmed filled
> Getting data for gene2accession.gz
> discarding data from other organisms
> Populating gene2accession table:
> table gene2accession filled
> Getting data for gene2refseq.gz
> discarding data from other organisms
> Populating gene2refseq table:
> table gene2refseq filled
> Getting data for gene2unigene
> discarding data from other organisms
> Populating gene2unigene table:
> table gene2unigene filled
> Getting data for gene_info.gz
> discarding data from other organisms
> Populating gene_info table:
> table gene_info filled
> Getting data for gene2go.gz
> discarding data from other organisms
> Populating gene2go table:
> Getting blast2GO data as a substitute for gene2go
> table metadata filled
> table map_metadata filled
> table gene2go filled
> table metadata filled
> table map_metadata filled
> Populating genes table:
> genes table filled
> Populating gene_info_temp table:
> gene_info_temp table filled
> Populating alias table:
> alias table filled
> Populating chromosomes table:
> chromosomes table filled
> Populating pubmed table:
> pubmed table filled
> Populating refseq table:
> refseq table filled
> Populating accessions table:
> accessions table filled
> Populating unigene table:
> Dropping GO IDs that are too new for the current GO.db
> Dropping GO IDs that are too new for the current GO.db
> Dropping GO IDs that are too new for the current GO.db
> Populating go_bp table:
> go_bp table filled
> Populating go_mf table:
> go_mf table filled
> Populating go_cc table:
> go_cc table filled
> Populating go_bp_all table:
> go_bp_all table filled
> Populating go_mf_all table:
> go_mf_all table filled
> Populating go_cc_all table:
> go_cc_all table filled
> dropping table gene2pubmed
> dropping table gene2accession
> dropping table gene2refseq
> dropping table gene2unigene
> dropping table gene_info
> dropping table gene2go
> Making GO views
>
>
> SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.gene_name NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
> SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g WHERE t._id=g._id AND t.chromosome NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
> SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
> SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
> SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
> SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE t._id=g._id AND t.alias_symbol NOT NULL
> table map_counts filled
> Creating package in ./org.Cgriseus.eg.db
> [1] FALSE
> Warning messages:
> 1: In .makeSimpleTable(ug, table = "unigene", con) :
> no values found for table unigene in this data chunk.
> 2: In file.remove(dbfile) :
> cannot remove file 'org.Cgriseus.eg.sqlite', reason 'Permission denied'
>> # Now manually install files from DIR that has been generated.
>>
>> install.packages(repos=NULL, pkgs="D:\\org.Cgriseus.eg.db", type="source")
> * installing *source* package 'org.Cgriseus.eg.db' ...
> ** R
> ** inst
> ** preparing package for lazy loading
> ** help
> *** installing help indices
> ** building package indices
> ** testing if installed package can be loaded
> *** arch - i386
> *** arch - x64
> * DONE (org.Cgriseus.eg.db)
>> library(org.Cgriseus.eg.db)
>> org.Cgriseus.eg()
> Quality control information for org.Cgriseus.eg:
>
>
> This package has the following mappings:
>
> org.Cgriseus.egALIAS2EG has 25227 mapped keys (of 25227 keys)
> org.Cgriseus.egCHR has 25227 mapped keys (of 25227 keys)
> org.Cgriseus.egGENENAME has 25227 mapped keys (of 25227 keys)
> org.Cgriseus.egGO has 25227 mapped keys (of 25227 keys)
> org.Cgriseus.egGO2ALLEGS has 25227 mapped keys (of 16020 keys)
> org.Cgriseus.egGO2EG has 25227 mapped keys (of 12124 keys)
> org.Cgriseus.egREFSEQ has 25227 mapped keys (of 25227 keys)
> Error in get(mapname) : object 'org.Cgriseus.egREFSEQ2EG' not found
>>
>
>
>> #load sqlite to check that REFSEQ mappings are included
>> CHO.db <- loadDb("org.Cgriseus.eg.sqlite")
>> CHO.db
> OrgDb object:
> | BL2GOSOURCEDATE: Thu Aug 22 18:47:20 2013
> | BL2GOSOURCENAME: blast2GO
> | BL2GOSOURCEURL: http://www.blast2go.de/
> | DBSCHEMAVERSION: 2.1
> | DBSCHEMA: ORGANISM_DB
> | ORGANISM: Cricetulus griseus
> | SPECIES: Cricetulus griseus
> | CENTRALID: EG
> | TAXID: 10029
> | EGSOURCEDATE: Thu Aug 22 18:47:24 2013
> | EGSOURCENAME: Entrez Gene
> | EGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | GOSOURCEDATE: 20130302
> | GOSOURCENAME: Gene Ontology
> | GOSOURCEURL: ftp://ftp.geneontology.org/pub/go/godata
> | GOEGSOURCEDATE: Thu Aug 22 18:47:24 2013
> | GOEGSOURCENAME: Entrez Gene
> | GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
> | Db type: OrgDb
> | Supporting package: AnnotationDbi
>
>> cols(CHO.db)
> [1] "ENTREZID" "ACCNUM" "ALIAS" "CHR" "PMID" "REFSEQ"
> [7] "SYMBOL" "UNIGENE" "GENENAME" "GO" "EVIDENCE" "ONTOLOGY"
>> keys <- head( keys(CHO.db))
>> keys
> [1] "100682525" "100682526" "100682527" "100682528" "100682529" "100682530"
>> select(CHO.db, keys=keys, cols = c("SYMBOL","REFSEQ","UNIGENE"))
> ENTREZID SYMBOL REFSEQ UNIGENE
> 1 100682525 P53 NM_001243976 <NA>
> 2 100682525 P53 NP_001230905 <NA>
> 3 100682526 Tuba1c NM_001243977 <NA>
> 4 100682526 Tuba1c NP_001230906 <NA>
> 5 100682527 Tuba1a NM_001243978 <NA>
> 6 100682527 Tuba1a NP_001230907 <NA>
> 7 100682528 Tuba1b NM_001243979 <NA>
> 8 100682528 Tuba1b NP_001230908 <NA>
> 9 100682529 Mgat1 NM_001243980 <NA>
> 10 100682529 Mgat1 NP_001230909 <NA>
> 11 100682530 Plec XM_003507629 <NA>
> 12 100682530 Plec XP_003507677 <NA>
> Warning message:
> In .generateExtraRows(tab, keys, jointype) :
> 'select' resulted in 1:many mapping between keys and return rows
>> sessionInfo()
> R version 3.0.1 Patched (2013-06-05 r62877)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] org.Cgriseus.eg.db_0.1 RCurl_1.95-4.1 bitops_1.0-6 GO.db_2.9.0
> [5] AnnotationForge_1.2.2 org.Hs.eg.db_2.9.0 RSQLite_0.11.4 DBI_0.2-7
> [9] AnnotationDbi_1.22.6 Biobase_2.20.1 BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.18.3 stats4_3.0.1 tools_3.0.1
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor