[Bioc-devel] OrganismDb package for Drosophila.melanogaster

Pariksheet Nanda pariksheet.nanda at uconn.edu
Tue Nov 15 05:43:46 CET 2016


Hi folks,

It would be great to have an OrganismDb package for
Drosophila.melanogaster, similar to Homo.sapiens, Mus.musculus and
Rattus.norvegicus.

While trying to do this on my own using the Homo.sapiens package as a
starting point, I found that the most similar-looking keys for relating
org.Dm.eg.db and TxDb.Dmelanogaster.UCSC.dm6.ensGene are "ENSEMBL" and
"GENEID".  However, each "GENEID" value has a ".1" tacked onto the end,
which makes it harder to supply the graphInfo object to
OrganismDbi:::.loadOrganismDbiPkg:

 > key_ <- function(db, key) sort(as.character(
 +                                select(db, keys(db, key), key, key)[[key]]))
 > key_head <- function(db, key) head(key_(db, key))
 > key_head(TxDb.Dmelanogaster.UCSC.dm6.ensGene, "GENEID")
 'select()' returned 1:1 mapping between keys and columns
 [1] "FBgn0000003.1" "FBgn0000008.1" "FBgn0000014.1" "FBgn0000015.1"
 [5] "FBgn0000017.1" "FBgn0000018.1"
 > key_head(org.Dm.eg.db, "ENSEMBL")
 [1] "FBgn0000008" "FBgn0000014" "FBgn0000015" "FBgn0000017" "FBgn0000018"
 [6] "FBgn0000022"
 >
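
As a rough illustration of the mismatch, the version suffix can be removed
with a regular expression before comparing the two key sets.  This is only
a sketch in base R: the IDs are copied from the output above,
strip_version is a hypothetical helper name, and it assumes the suffix is
always a dot followed by digits.

```r
# A few IDs copied from the two outputs above, for illustration only.
tx_ids  <- c("FBgn0000003.1", "FBgn0000008.1", "FBgn0000014.1")
org_ids <- c("FBgn0000008", "FBgn0000014", "FBgn0000015")

# Hypothetical helper: drop a trailing ".<digits>" version suffix.
# Assumes the suffix always has this form; IDs without one pass through.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

stripped <- strip_version(tx_ids)

# IDs present in both sets once the suffix is gone.
shared <- intersect(stripped, org_ids)
print(shared)
```

This only shows that the keys line up after stripping; it does not change
the keys stored in the TxDb itself, which is what the graphInfo object
would need.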

In other words, as with Rattus.norvegicus, it might be good to add a UCSC
"refGene" TxDb package for dm6, since "ensGene" doesn't appear to be as
good a candidate (at least not without some ugliness).  I was looking at
creating a dm6 UCSC "refGene" TxDb.  I imagine one would query the UCSC
public MySQL server and then do the SQLite conversion.  However, the
conversion to SQLite seems a bit fiddly because the datatypes differ
between MySQL and SQLite, and I'm having a hard time finding a
well-supported tool to do it; I don't want to introduce errors or harm
reproducibility.  What do you use for MySQL to SQLite conversion?  Or
would it be more sensible for you benevolent dictators to generate the
package(s)?

Pariksheet

---
Pariksheet Nanda
PhD Candidate in Genetics and Genomics
System Administrator, Storrs HPC Cluster
University of Connecticut

---
 > sessionInfo()
 R Under development (unstable) (2016-11-13 r71655)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Ubuntu 16.04.1 LTS

 locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] stats4    parallel  stats     graphics  grDevices utils     datasets
 [8] methods   base

 other attached packages:
  [1] Rattus.norvegicus_1.3.1
  [2] TxDb.Rnorvegicus.UCSC.rn5.refGene_3.4.0
  [3] org.Rn.eg.db_3.4.0
  [4] Mus.musculus_1.3.1
  [5] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
  [6] org.Mm.eg.db_3.4.0
  [7] Homo.sapiens_1.3.1
  [8] GO.db_3.4.0
  [9] OrganismDbi_1.17.1
 [10] org.Hs.eg.db_3.4.0
 [11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [12] org.Dm.eg.db_3.4.0
 [13] TxDb.Dmelanogaster.UCSC.dm6.ensGene_3.3.0
 [14] GenomicFeatures_1.27.2
 [15] AnnotationDbi_1.37.0
 [16] Biobase_2.35.0
 [17] GenomicRanges_1.27.6
 [18] GenomeInfoDb_1.11.4
 [19] IRanges_2.9.8
 [20] S4Vectors_0.13.2
 [21] BiocGenerics_0.21.0
 [22] BiocInstaller_1.25.2

 loaded via a namespace (and not attached):
  [1] compiler_3.4.0             XVector_0.15.0
  [3] bitops_1.0-6               tools_3.4.0
  [5] zlibbioc_1.21.0            biomaRt_2.31.1
  [7] RSQLite_1.0.0              lattice_0.20-34
  [9] Matrix_1.2-7.1             graph_1.53.0
 [11] DBI_0.5-1                  rtracklayer_1.35.1
 [13] Biostrings_2.43.0          grid_3.4.0
 [15] XML_3.98-1.5               RBGL_1.51.0
 [17] BiocParallel_1.9.1         Rsamtools_1.27.2
 [19] GenomicAlignments_1.11.1   SummarizedExperiment_1.5.3
 [21] RCurl_1.95-4.8
 >
