[Bioc-devel] OrganismDb package for Drosophila.melanogaster

Martin Morgan martin.morgan at roswellpark.org
Tue Nov 15 20:34:17 CET 2016


On 11/15/2016 09:52 AM, Obenchain, Valerie wrote:
> Hi Pariksheet,
>
> On 11/15/2016 03:32 AM, Pariksheet Nanda wrote:
>> Hi folks,
>>
>> It would be great to have an OrganismDb package for
>> Drosophila.melanogaster, similar to Homo.sapiens, Mus.musculus and
>> Rattus.norvegicus.
>>
>> While trying to do this on my own, using the Homo.sapiens package as a
>> starting point, I found that the most similar-looking keys for relating
>> org.Dm.eg.db and TxDb.Dmelanogaster.UCSC.dm6.ensGene are "ENSEMBL" and
>> "GENEID". However, each "GENEID" has a ".1" tacked onto the end, which makes
>> it harder to supply the graphInfo object to OrganismDbi:::.loadOrganismDbiPkg:
>>
>>  > key_ <- function(db, key) sort(as.character(
>>  +     select(db, keys(db, key), key, key)[[key]]))
>>  > key_head <- function(db, key) head(key_(db, key))
>>  > key_head(TxDb.Dmelanogaster.UCSC.dm6.ensGene, "GENEID")
>>  'select()' returned 1:1 mapping between keys and columns
>>  [1] "FBgn0000003.1" "FBgn0000008.1" "FBgn0000014.1" "FBgn0000015.1"
>>  [5] "FBgn0000017.1" "FBgn0000018.1"
>>  > key_head(org.Dm.eg.db, "ENSEMBL")
>>  [1] "FBgn0000008" "FBgn0000014" "FBgn0000015" "FBgn0000017" "FBgn0000018"
>>  [6] "FBgn0000022"
>>  >
>>
>> In other words, as with Rattus.norvegicus, might it be good to add a UCSC
>> "refGene" TxDb package for dm6, since "ensGene" doesn't appear to be as good
>> a candidate (at least not without some ugliness)? I was looking at creating
>> a dm6 UCSC "refGene" TxDb.
> You can use GenomicFeatures::makeTxDbFromUCSC() to create the TxDb. The
> man page, ?makeTxDbFromUCSC, also has helper functions that display
> available genomes, tables and tracks.
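For the ensGene route, the mismatch in the key_head() output above comes down to a trailing version suffix. Below is a minimal base-R sketch, assuming the ".1" on each TxDb "GENEID" is only a version tag that can safely be dropped (not verified here); `strip_version()` is a hypothetical helper, and the sample IDs are the ones printed above. It only shows the string normalization: it does not change the keys stored in the TxDb, so OrganismDbi would still see the suffixed GENEIDs.

```r
# Sample keys copied from the key_head() output above.
txdb_geneids <- c("FBgn0000003.1", "FBgn0000008.1", "FBgn0000014.1",
                  "FBgn0000015.1", "FBgn0000017.1", "FBgn0000018.1")
orgdb_ensembl <- c("FBgn0000008", "FBgn0000014", "FBgn0000015",
                   "FBgn0000017", "FBgn0000018", "FBgn0000022")

# Hypothetical helper: drop a trailing ".<digits>" version suffix, if present.
# Assumes the suffix is a version tag, not part of the FlyBase identifier.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

stripped <- strip_version(txdb_geneids)

# Keys shared by both samples once the suffix is gone (order follows txdb_geneids).
shared <- intersect(stripped, orgdb_ensembl)
print(shared)
# "FBgn0000008" "FBgn0000014" "FBgn0000015" "FBgn0000017" "FBgn0000018"
```

Anchoring the pattern at `$` and requiring digits means IDs that carry no suffix pass through unchanged.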

I'm not completely sure of the result, but

library(OrganismDbi)
odb = makeOrganismDbFromUCSC("dm6", tablename="refGene")

might be most of the way there?

Martin

>
> Valerie
>
>> I imagine one would query the UCSC public MySQL server and then convert the
>> result to SQLite. That conversion seems a bit fiddly, though, because the
>> datatypes differ between MySQL and SQLite, and I'm having a hard time finding
>> a well-supported tool to do it; I don't want to introduce errors or harm
>> reproducibility. What do you use for MySQL-to-SQLite conversion? Or would it
>> be more sensible for you benevolent dictators to generate the package(s)?
>>
>> Pariksheet
>>
>> ---
>> Pariksheet Nanda
>> PhD Candidate in Genetics and Genomics
>> System Administrator, Storrs HPC Cluster
>> University of Connecticut
>>
>> ---
>>  > sessionInfo()
>>  R Under development (unstable) (2016-11-13 r71655)
>>  Platform: x86_64-pc-linux-gnu (64-bit)
>>  Running under: Ubuntu 16.04.1 LTS
>>
>>  locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>  attached base packages:
>>  [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>>  [8] methods   base
>>
>>  other attached packages:
>>   [1] Rattus.norvegicus_1.3.1
>>   [2] TxDb.Rnorvegicus.UCSC.rn5.refGene_3.4.0
>>   [3] org.Rn.eg.db_3.4.0
>>   [4] Mus.musculus_1.3.1
>>   [5] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
>>   [6] org.Mm.eg.db_3.4.0
>>   [7] Homo.sapiens_1.3.1
>>   [8] GO.db_3.4.0
>>   [9] OrganismDbi_1.17.1
>>  [10] org.Hs.eg.db_3.4.0
>>  [11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
>>  [12] org.Dm.eg.db_3.4.0
>>  [13] TxDb.Dmelanogaster.UCSC.dm6.ensGene_3.3.0
>>  [14] GenomicFeatures_1.27.2
>>  [15] AnnotationDbi_1.37.0
>>  [16] Biobase_2.35.0
>>  [17] GenomicRanges_1.27.6
>>  [18] GenomeInfoDb_1.11.4
>>  [19] IRanges_2.9.8
>>  [20] S4Vectors_0.13.2
>>  [21] BiocGenerics_0.21.0
>>  [22] BiocInstaller_1.25.2
>>
>>  loaded via a namespace (and not attached):
>>   [1] compiler_3.4.0             XVector_0.15.0
>>   [3] bitops_1.0-6               tools_3.4.0
>>   [5] zlibbioc_1.21.0            biomaRt_2.31.1
>>   [7] RSQLite_1.0.0              lattice_0.20-34
>>   [9] Matrix_1.2-7.1             graph_1.53.0
>>  [11] DBI_0.5-1                  rtracklayer_1.35.1
>>  [13] Biostrings_2.43.0          grid_3.4.0
>>  [15] XML_3.98-1.5               RBGL_1.51.0
>>  [17] BiocParallel_1.9.1         Rsamtools_1.27.2
>>  [19] GenomicAlignments_1.11.1   SummarizedExperiment_1.5.3
>>  [21] RCurl_1.95-4.8
>>  >
>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>
>
> This email message may contain legally privileged and/or confidential information.  If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited.  If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
>




