[Bioc-devel] OrganismDb package for Drosophila.melanogaster

Martin Morgan martin.morgan at roswellpark.org
Tue Nov 15 20:35:38 CET 2016


On 11/15/2016 02:34 PM, Martin Morgan wrote:
> On 11/15/2016 09:52 AM, Obenchain, Valerie wrote:
>> Hi Pariksheet,
>>
>> On 11/15/2016 03:32 AM, Pariksheet Nanda wrote:
>>> Hi folks,
>>>
>>> It would be great to have an OrganismDb package for
>>> Drosophila.melanogaster, similar to Homo.sapiens, Mus.musculus and
>>> Rattus.norvegicus.
>>>
>>> While trying to do this on my own, using the Homo.sapiens package as a
>>> starting point, I found that the most similar-looking keys for relating
>>> org.Dm.eg.db and TxDb.Dmelanogaster.UCSC.dm6.ensGene are "ENSEMBL" and
>>> "GENEID". However, each "GENEID" has a ".1" suffix tacked onto the end,
>>> which makes it harder to supply the graphInfo object to
>>> OrganismDbi:::.loadOrganismDbiPkg:
>>>
>>>  > key_ <- function(db, key) sort(as.character(
>>>  +     select(db, keys(db, key), key, key)[[key]]))
>>>  > key_head <- function(db, key) head(key_(db, key))
>>>  > key_head(TxDb.Dmelanogaster.UCSC.dm6.ensGene, "GENEID")
>>>  'select()' returned 1:1 mapping between keys and columns
>>>  [1] "FBgn0000003.1" "FBgn0000008.1" "FBgn0000014.1" "FBgn0000015.1"
>>>  [5] "FBgn0000017.1" "FBgn0000018.1"
>>>  > key_head(org.Dm.eg.db, "ENSEMBL")
>>>  [1] "FBgn0000008" "FBgn0000014" "FBgn0000015" "FBgn0000017" "FBgn0000018"
>>>  [6] "FBgn0000022"
>>>  >
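As a minimal base-R sketch (not OrganismDbi code, and using a few of the IDs printed above as toy data), the mismatch and one possible workaround look like this: joining on the raw IDs finds nothing, while stripping a trailing ".<digits>" version suffix with sub() lets the keys line up. The helper name strip_version() is hypothetical, and this only illustrates the key matching; it does not by itself make OrganismDbi's graph join accept the versioned GENEIDs.

```r
# Toy data modelled on the output above: TxDb GENEIDs carry a ".1"
# version suffix that the org.Dm.eg.db ENSEMBL keys lack.
txdb_genes <- data.frame(
    GENEID = c("FBgn0000008.1", "FBgn0000014.1", "FBgn0000015.1"),
    stringsAsFactors = FALSE
)
org_genes <- data.frame(
    ENSEMBL = c("FBgn0000008", "FBgn0000014", "FBgn0000022"),
    stringsAsFactors = FALSE
)

# Joining on the raw IDs finds no matches because of the suffix.
raw_join <- merge(txdb_genes, org_genes, by.x = "GENEID", by.y = "ENSEMBL")
nrow(raw_join)  # 0

# Hypothetical helper: drop a trailing ".<digits>" version suffix.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)
txdb_genes$ENSEMBL <- strip_version(txdb_genes$GENEID)

# After stripping, the two shared genes join as expected.
fixed_join <- merge(txdb_genes, org_genes, by = "ENSEMBL")
fixed_join
```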
>>>
>>> In other words, as with Rattus.norvegicus, might it be good to add a UCSC
>>> "refGene" TxDb package for dm6, since "ensGene" doesn't appear to be as
>>> good a candidate (at least not without some ugliness)? I was looking at
>>> creating a dm6 UCSC "refGene" TxDb.
>> You can use GenomicFeatures::makeTxDbFromUCSC() to create the TxDb. The
>> man page, ?makeTxDbFromUCSC, also documents helper functions that display
>> the available genomes, tables and tracks.
>
> I'm not completely sure of the result, but
>
> library(OrganismDbi)
> odb = makeOrganismDbFromUCSC("dm6", tableName="refGene")
                                     ^^ tablename, not tableName
>
> might be most of the way there?
>
> Martin
>
>>
>> Valerie
>>
>>> I imagine one would query the UCSC public MySQL server and then do the
>>> SQLite conversion. However, the conversion to SQLite seems a bit finicky,
>>> as the datatypes differ between MySQL and SQLite, and I'm having a hard
>>> time finding a well-supported tool to do it; I don't want to introduce
>>> errors or harm reproducibility. What do you use for MySQL-to-SQLite
>>> conversion? Or would it be more sensible for you benevolent dictators to
>>> generate the package(s)?
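On the conversion question, a sketch assuming a TxDb built with makeTxDbFromUCSC(), as Valerie suggests above: as far as I understand, a TxDb is already backed by an SQLite database, so AnnotationDbi::saveDb() can write it to disk and GenomicFeatures::makeTxDbPackage() can wrap it as an installable package, with no hand-rolled MySQL-to-SQLite dump. The wrapper save_and_package_txdb() and every file name, version number and person below are placeholders.

```r
# Hypothetical wrapper: persist a TxDb and wrap it as an annotation package.
# Assumes AnnotationDbi and GenomicFeatures are installed when it is called.
save_and_package_txdb <- function(txdb, sqlite_file, dest_dir) {
    # The TxDb's underlying SQLite database is written out as-is.
    AnnotationDbi::saveDb(txdb, file = sqlite_file)

    # Optionally build a TxDb.* package skeleton around the same database.
    GenomicFeatures::makeTxDbPackage(
        txdb,
        version    = "0.0.1",                        # placeholder
        maintainer = "Your Name <you@example.org>",  # placeholder
        author     = "Your Name",                    # placeholder
        destDir    = dest_dir
    )
    invisible(sqlite_file)
}

if (interactive()) {
    txdb <- GenomicFeatures::makeTxDbFromUCSC(genome = "dm6",
                                              tablename = "refGene")
    out <- save_and_package_txdb(txdb, "dm6_refGene.sqlite", tempdir())
    # In a later session, reload the saved database directly.
    txdb2 <- AnnotationDbi::loadDb(out)
}
```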
>>>
>>> Pariksheet
>>>
>>> ---
>>> Pariksheet Nanda
>>> PhD Candidate in Genetics and Genomics
>>> System Administrator, Storrs HPC Cluster
>>> University of Connecticut
>>>
>>> ---
>>>  > sessionInfo()
>>>  R Under development (unstable) (2016-11-13 r71655)
>>>  Platform: x86_64-pc-linux-gnu (64-bit)
>>>  Running under: Ubuntu 16.04.1 LTS
>>>
>>>  locale:
>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>>  attached base packages:
>>>  [1] stats4    parallel  stats     graphics  grDevices utils     datasets
>>>  [8] methods   base
>>>
>>>  other attached packages:
>>>   [1] Rattus.norvegicus_1.3.1
>>>   [2] TxDb.Rnorvegicus.UCSC.rn5.refGene_3.4.0
>>>   [3] org.Rn.eg.db_3.4.0
>>>   [4] Mus.musculus_1.3.1
>>>   [5] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
>>>   [6] org.Mm.eg.db_3.4.0
>>>   [7] Homo.sapiens_1.3.1
>>>   [8] GO.db_3.4.0
>>>   [9] OrganismDbi_1.17.1
>>>  [10] org.Hs.eg.db_3.4.0
>>>  [11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
>>>  [12] org.Dm.eg.db_3.4.0
>>>  [13] TxDb.Dmelanogaster.UCSC.dm6.ensGene_3.3.0
>>>  [14] GenomicFeatures_1.27.2
>>>  [15] AnnotationDbi_1.37.0
>>>  [16] Biobase_2.35.0
>>>  [17] GenomicRanges_1.27.6
>>>  [18] GenomeInfoDb_1.11.4
>>>  [19] IRanges_2.9.8
>>>  [20] S4Vectors_0.13.2
>>>  [21] BiocGenerics_0.21.0
>>>  [22] BiocInstaller_1.25.2
>>>
>>>  loaded via a namespace (and not attached):
>>>   [1] compiler_3.4.0             XVector_0.15.0
>>>   [3] bitops_1.0-6               tools_3.4.0
>>>   [5] zlibbioc_1.21.0            biomaRt_2.31.1
>>>   [7] RSQLite_1.0.0              lattice_0.20-34
>>>   [9] Matrix_1.2-7.1             graph_1.53.0
>>>  [11] DBI_0.5-1                  rtracklayer_1.35.1
>>>  [13] Biostrings_2.43.0          grid_3.4.0
>>>  [15] XML_3.98-1.5               RBGL_1.51.0
>>>  [17] BiocParallel_1.9.1         Rsamtools_1.27.2
>>>  [19] GenomicAlignments_1.11.1   SummarizedExperiment_1.5.3
>>>  [21] RCurl_1.95-4.8
>>>  >
>>>
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>
>>
>>
>> This email message may contain legally privileged and/or confidential
>> information.  If you are not the intended recipient(s), or the
>> employee or agent responsible for the delivery of this message to the
>> intended recipient(s), you are hereby notified that any disclosure,
>> copying, distribution, or use of this email message is prohibited.  If
>> you have received this message in error, please notify the sender
>> immediately by e-mail and delete this email message from your
>> computer. Thank you.
>>




