[Bioc-devel] Feedback on OrganismDb development

Obenchain, Valerie Valerie.Obenchain at roswellpark.org
Fri Apr 8 23:48:17 CEST 2016


On 04/08/2016 02:34 PM, Obenchain, Valerie wrote:
> Hi,
>
> It sounds like we have agreement on these points:
>
> - Add support for sequences
> - Keep the OrganismDb name
> - Do provide pre-built packages
>
> I'm not sure we got much weigh-in on the package name. I tend towards a
> more descriptive name. This post (to me) is an example of the confusion
> that can come from a very general name:
>
> https://support.bioconductor.org/p/80612/

By 'use some development' I mean we have them on the TODO to work on
after the release.

Valerie
>
>
> These areas could use some development:
>
> - More support for Ensembl-based tracks:
> I don't think we're moving to ENSEMBL as a default (sorry Tim!) but we
> can provide more Ensembl-centric TxDbs.
>
> - Role of AnnotationHub:
> We have thought of storing the sqlite dbs in AnnotationHub and creating
> an OrganismDb on the fly. There is the issue of working off-line that
> Tim mentioned. We plan to do some testing with this after the
> release.
>
> - Better advertise the useful operations one can do with OrganismDb such
> as seqinfo, seqlengths, symbol translations etc.
>
>
> Tim and Ludwig, you asked for a few specific organisms. In devel we now
> have TxDbs for all mentioned except ecoli and arabidopsis so you can
> make your own OrganismDb with makeOrganismPackage().  I'm planning to
> send a summary email to bioc-devel that highlights the new TxDbs
> and other changes to the annotation in 3.3.
>
> Thanks for the feedback.
> Valerie
>
>
> On 04/07/2016 08:38 AM, Tim Triche, Jr. wrote:
>> My only concern regarding AnnotationHub is offline use. I find that
>> I'm more productive if I turn off the network interface altogether...
>> Maybe I'm the only one. biomaRt also scarred me a little.
>>
>> --t
>>
>> On Apr 7, 2016, at 8:34 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
>>
>>>
>>> On Thu, Apr 7, 2016 at 11:24 AM, Tim Triche, Jr.
>>> <tim.triche at gmail.com> wrote:
>>>
>>>     Great!  This is an awesome opportunity to move to ENSEMBL as a
>>>     default ;-) (only half kidding, by the way)
>>>
>>>     1) BSgenome/2bit would be great -- I use this sometimes to
>>>     generate fusion transcripts with defined breakpoints to
>>>     supplement existing txomes
>>>
>>>     2) class name: don't change it
>>>
>>>     3) pre made packages: god yes. Try creating an ENSEMBL TxDb from
>>>     a GTF on a laptop sometime!  I am planning to try and help a bit
>>>     in this respect with direct Reactome mappings of various ID types
>>>     for downstream analysis so this is not just a feature request, I
>>>     will help with it.
>>>
>>>
>>> I think this is a different concern.  The TxDb infrastructure seems
>>> sound, and EnsDb is useful too... but they are not well-harmonized;
>>> TxDb works nicely with Gviz.  I do think that we should have a richer
>>> collection of interoperable transcript model sources packaged.  But
>>> the OrganismDb discipline, if I understand it correctly, only
>>> specifies links among diverse annotation packages and helps with
>>> cross-package joins.
>>>
>>> An afterthought -- maybe the TxDb annotation package discipline will
>>> give way to queries to AnnotationHub.  OrganismDb instance
>>> construction would specify AnnotationHub entities that comprise the
>>> instance and, on use, retrieve from AnnotationHub whatever is not in
>>> the cache.
>>>  
>>>
>>>
>>>     Thanks for picking this up. Others and I use the OrganismDbi
>>>     packages all the time, and I was wondering what would become of them
>>>     now that Marc moved to Seattle Children's. It is great to hear
>>>     that they will receive renewed attention because it is a really
>>>     handy infrastructure. About all I could ask for is Drosophila,
>>>     Danio, and Caenorhabditis organism packages ;-)
>>>
>>>     Thank you,
>>>
>>>     --t
>>>
>>>     > On Apr 7, 2016, at 7:34 AM, Obenchain, Valerie
>>>     <Valerie.Obenchain at roswellpark.org> wrote:
>>>     >
>>>     > BioC developers,
>>>     >
>>>     > After the release we plan to continue development of the
>>>     OrganismDb class
>>>     > and packages. This email outlines some ideas for future
>>>     direction. We're
>>>     > interested in feedback on these points as well as other
>>>     thoughts people
>>>     > might have.
>>>     >
>>>     > ## Background
>>>     >
>>>     > The OrganismDb class is defined in the OrganismDbi package and
>>>     consists
>>>     > of a TxDb object and the combined mappings from GO.db and an
>>>     OrgDb. It
>>>     > supports the select() interface as well as several range-based
>>>     > extractors such as exons(), transcripts(), etc. The idea was
>>>     that given
>>>     > a particular organism, a user would only need a single package
>>>     to access
>>>     > both systems biology and transcript-centric annotations.
>>>     >
>>>     > We currently have 3 OrganismDb packages
>>>     >
>>>     (http://www.bioconductor.org/packages/release/BiocViews.html#___OrganismDb).
>>>     > These are lightweight and don't contain any data themselves
>>>     but instead
>>>     > point to the GO.db, OrgDb and TxDb packages.
>>>     >
>>>     > ## Current issues
>>>     >
>>>     > - Support for sequence representation
>>>     >
>>>     > We've discussed incorporating an optional sequence component, maybe
>>>     > BSgenome or 2bit or ... ?
>>>     >
>>>     >
>>>     > - Class name
>>>     >
>>>     > OrganismDb is similar to OrgDb which could cause some
>>>     confusion. We are
>>>     > considering renaming ... here are a few ideas. Let us know what you
>>>     > think or add your suggestion.
>>>     >
>>>     > OrganismDb (fine as is, leave it)
>>>     > FullOrgDb
>>>     > CrossDb
>>>     > MultipleDb
>>>     >
>>>     >
>>>     > - Package name
>>>     >
>>>     > The current names are not very descriptive: Homo.sapiens,
>>>     Mus.musculus
>>>     > and Rattus.norvegicus.  We'd like to follow the naming
>>>     convention used
>>>     > in our BSgenome and TxDb packages which means including the source,
>>>     > build and track from the TxDb as well as preceding with the
>>>     class type.
>>>     >
>>>     > For example, the current 'Homo.sapiens' package would be renamed
>>>     > 'OrganismDb.Hsapiens.UCSC.hg19.knownGene'.
>>>     >
>>>     >
>>>     > - Pre-made packages
>>>     >
>>>     > Is it useful to supply pre-made packages or just increase
>>>     awareness of
>>>     > the helpers so users can make their own? Current helpers:
>>>     >
>>>     >> ?makeOrganism
>>>     > ?makeOrganismDbFromBiomart  ?makeOrganismDbFromTxDb
>>>     > ?makeOrganismDbFromUCSC     ?makeOrganismPackage
>>>     >
>>>     > NOTE: makeOrganismPackage() will be renamed to
>>>     makeOrganismDbPackage().
>>>     >
>>>     >
>>>     > Thanks.
>>>     > Valerie
>>>     >
>>>     >
>>>     > This email message may contain legally privileged and/or
>>>     confidential information.  If you are not the intended
>>>     recipient(s), or the employee or agent responsible for the
>>>     delivery of this message to the intended recipient(s), you are
>>>     hereby notified that any disclosure, copying, distribution, or
>>>     use of this email message is prohibited.  If you have received
>>>     this message in error, please notify the sender immediately by
>>>     e-mail and delete this email message from your computer. Thank you.
>>>     > _______________________________________________
>>>     > Bioc-devel at r-project.org mailing list
>>>     > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>>
>
>
>
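The "cross-package joins" that Vincent describes in the quoted thread can be pictured with a small base-R sketch. This is only a toy illustration under stated assumptions, not OrganismDbi code: the three data frames, every ID in them, the `links` list and the `lookup()` helper are invented stand-ins for GO.db, an OrgDb and a TxDb. The column names (GOID, GO, ENTREZID, GENEID, TXNAME) are borrowed from common annotation keys purely for flavour.

```r
# Toy stand-ins for the three resources an OrganismDb links together.
# All contents are invented placeholders, not real annotation data.
go_tbl <- data.frame(GOID = c("GO:A", "GO:B"),
                     TERM = c("term A", "term B"),
                     stringsAsFactors = FALSE)
org_tbl <- data.frame(ENTREZID = c("1", "1", "2"),
                      SYMBOL   = c("geneX", "geneX", "geneY"),
                      GO       = c("GO:A", "GO:B", "GO:B"),
                      stringsAsFactors = FALSE)
tx_tbl <- data.frame(GENEID = c("1", "2", "2"),
                     TXNAME = c("tx1", "tx2a", "tx2b"),
                     stringsAsFactors = FALSE)

# Each link names the key column on either side of one join
# (hypothetical structure, used only in this sketch).
links <- list(
  go_org = c(go = "GOID", org = "GO"),
  org_tx = c(org = "ENTREZID", tx = "GENEID")
)

# Walk the links: GO terms -> genes -> transcripts.
go_org <- merge(go_tbl, org_tbl,
                by.x = links$go_org[["go"]], by.y = links$go_org[["org"]])
full <- merge(go_org, tx_tbl,
              by.x = links$org_tx[["org"]], by.y = links$org_tx[["tx"]])

# A select()-like lookup over the joined table: filter on one key type,
# return the requested columns without duplicate rows.
lookup <- function(tbl, keys, keytype, columns) {
  unique(tbl[tbl[[keytype]] %in% keys, columns, drop = FALSE])
}

res <- lookup(full, keys = "term B", keytype = "TERM",
              columns = c("SYMBOL", "TXNAME"))
print(res)
```

Here a single call goes from a GO term to transcript names without the caller joining the tables by hand, which is the convenience the thread attributes to OrganismDb. The real class works against the installed annotation packages rather than in-memory data frames.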


