[Bioc-devel] Opinions on a meta-annotation package

Panagiotis Moulos
Fri Jan 8 09:21:58 CET 2021


Dear Kasper and Hervé,

Thank you for your feedback and apologies for reviving such an old thread.
Hervé, you are right, and I apologize for the confusion in the message title
and content: it is a software package for creating/retrieving specific
annotation elements, not an annotation package.

After one year and several uses of the scheme I was proposing (including
a couple of publications), I think I will go for a new package.
Although there is definitely a significant overlap with current
Bioconductor mechanisms for creating annotation packages, I think this new
package will serve the following purposes:

- Serious code repetition will be avoided and the other packages will
become more manageable and sustainable (metaseqR2 already has a ton of
dependencies). Also, there are other packages in development using this
scheme.
- Simple genomic annotation elements (such as simple gene co-ordinates,
required for simple read counting) from many organisms (including custom
ones) will become available in one package.
- Versioning of these elements will be tracked in one place.
- I have often come across users who prefer this simple "tab-delimited"
view of genomic regions of interest (although it is not always the correct
or best approach).
- It will offer RefSeq and Ensembl transcript versioning, something which
I have not found a way to do with current Bioconductor facilities (maybe
because of my limited knowledge). I believe this will be useful for
many people working on precision medicine and diagnostic projects.
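
To make the points above more concrete, here is a minimal sketch of a
versioned store of simple gene co-ordinates in a local SQLite database,
with a "tab-delimited" view on top. It is written in Python purely for
illustration and is not code from metaseqR2 or recoup. The table layout,
the function names and the example rows (organismX, sourceY, geneA, geneB
and their co-ordinates) are all hypothetical placeholders.

```python
import sqlite3


def build_annotation_db(rows):
    """Create an in-memory SQLite store of gene co-ordinates.

    Each row is (organism, source, version, gene_id, chrom,
    start_pos, end_pos, strand). The schema is an assumption made
    for this sketch, not the layout used by any existing package.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE gene (
               organism TEXT, source TEXT, version INTEGER,
               gene_id TEXT, chrom TEXT,
               start_pos INTEGER, end_pos INTEGER, strand TEXT,
               PRIMARY KEY (organism, source, version, gene_id))"""
    )
    con.executemany("INSERT INTO gene VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    con.commit()
    return con


def latest_version(con, organism, source):
    """Return the highest stored version, or None if nothing is stored."""
    row = con.execute(
        "SELECT MAX(version) FROM gene WHERE organism = ? AND source = ?",
        (organism, source),
    ).fetchone()
    return row[0]


def genes_as_tsv(con, organism, source, version=None):
    """Return the genes of one annotation version as tab-delimited text.

    When no version is given, the latest stored one is used.
    """
    if version is None:
        version = latest_version(con, organism, source)
    cur = con.execute(
        "SELECT chrom, start_pos, end_pos, gene_id, strand FROM gene "
        "WHERE organism = ? AND source = ? AND version = ? "
        "ORDER BY chrom, start_pos",
        (organism, source, version),
    )
    lines = ["chromosome\tstart\tend\tgene_id\tstrand"]
    lines += ["\t".join(str(value) for value in row) for row in cur]
    return "\n".join(lines)


# Made-up placeholder rows; these are not real annotation data.
EXAMPLE_ROWS = [
    ("organismX", "sourceY", 1, "geneA", "chr1", 100, 500, "+"),
    ("organismX", "sourceY", 2, "geneA", "chr1", 100, 520, "+"),
    ("organismX", "sourceY", 2, "geneB", "chr1", 50, 90, "-"),
]

if __name__ == "__main__":
    con = build_annotation_db(EXAMPLE_ROWS)
    print(genes_as_tsv(con, "organismX", "sourceY"))
```

Keeping the version as part of the key means an analysis can request the
exact annotation version it was originally run with, while new users get
the latest one by default.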

Best regards,

Panagiotis

On Thu, 24 Oct 2019 at 23:11, Kasper Daniel Hansen <
kasperdanielhansen using gmail.com> wrote:

> From your description it very much sounds like creating a new package is
> the way to go.
>
> On Thu, Oct 24, 2019 at 3:03 PM Pages, Herve <hpages using fredhutch.org> wrote:
>
>> Hi Panagiotis,
>>
>> Avoiding code repetition is always a good idea. An alternative to the
>> creation of a 3rd package would be to have one of the 2 packages depend
>> on the other. If that is not a good option (and there might be some
>> valid reasons for that) then yes, factorizing out the repeated stuff and
>> putting it in a 3rd package is a good option.
>>
>> Note that your subject line is confusing: You're asking opinions on a
>> meta-annotation package but IIUC this is about the creation of a
>> **software** package that would provide tools for building and/or
>> querying a certain type of annotations right? I think of a
>> meta-annotation package as a data package that would contain searchable
>> meta data about existing biological annotations but that is not what we
>> are talking about here is it?
>>
>> Also I wonder how much overlap there would be between this new package
>> and packages like AnnotationDbi, AnnotationForge, GenomicFeatures,
>> ensembldb which also provide functionalities for creating and querying
>> annotations. For example AnnotationForge and AnnotationDbi are used to
>> create and query the hundreds of "classic" *db packages.
>>
>> Best,
>> H.
>>
>> On 10/20/19 19:56, Panagiotis Moulos wrote:
>> > Dear developers,
>> >
>> > I maintain two packages (metaseqR, recoup) and am about to submit an
>> > enhanced (but different in many points, thus a new package) version of
>> > the 1st (metaseqR2). During their course of development, maintenance and
>> > usage, these packages have somehow come to use a common underlying
>> > annotation system for the genomic regions they operate on, which of
>> > course makes use of Bioconductor facilities and structures
>> > (GenomicRanges, GenomicAlignments, BSgenome, GenomicFeatures etc.)
>> >
>> > This annotation system:
>> > - Builds a local SQLite database
>> > - Supports certain "custom" genomic features which are required for the
>> > modeling made by these packages
>> > - Is currently embedded in each package
>> > - Has almost evolved to a package of its own with respect to independent
>> > functionalities
>> >
>> > The reason for this mail/question is that I would like to ask your
>> > opinion on whether it is worthwhile to create a new package to host the
>> > annotation functions and detach them from the other two. Some points to
>> > support this idea:
>> >
>> > 1. It's used in the same manner by two other packages, thus there is a
>> > lot of code repetition
>> > 2. Users (including myself) often load one of these packages just to
>> > fetch genomic region annotations for other purposes outside the scope
>> > of each package (metaseqR - RNA-Seq data analysis, recoup - NGS signal
>> > visualization).
>> > 3. It automatically constructs the required annotation regions to
>> > analyze Lexogen Quant-Seq data (a protocol we are using a lot), a
>> > function which may be useful to many others
>> > 4. The database created can be expanded with custom user annotations
>> > using a GTF file (making use of makeTxDbFromGFF)
>> > 5. Supports various annotation sources (Ensembl, UCSC, RefSeq, custom)
>> > in one place
>> > 6. Has a versioning system, allowing transparency and reproducibility
>> > when required
>> >
>> > Some (maybe obvious) points against this idea:
>> >
>> > 1. Bioconductor already has a robust and rich genomic annotation system
>> > which can be used and re-used as necessary
>> > 2. Maybe there is no need for yet another annotation-related package
>> > 3. There is possibly no wide acceptance for such a package, other than
>> > my usage in the other two, and maybe a few more users that make use of
>> > the annotation functionalities
>> > 4. Does not follow standard Bioconductor guidelines for creating
>> > annotation packages (on the other hand it's not an annotation package in
>> > the strict sense, but more a meta-annotation package).
>> >
>> > Do you have any thoughts or opinions on the best course of action?
>> >
>> > Best regards,
>> >
>> > Panagiotis
>> >
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages using fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>> _______________________________________________
>> Bioc-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>
> --
> Best,
> Kasper
>


-- 
Panagiotis Moulos, PhD, Bioinformatician
Associate Staff Scientist,
BSRC 'Alexander Fleming'
Fleming 34, 16672, Vari, Greece
Tel: +30 210 9656310, int. 131
Skype: panos_mou
