[Bioc-devel] Opinions on a meta-annotation package

Pages, Herve hpages at fredhutch.org
Thu Oct 24 21:03:33 CEST 2019


Hi Panagiotis,

Avoiding code repetition is always a good idea. An alternative to the 
creation of a 3rd package would be to have one of the 2 packages depend 
on the other. If that is not a good option (and there might be some 
valid reasons for that) then yes, factoring out the repeated code and 
putting it in a 3rd package is a good option.

Note that your subject line is confusing: you're asking for opinions on a 
meta-annotation package but IIUC this is about the creation of a 
**software** package that would provide tools for building and/or 
querying a certain type of annotations, right? I think of a 
meta-annotation package as a data package that would contain searchable 
metadata about existing biological annotations, but that is not what we 
are talking about here, is it?

Also, I wonder how much overlap there would be between this new package 
and packages like AnnotationDbi, AnnotationForge, GenomicFeatures, and 
ensembldb, which also provide functionality for creating and querying 
annotations. For example, AnnotationForge and AnnotationDbi are used to 
create and query the hundreds of "classic" *.db packages.

Best,
H.

On 10/20/19 19:56, Panagiotis Moulos wrote:
> Dear developers,
> 
> I maintain two packages (metaseqR, recoup) and am about to submit an enhanced
> (but different in many points, thus a new package) version of the 1st
> (metaseqR2). During their course of development, maintenance and usage,
> these packages have somehow come to use a common underlying annotation
> system for the genomic regions they operate on, which of course makes use
> of Bioconductor facilities and of course structures (GenomicRanges,
> GenomicAlignments, BSgenome, GenomicFeatures etc.)
> 
> This annotation system:
> - Builds a local SQLite database
> - Supports certain "custom" genomic features which are required for the
> modeling made by these packages
> - Is currently embedded in each package
> - Has almost evolved into a package of its own with respect to independent
> functionalities
> 
> The reason for this mail/question is that I would like to ask your opinion
> on whether it is worthwhile to create a new package to host the annotation
> functions and detach them from the other two. Some points to support this idea:
> 
> 1. It's used in the same manner by two other packages, thus there is a lot
> of code repetition
> 2. Users (including myself) often load one of these packages just to use it
> to fetch genomic region annotations for other purposes outside the scope of
> each package (metaseqR - RNA-Seq data analysis, recoup - NGS signal
> visualization).
> 3. It automatically constructs the required annotation regions to analyze
> Lexogen Quant-Seq data (a protocol we are using a lot), a function which
> may be useful to many others
> 4. The database created can be expanded with custom user annotations using
> a GTF file to create it (making use of makeTxDbFromGFF)
> 5. Supports various annotation sources (Ensembl, UCSC, RefSeq, custom) in
> one place
> 6. Has a versioning system, allowing transparency and reproducibility when
> required
> 
> Some (maybe obvious) points against this idea:
> 
> 1. Bioconductor already has a robust and rich genomic annotation system
> which can be used and re-used as necessary
> 2. Maybe there is no need for yet another annotation-related package
> 3. There is possibly no wide acceptance for such a package, other than my
> usage in the other two, and maybe a few more users that make use of the
> annotation functionalities
> 4. Does not follow standard Bioconductor guidelines for creating annotation
> packages (on the other hand it's not an annotation package in the strict
> sense, but more a meta-annotation package).
> 
> Do you have any thoughts or opinions on the best course of action?
> 
> Best regards,
> 
> Panagiotis
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

