[Bioc-devel] Pipeline management (and AnnotationData Packages version)

Elena Grassi grassi.e at gmail.com
Mon Jun 20 10:16:43 CEST 2016


TL;DR: what do you use to manage workflows and pipelines for your research? Knitr
or similar? Galaxy? Taverna?

Long version:

in my lab we are thinking about restyling our pipeline management
system and would like to exploit as much as possible the wealth of
annotations available in the AnnotationData Packages.

Our system is based on makefiles and a set of shared scripts for
common operations, and aims to be as language agnostic as possible.
Right now we have a bunch of 'annotation dirs', organized with
versions and/or download dates in their pathnames, which store
genomes, gene coords and so on in flat tab-delimited files.

Ideally we would like to switch to annotation packages, not by using
them to generate those files, but by performing the task at hand
directly in the working project directories (e.g. getting the
sequences corresponding to a given set of coords, or converting some
gene ids) with a set of ad hoc R scripts.
In this scenario the problem of version management arises: it is not
only necessary to record somewhere the versions of R, Bioconductor and
the packages that were used, but ideally also to be able to re-run the
same steps (which are makefile rules) with those original versions,
possibly in a way that is transparent to the user, e.g. by setting a
makefile variable.
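
To make the version-management requirement concrete, here is a minimal
sketch of the "record, then verify before re-running" idea. It is
written in Python only to stay language agnostic; the function names,
the versions.json file name and the version strings are all
hypothetical placeholders, and collecting the real versions (for
example from sessionInfo() in R) is left out.

```python
import json
from pathlib import Path


def write_manifest(path, versions):
    """Record the tool and package versions used for a pipeline step."""
    Path(path).write_text(json.dumps(versions, indent=2, sort_keys=True))


def check_manifest(path, current):
    """Return (name, recorded, found) for every version that differs."""
    recorded = json.loads(Path(path).read_text())
    mismatches = []
    for name, wanted in sorted(recorded.items()):
        found = current.get(name)  # None if the package is not installed
        if found != wanted:
            mismatches.append((name, wanted, found))
    return mismatches


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    used = {"R": "3.3.0", "Bioconductor": "3.3", "org.Hs.eg.db": "3.3.0"}
    write_manifest("versions.json", used)
    # A later run would pass the versions found in its own environment.
    for name, wanted, found in check_manifest("versions.json", used):
        print(f"{name}: recorded {wanted}, found {found}")
```

A makefile rule could call such a check before a step and stop when
the list of mismatches is not empty. This only detects drift; restoring
the original versions would still need something like a pinned
environment or image.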

I was wondering if someone else organizes pipelines in a manner
similar to ours and has faced the same issues. I am thinking about
linking our makefile management system with docker images with the
right package versions but I am a bit worried about the disk
requirements of such an approach.

I did some googling but haven't found anything really fitting our
scenario - I hope that I did not miss anything obvious. I saw this old
post, "enabling reproducible research & R package management &
install.package.version & BiocLite", but I believe that it is a
bit outdated.

Thank you very much for your help. I am curious in general about
pipeline management inside the Bioconductor community; maybe everyone
is using knitr and that's it :) - I use it for some solo projects but
still haven't tried it out for big ones, and starting with it would
unfortunately be troublesome for the kind of undergraduate students
that we work with.


$ pom
