[Bioc-sig-seq] chipseq infrastructure

Tue Mar 2 05:57:01 CET 2010

On Mon, Mar 1, 2010 at 7:08 AM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Hey guys,
>
> I'm wondering if anyone has given any thought to some sort of generic
> framework for chipseq analysis in Bioconductor, based on the IRanges,
> Biostrings, etc infrastructure. chipseq has some nice utilities; could it be
> transformed into some sort of generic chipseq pipeline? Something like how
> the 'affy' package (I think?) allows other packages to provide alternative
> implementations for particular stages. Just having a clean, refined,
> approximately complete set of chipseq-focused utilities would be nice.
> Presumably chipseq could fill that role? I think we now have a good idea of
> the basic steps in chipseq analysis, so it's probably time for such a
> package to emerge.
>
> Comments?

Good idea, of course, but it will need some thought. We should probably
start by identifying the typical stages of the analysis and formulating
suitable data structures. What we have now is:

 - Data I/O and QA: External software + ShortRead

 - Data reduction: Is "GenomeDataList" good, or do we want something
   else as an intermediate on-disk storage format?

 - Modeling + Peak Calling: Is coverage the right abstraction? We have
   one method based on coverage, but not all methods are coverage-based.
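
To make the coverage question concrete, here is a minimal sketch of what a
coverage-based peak caller boils down to: compute per-base read depth, then
report maximal runs at or above a cutoff. This is plain Python for
illustration only, not the chipseq or IRanges implementation; the function
names, the 1-based inclusive interval convention, the toy reads, and the
cutoff of 2 are all assumptions.

```python
from typing import List, Tuple

# Assumed convention: 1-based, inclusive (start, end) on one chromosome.
Interval = Tuple[int, int]


def coverage(reads: List[Interval], chrom_length: int) -> List[int]:
    """Per-base read depth, computed with a difference array."""
    diff = [0] * (chrom_length + 2)
    for start, end in reads:
        diff[start] += 1      # depth goes up where a read begins
        diff[end + 1] -= 1    # and back down just past its end
    depth, running = [], 0
    for pos in range(1, chrom_length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def call_peaks(depth: List[int], lower: int) -> List[Interval]:
    """Maximal runs of positions whose depth is at least `lower`."""
    peaks, run_start = [], None
    for pos, d in enumerate(depth, start=1):
        if d >= lower and run_start is None:
            run_start = pos
        elif d < lower and run_start is not None:
            peaks.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        peaks.append((run_start, len(depth)))
    return peaks


# Toy data (hypothetical): four extended reads on a 20-base chromosome.
reads = [(2, 6), (4, 9), (5, 8), (15, 18)]
depth = coverage(reads, chrom_length=20)
peaks = call_peaks(depth, lower=2)
print(peaks)
```

A method that models strand-specific read starts, for instance, would need
more than `depth` as input, which is why coverage may be too narrow as the
common abstraction.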

   I'm also not sure how much of this can be put into a framework. For
example, it's not clear how genomic annotation can be incorporated.
One can call peaks and then "intersect" with promoter regions, or
bypass peak-calling and start directly with promoter regions.
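
The two routes can be sketched side by side. Again this is only an
illustration in plain Python under assumed conventions (1-based inclusive
intervals, hypothetical helper names, made-up reads, peaks, and promoter
coordinates), not code from any existing package.

```python
from typing import List, Tuple

# Assumed convention: 1-based, inclusive (start, end) on one chromosome.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def peaks_in_promoters(peaks: List[Interval],
                       promoters: List[Interval]) -> List[Interval]:
    """Route 1: call peaks first, then keep those overlapping a promoter."""
    return [p for p in peaks if any(overlaps(p, q) for q in promoters)]


def reads_per_promoter(reads: List[Interval],
                       promoters: List[Interval]) -> List[int]:
    """Route 2: skip peak calling and count reads overlapping each promoter."""
    return [sum(overlaps(r, q) for r in reads) for q in promoters]


# Toy data (hypothetical).
reads = [(2, 6), (4, 9), (5, 8), (15, 18)]
peaks = [(4, 8)]                 # assumed output of some peak caller
promoters = [(1, 5), (14, 20)]

print(peaks_in_promoters(peaks, promoters))
print(reads_per_promoter(reads, promoters))
```

In this toy case the second promoter has a read but no peak, so the two
routes give different answers about it; a framework would have to allow
annotation to enter at either point.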

   In the chipseq package, we basically gave up trying to formalize
this, and made it a free-for-all after the data reduction step. I'm not
sure we can do better unless we restrict ourselves to specific pipelines.

-Deepayan