[Bioc-sig-seq] chipseq infrastructure
Deepayan Sarkar
deepayan.sarkar at gmail.com
Tue Mar 2 05:57:01 CET 2010
On Mon, Mar 1, 2010 at 7:08 AM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Hey guys,
>
> I'm wondering if anyone has given any thought to some sort of generic
> framework for chipseq analysis in Bioconductor, based on the IRanges,
> Biostrings, etc infrastructure. chipseq has some nice utilities; could it be
> transformed into some sort of generic chipseq pipeline? Something like how
> the 'affy' package (I think?) allows other packages to provide alternative
> implementations for particular stages. Just having a clean, refined,
> approximately complete set of chipseq-focused utilities would be nice.
> Presumably chipseq could fill that role? I think we now have a good idea of
> the basic steps in chipseq analysis, so it's probably time for such a
> package to emerge.
>
> Comments?
Good idea of course, but will need thought. We should probably start
with identifying typical stages of the analysis, and formulating
suitable data structures. What we have now is:
- Data I/O and QA: External software + ShortRead
- Data reduction: Is "GenomeDataList" good, or do we want something
  else as an intermediate on-disk storage format?
- Modeling + Peak Calling: Is coverage the right abstraction? We have
  one method based on coverage, but not all methods are coverage-based.
I'm also not sure how much of this can be put into a framework. For
example, it's not clear how genomic annotation can be incorporated.
One can call peaks and then "intersect" with promoter regions, or
bypass peak-calling and start directly with promoter regions.
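
To make the two routes concrete, here is a rough sketch in plain Python.
It is not the chipseq or IRanges API; every function name below is
hypothetical, intervals are assumed to be half-open [start, end) on a
single chromosome, and the peak caller is a naive depth threshold rather
than any published method.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), assumed convention


def coverage(reads: List[Interval], length: int) -> List[int]:
    """Per-base read depth; reads are assumed to lie within [0, length]."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    cov, depth = [], 0
    for d in diff[:length]:
        depth += d
        cov.append(depth)
    return cov


def call_peaks(cov: List[int], min_depth: int) -> List[Interval]:
    """Naive peak calling: maximal runs with depth >= min_depth."""
    peaks, start = [], None
    for pos, depth in enumerate(cov):
        if depth >= min_depth and start is None:
            start = pos
        elif depth < min_depth and start is not None:
            peaks.append((start, pos))
            start = None
    if start is not None:
        peaks.append((start, len(cov)))
    return peaks


def peaks_in_regions(peaks: List[Interval],
                     regions: List[Interval]) -> List[Interval]:
    """Route 1: call peaks first, then keep those overlapping a region."""
    return [p for p in peaks
            if any(p[0] < r[1] and r[0] < p[1] for r in regions)]


def region_scores(cov: List[int], regions: List[Interval]) -> List[int]:
    """Route 2: skip peak calling and summarize coverage per region.

    Regions are assumed non-empty and inside the coverage vector.
    """
    return [max(cov[start:end]) for start, end in regions]


reads = [(0, 5), (2, 7), (3, 8), (20, 25)]
promoters = [(5, 10), (18, 22)]  # made-up promoter regions
cov = coverage(reads, 30)
print(peaks_in_regions(call_peaks(cov, min_depth=2), promoters))
print(region_scores(cov, promoters))
```

The two routes return different kinds of objects (a filtered set of peaks
versus one number per annotated region), which is part of why a single
framework is hard to pin down.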
In the chipseq package, we basically gave up trying to formalize
this, and made it a free-for-all after the data reduction step. I'm not
sure we can do better unless we restrict ourselves to specific pipelines.
-Deepayan