[Bioc-sig-seq] chipseq infrastructure

Raphael Gottardo raphaelgottardo at mac.com
Tue Mar 2 01:46:51 CET 2010


Hi Michael (and others),

I would certainly second that. You guys have developed great tools for low-level analysis of next-gen data, but higher-level analyses are still lagging behind. That is to be expected, though, since the higher-level tools depend on the lower-level infrastructure.

My group has been working on several aspects of ChIP-seq analysis and, to some extent, gene regulation.
As noted in one of the emails this morning, we are about to submit our PICS software, which is based on a version of this paper: http://arxiv.org/abs/0903.3206. We hope the paper will be published in Biometrics in the near future. For our package we have used some of the infrastructure available in the chipseq package and in IRanges.

One problem we have faced is data input. For ChIP-seq, one does not need the read sequences. However, when you use ShortRead you automatically get the read sequences, which take a lot of memory. For some of our deeply sequenced data sets, this has been a bottleneck.
So it would be nice to be able to read only the chromosome/start/strand information. As pointed out by Wolfgang, Rsamtools might be the solution, so we will have to see how we can use Rsamtools and the classes defined there for ChIP-seq. That said, we still have a lot of files from non-MAQ aligners.
I think Arnaud Droit, who is in my group, has already sent an email about this issue.
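To make the memory point concrete, here is a minimal Python sketch (not PICS, ShortRead or Rsamtools code) that streams a tab-delimited alignment file and keeps only chromosome, start and strand, never storing read sequences or qualities. The column layout (name, chromosome, start, strand, sequence, quality) is a hypothetical assumption; real aligner output formats differ, so the column indices are parameters.

```python
import io
from array import array
from collections import defaultdict
from typing import Dict, TextIO


def read_positions(handle: TextIO, chrom_col: int = 1, start_col: int = 2,
                   strand_col: int = 3) -> Dict[str, Dict[str, array]]:
    """Stream an aligned-read text file, keeping only chromosome, start and strand.

    Column indices are 0-based and default to a HYPOTHETICAL layout:
    name, chromosome, start, strand, sequence, quality. Adjust for a real aligner.
    """
    # One compact integer array per (chromosome, strand); sequences are never stored.
    positions = defaultdict(lambda: {"+": array("l"), "-": array("l")})
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        strand = fields[strand_col]
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand {strand!r} in line: {line!r}")
        positions[fields[chrom_col]][strand].append(int(fields[start_col]))
    return dict(positions)


if __name__ == "__main__":
    # Tiny made-up input in the hypothetical layout described above.
    example = io.StringIO(
        "read1\tchr1\t100\t+\tACGT\tIIII\n"
        "read2\tchr1\t250\t-\tTTGA\tIIII\n"
        "read3\tchr2\t42\t+\tGGCC\tIIII\n"
    )
    pos = read_positions(example)
    print({c: {s: list(v) for s, v in d.items()} for c, d in pos.items()})
```

The design choice is simply that memory grows with one fixed-size integer per read rather than with read length, since the sequence and quality columns are parsed past and discarded line by line.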

Besides PICS, which will be submitted this week, we have already released a package for motif analysis, rGADEM, which can work on standard Biostrings objects. rGADEM is relatively fast and well adapted to ChIP-seq enriched regions. We also have another package, MotIV, for motif validation and identification, which is based on STAMP (with many improvements). I believe MotIV is under review and should be available soon.

So very soon we will have a complete pipeline: short reads (ShortRead) -> enriched regions (PICS) -> motifs (rGADEM) -> validated motifs and motif occurrences (MotIV) -> other BioC packages (e.g. GenomicFeatures).

So at least this will be a start. Of course, we are open to suggestions, requests, etc. If any of you guys want more details, feel free to drop us an email.

Cheers,

Raphael

On 2010-03-01, at 10:08 AM, Michael Lawrence wrote:

> Hey guys,
> 
> I'm wondering if anyone has given any thought to some sort of generic
> framework for chipseq analysis in Bioconductor, based on the IRanges,
> Biostrings, etc infrastructure. chipseq has some nice utilities; could it be
> transformed into some sort of generic chipseq pipeline? Something like how
> the 'affy' package (I think?) allows other packages to provide alternative
> implementations for particular stages. Just having a clean, refined,
> approximately complete set of chipseq-focused utilities would be nice.
> Presumably chipseq could fill that role? I think we now have a good idea of
> the basic steps in chipseq analysis, so it's probably time for such a
> package to emerge.
> 
> Comments?
> 
> Michael
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
