[BioC] Chip-seq quality control

Tue Oct 4 20:57:52 CEST 2011

On 10/04/2011 07:33 AM, Ivan Gregoretti wrote:
 > Hello Lucia,
 >
 > A proper response to your post would take a lecture rather than an
 > email. I can't do that but I can bullet the main points. I think that
 > it will help you if you are indeed a newcomer to ChIP-seq.
 >
 > 1) Expect 10 million reads per sample for a genome the size of human.

I'd run some basic QA on your lanes via ShortRead::qa on the FASTQ
files (or on BAM files if FASTQ files are not available); use FastqSampler
if memory is tight (though in general, if memory is tight, the solution
will be to find a larger computer).
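
ShortRead's FastqSampler draws a random subset of reads so that QA can run in bounded memory. To illustrate the idea only (this is not ShortRead's implementation), here is a minimal Python sketch of one-pass reservoir sampling over FASTQ records. It assumes 4-line records with unwrapped sequences and Phred+33 qualities, and all function names are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, "+" line, quality string).
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "").rstrip("\r\n")
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header, seq, plus, qual


def sample_fastq(lines: Iterable[str], n: int, seed: int = 0) -> List[Record]:
    """Reservoir-sample up to n records in a single pass using O(n) memory."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(read_fastq(lines)):
        if i < n:
            reservoir.append(rec)
        else:
            # Keep record i with probability n / (i + 1), evicting a random slot.
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = rec
    return reservoir


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)
```

An open file handle is an iterable of lines, so `sample_fastq(open(path), n=1_000_000)` would work on a real lane. Per-read quality summaries can then be computed on the sample.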

See http://bioconductor.org/help/workflows/high-throughput-sequencing/
for qa and perhaps other operations common to RNA-seq / ChIP-seq workflows.

 >
 > 2) Stick to SAM/BAM formats so that you can use well known, publicly
 > available tools. Your best friend is called Picard.

People can and do use R / Bioconductor for Picard-like tasks.
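
Much basic QA on SAM/BAM data comes down to reading the FLAG field of each record. Here is a minimal Python sketch, assuming plain-text SAM input (for example from `samtools view -h`) rather than binary BAM. The bit values come from the SAM specification, and `flag_summary` is a hypothetical helper name. The duplicate count is only meaningful after a tool such as Picard MarkDuplicates has set the duplicate bit.

```python
from typing import Dict, Iterable

# FLAG bits as defined in the SAM format specification.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def flag_summary(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Count primary reads that are mapped, marked duplicate, or usable."""
    counts = {"total": 0, "mapped": 0, "duplicate": 0, "usable": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines (@HD, @SQ, ...) and blank lines
        flag = int(line.rstrip("\n").split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary alignment
        counts["total"] += 1
        if flag & UNMAPPED:
            continue
        counts["mapped"] += 1
        if flag & DUPLICATE:
            counts["duplicate"] += 1
        else:
            counts["usable"] += 1
    return counts
```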

 > 3) Remove duplicates. Again, Picard is your best friend.
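
For single-end ChIP-seq reads, duplicate removal is often approximated by keeping one read per (chromosome, strand, 5' end). The Python sketch below works under that assumption. It is simplified relative to Picard's MarkDuplicates, which also uses mate information, keeps the read with the best base qualities, and by default marks duplicates rather than dropping them. The `Read` type is hypothetical.

```python
from typing import List, NamedTuple, Set, Tuple


class Read(NamedTuple):
    chrom: str
    start: int   # leftmost aligned base, 1-based
    end: int     # rightmost aligned base, 1-based inclusive
    strand: str  # "+" or "-"


def five_prime(read: Read) -> int:
    """5' end: leftmost base on the + strand, rightmost base on the - strand."""
    return read.start if read.strand == "+" else read.end


def remove_duplicates(reads: List[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, strand, 5' end) key."""
    seen: Set[Tuple[str, str, int]] = set()
    kept: List[Read] = []
    for r in reads:
        key = (r.chrom, r.strand, five_prime(r))
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept
```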

 > 4) Create WIG files for all samples, treatments and controls so that
 > you can display them simultaneously on any genome browser.

Here, for interactive use, I would rather use basic R plotting commands,
avoiding the round trip through a genome browser and allowing
programmatic interaction.
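
If you follow Ivan's point 4 instead, a WIG track amounts to binned read counts along each chromosome. Here is a minimal Python sketch of a fixedStep WIG writer. It assumes 1-based read positions (for example the 5' ends) have already been extracted, and the function names are hypothetical. Raw counts from libraries of different depth are not directly comparable, so scaling to something like reads per million would be a sensible addition before viewing samples side by side.

```python
from typing import Dict, Iterable, List


def bin_counts(positions: Iterable[int], chrom_length: int, step: int) -> List[int]:
    """Count 1-based positions into consecutive bins of width `step`."""
    n_bins = (chrom_length + step - 1) // step
    counts = [0] * n_bins
    for pos in positions:
        if 1 <= pos <= chrom_length:
            counts[(pos - 1) // step] += 1
    return counts


def to_fixed_step_wig(name: str, chrom_counts: Dict[str, List[int]], step: int) -> str:
    """Render per-chromosome bin counts as a fixedStep WIG track.

    Note: the last bin's span may extend past the chromosome end; trim it
    if a browser or converter complains.
    """
    lines = [f'track type=wiggle_0 name="{name}"']
    for chrom, counts in chrom_counts.items():
        lines.append(f"fixedStep chrom={chrom} start=1 step={step} span={step}")
        lines.extend(str(c) for c in counts)
    return "\n".join(lines) + "\n"
```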

 > 5) Find peaks with a well documented peak finder.

Probably a good suggestion for a one-off or common ChIP; the chipseq
vignette

   http://bioconductor.org/packages/release/bioc/html/chipseq.html

provides inspiration for more flexible analyses; packages under the
ChIPSeq biocViews term (Software -> AssayTechnologies ->
HighThroughputSequencing -> ChIPSeq) might offer a solution tailored to
your ChIP.

 > 6) Compute enrichment for all treatments relative to their controls.

Again, the chipseq vignette is an alternative source.
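
A crude form of point 6 is to count reads in the same fixed windows for a treatment and its control, normalize each by its library size, and take a log2 ratio with a pseudocount. The Python sketch below does exactly that. Total-count normalization and a pseudocount of 1 are simplifying assumptions, not the chipseq vignette's method. With Lucia's design it could be run once against input DNA and once against unmodified histone.

```python
import math
from typing import List


def log2_enrichment(treat: List[int], control: List[int],
                    pseudocount: float = 1.0) -> List[float]:
    """Per-window log2(treatment / control) after library-size normalization."""
    if len(treat) != len(control):
        raise ValueError("treatment and control must use the same windows")
    t_total, c_total = sum(treat), sum(control)
    if t_total == 0 or c_total == 0:
        raise ValueError("empty library")
    ratios = []
    for t, c in zip(treat, control):
        # Fraction of each library in this window; scaling both to reads
        # per million would cancel in the ratio.
        t_frac = (t + pseudocount) / t_total
        c_frac = (c + pseudocount) / c_total
        ratios.append(math.log2(t_frac / c_frac))
    return ratios
```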

 >
 > So, points 4 and 6 are your quality controls at this stage. Once you
 > know what a good immunoprecipitation looks like compared to a bad one,
 > you can start diving into the details. You can invent your own quality

Especially for getting a sense of good versus bad results, the
interactivity of R / Bioconductor seems essential.

Martin

 > indicators. For instance, I compute the proportion of tags inside the
 > 1000 strongest peaks. I do that for BOTH treatment and controls.
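
Ivan's indicator, the proportion of tags inside the 1000 strongest peaks, is similar in spirit to the fraction-of-reads-in-peaks (FRiP) score. The Python sketch below assumes peaks are given as (chrom, start, end, score) with 1-based inclusive coordinates and that reads have been reduced to (chrom, position). A good immunoprecipitation would be expected to show a clearly higher fraction in treatment than in control.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[str, int, int, float]  # chrom, start, end (1-based inclusive), score


def fraction_in_top_peaks(reads: Iterable[Tuple[str, int]],
                          peaks: List[Peak], n: int = 1000) -> float:
    """Fraction of reads whose position falls inside the n highest-scoring peaks."""
    top = sorted(peaks, key=lambda p: p[3], reverse=True)[:n]
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end, _ in top:
        by_chrom[chrom].append((start, end))
    # Merge overlapping peaks so one bisect lookup per read is enough.
    merged: Dict[str, Tuple[List[int], List[int]]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts: List[int] = []
        ends: List[int] = []
        for s, e in ivs:
            if starts and s <= ends[-1]:
                ends[-1] = max(ends[-1], e)
            else:
                starts.append(s)
                ends.append(e)
        merged[chrom] = (starts, ends)
    total = inside = 0
    for chrom, pos in reads:
        total += 1
        if chrom in merged:
            starts, ends = merged[chrom]
            i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
            if i >= 0 and pos <= ends[i]:
                inside += 1
    return inside / total if total else 0.0
```
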
 >
 > In my workflow, Bioconductor does not get involved until I reach point 6.
 >
 > Happy ChIPing.
 >
 > Ivan
 >
 > On Mon, Oct 3, 2011 at 5:17 PM, Lucia Peixoto <luciap at iscb.org> wrote:
 >> Hi,
 >> I am new to Chip-seq, my experiment's sequencing has finished, and the
 >> read alignment is currently running
 >> The experiment was done for histone acetylation, and I have two types
 >> of controls: input DNA and unmodified histone.
 >> I have two conditions and 6 biological replicates of each condition
 >> I wanted some advice on how to perform basic quality control on Chip-seq
 >> data using Bioconductor
 >> and also some ideas of which kinds of biases people usually observe and
 >> I should keep my eyes open for
 >> any advice will be greatly appreciated!
 >> thanks
 >>
 >> Lucia
 >>
 >> _______________________________________________
 >> Bioconductor mailing list
 >> Bioconductor at r-project.org
 >> https://stat.ethz.ch/mailman/listinfo/bioconductor
 >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
 >>
 >

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793