[BioC] Chip-seq quality control
Martin Morgan
mtmorgan at fhcrc.org
Tue Oct 4 20:57:52 CEST 2011
On 10/04/2011 07:33 AM, Ivan Gregoretti wrote:
> Hello Lucia,
>
> A proper response to your post would take a lecture rather than an
> email. I can't do that but I can bullet the main points. I think that
> it will help you if you are indeed a newcomer to ChIP-seq.
>
> 1) Expect 10 million reads per sample for a genome the size of human.
I'd run some basic QA on your lanes, via ShortRead::qa on the fastq
files (or bam if fastq are not available); use FastqSampler if memory is
tight (but in general if memory is tight the solution will be to find a
larger computer).
See http://bioconductor.org/help/workflows/high-throughput-sequencing/
for qa and perhaps other operations common to RNA-seq / ChIP-seq workflows.
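
To illustrate the idea of sampling when memory is tight, here is a minimal sketch in plain Python (not ShortRead, and not a claim about how FastqSampler is implemented). It draws a fixed-size random sample of fastq records in a single pass using reservoir sampling, then summarizes quality on the sample only. The function names, the 4-line record layout and the Phred+33 quality offset are all assumptions made for the illustration.

```python
import io
import random


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def reservoir_sample(records, n, seed=None):
    """Keep a uniform random sample of at most n records in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < n:
            sample.append(record)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                sample[j] = record  # replace with probability n / (i + 1)
    return sample


def mean_quality(record, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    _, _, quality = record
    return sum(ord(c) - offset for c in quality) / len(quality)


# Toy input standing in for a real fastq file (hypothetical reads).
toy = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(100))
subset = reservoir_sample(read_fastq(io.StringIO(toy)), n=10, seed=42)
per_read_quality = [mean_quality(r) for r in subset]
```

Only n records are ever held in memory, whatever the size of the input; for real data the handle would be an open fastq file rather than a StringIO.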
>
> 2) Stick to SAM/BAM formats so that you can use well known, publicly
> available tools. Your best friend is called Picard.
People can and do use R / Bioconductor for Picard-like tasks.
> 3) Remove duplicates. Again, Picard is your best friend.
> 4) Create WIG files for all samples, treatments and controls so that
> you can display them simultaneously on any genome browser.
Here, for interactive use, I would rather use basic R plotting commands,
avoiding the round-trip to a genome browser and allowing programmatic
interaction.
> 5) Find peaks with a well documented peak finder.
Probably a good suggestion for a one-off or common ChIP. The chipseq
vignette
http://bioconductor.org/packages/release/bioc/html/chipseq.html
provides inspiration for more flexible analysis; packages under the
ChIPseq biocViews term (Software -> AssayTechnologies ->
HighThroughputSequencing -> ChIPSeq) might offer a solution tailored to
your ChIP.
> 6) Compute enrichment for all treatments relative to their controls.
Again, the chipseq vignette is an alternative source.
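
For concreteness, one common way to express enrichment is a log2 ratio of library-size-normalized counts per region. The sketch below is plain Python and is not taken from the chipseq vignette or from any peak finder; the function name, the reads-per-million scaling and the pseudocount of 1 are assumptions chosen for illustration.

```python
import math


def log2_enrichment(treatment_counts, control_counts,
                    treatment_total, control_total, pseudocount=1.0):
    """Per-region log2(treatment / control) after scaling to reads per million.

    treatment_counts, control_counts: read counts for the same regions, in the
    same order. treatment_total, control_total: total mapped reads per library.
    The pseudocount avoids division by zero in regions with no control reads.
    """
    if len(treatment_counts) != len(control_counts):
        raise ValueError("count vectors must cover the same regions")
    result = []
    for t, c in zip(treatment_counts, control_counts):
        t_rpm = t * 1e6 / treatment_total  # normalize for sequencing depth
        c_rpm = c * 1e6 / control_total
        result.append(math.log2((t_rpm + pseudocount) / (c_rpm + pseudocount)))
    return result


# Hypothetical counts for three regions in a ChIP sample and its input control.
enrichment = log2_enrichment([30, 10, 0], [3, 5, 0],
                             treatment_total=2_000_000,
                             control_total=1_000_000)
```

Positive values indicate regions with more normalized signal in the treatment than in the control. With two kinds of control (input DNA and unmodified histone), the same calculation can be run against each.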
>
> So, points 4 and 6 are your quality controls at this stage. Once you
> know what a good immunoprecipitation looks like compared to a bad one,
> you can start diving into the details. You can invent your own quality
Especially for getting a sense of good versus bad results, the
interactivity of R / Bioconductor seems essential.
Martin
> indicators. For instance, I compute the proportion of tags inside the
> 1000 strongest peaks. I do that for BOTH treatment and controls.
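
The indicator described here (proportion of tags inside the strongest peaks) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Ivan's actual code: tags are reduced to a (chromosome, position) pair, peaks are (chromosome, start, end, score) with half-open coordinates [start, end), and "strongest" means highest score. All names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_in_top_peaks(tags, peaks, n_top=1000):
    """Fraction of tags that fall inside the n_top highest-scoring peaks."""
    top = sorted(peaks, key=lambda p: p[3], reverse=True)[:n_top]
    by_chrom = defaultdict(list)
    for chrom, start, end, _score in top:
        by_chrom[chrom].append((start, end))
    # Merging first means a tag under two overlapping peaks is counted once.
    merged = {c: merge_intervals(iv) for c, iv in by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}

    total = inside = 0
    for chrom, pos in tags:
        total += 1
        if chrom in merged:
            i = bisect_right(starts[chrom], pos) - 1  # last interval starting <= pos
            if i >= 0 and pos < merged[chrom][i][1]:
                inside += 1
    return inside / total if total else 0.0
```

Following the advice above, the function would be called once with the treatment tags and once with the control tags, using the same peak set, so the two proportions can be compared.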
>
> In my workflow, Bioconductor does not get involved until I reach point 6.
>
> Happy ChIPing.
>
> Ivan
>
> On Mon, Oct 3, 2011 at 5:17 PM, Lucia Peixoto<luciap at iscb.org> wrote:
>> Hi,
>> I am new to Chip-seq, my experiment's sequencing has finished, and the read
>> alignment is currently running
>> The experiment was done for histone acetylation, and I have two types of
>> controls: input DNA and unmodified histone.
>> I have two conditions and 6 biological replicates of each condition
>> I wanted some advice on how to perform basic quality control on Chip-seq
>> data using Bioconductor
>> and also some ideas of which kinds of biases people usually observe and I
>> should keep my eyes open for
>> any advice will be greatly appreciated!
>> thanks
>>
>> Lucia
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793