[Bioc-sig-seq] Reducing Solexa's export.txt in preparation for a ChIP-seq analysis.

Martin Morgan mtmorgan at fhcrc.org
Thu Mar 19 17:34:42 CET 2009


Hi Ivan --

ig2ar-saf2 at yahoo.co.uk writes:

> Hi João, Robert and everyone,
>
> First, thank you for your responses.
>
> It was easy to predict that the question about discarding duplicates would get responses faster as it is the one that you can address quickly and accurately with common sense.
>
> Briefly, I fully agree: we need to remove the PCR bias whenever possible. I was actually wondering about the false negative error introduced by this filter. Now that I have thought about it a little more, the benefit of the decrease in false positives must comfortably outweigh the increase in false negatives unless the ChIP peaks are very sharp. Good.
>
> Question one still remains:
>

> I read in the data with ShortRead. Now, how do I filter it and
> export it to a format that will allow me to follow along with the example
> workflow? Class matters, doesn't it?

To 'chip' away at your question a bit, ShortRead has ?srFilter which
provides some builtin functionality; in general you can extract the
reads / other components and select reads as you see fit, e.g.,

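  library(ShortRead)
  ## aln: an AlignedRead object, e.g., from readAligned()
  alf <- alphabetFrequency(sread(aln), baseOnly=TRUE)
  ## keep reads whose G+C fraction exceeds 0.6
  idx <- rowSums(alf[,c("G","C")]) / rowSums(alf) > .6
  gcrich <- aln[idx]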

(no idea why this would be a good idea ;).
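
The same pattern (compute a logical index, then subset) also covers the
reduction the workflow describes: a quality cutoff, then dropping reads
that share chromosome, start position and strand. Below is a minimal
base-R sketch of that idea, not the procedure actually used to build
alignedLocs. The data frame 'reads', its column names and the cutoff of
20 are made up for illustration; with a real AlignedRead object the
corresponding values would come from accessors such as chromosome(),
position() and strand().

```r
## Hypothetical stand-in for aligned reads: one row per read.
reads <- data.frame(
    chromosome = c("chr1", "chr1", "chr1", "chr2", "chr2"),
    position   = c(100L, 100L, 100L, 250L, 250L),
    strand     = c("+", "+", "-", "+", "+"),
    quality    = c(30, 12, 40, 35, 38),
    stringsAsFactors = FALSE)

## Apply a quality cutoff first (the threshold is arbitrary here).
ok <- reads$quality >= 20
filtered <- reads[ok, ]

## Treat reads with the same chromosome, start and strand as duplicates
## and keep only the first occurrence of each.
key <- filtered[, c("chromosome", "position", "strand")]
dup <- duplicated(key)
reduced <- key[!dup, ]
```

What is left in 'reduced' is one row per distinct alignment start,
including orientation, which is the kind of information the workflow
starts from.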

My next step would be to use coverage, as described on the
?"AlignedRead-class" page. That brings us to the point where we can do
the analyses on p. 2 of the workflow, e.g., slice, and much of the
remainder of the vignette you reference can be worked through without
real problems.
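
To make concrete what coverage and slice compute, here is a small
base-R sketch. It is a stand-in for illustration only, not how the
package functions are implemented, and the start positions, the fixed
read width and the chromosome length are all invented. Each read is
assumed to cover 'width' bases to the right of its start.

```r
## Hypothetical read starts on one chromosome; every read is assumed
## to cover `width` bases to the right of its start.
starts <- c(3L, 4L, 4L, 10L)
width <- 3L
chromLength <- 15L

## Per-base coverage via a difference array: +1 where a read begins,
## -1 just past where it ends, then a cumulative sum.
delta <- integer(chromLength + 1L)
for (s in starts) {
    delta[s] <- delta[s] + 1L
    delta[s + width] <- delta[s + width] - 1L
}
cvg <- cumsum(delta)[seq_len(chromLength)]

## Contiguous runs with coverage >= lower, in the spirit of slice().
lower <- 2L
runs <- rle(cvg >= lower)
ends <- cumsum(runs$lengths)
begins <- ends - runs$lengths + 1L
islands <- data.frame(start = begins[runs$values],
                      end = ends[runs$values])
```

Here 'cvg' plays the role of the coverage vector and 'islands' lists
the regions where at least 'lower' reads overlap.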

chipseq is in the devel branch. Its data structures have not been
fully aligned with other established structures (and vice versa), and
they are not fully described.  This is a disconnect that we're working
to resolve before the release.

Others might have something else to say.
     
Martin

>
>> load("alignedLocs.rda")
>> ls()
> [1] "alignedLocs"
>> class(alignedLocs)
> [1] "AlignedList"
> attr(,"package")
> [1] "chipseq"
>
> Thank you,
>
> Ivan
>
>
>
>
>
>
> ----- Original Message ----
> From: João Fadista <Joao.Fadista at agrsci.dk>
> To: ig2ar-saf2 at yahoo.co.uk; bioc-sig-sequencing at r-project.org
> Sent: Thursday, 19 March, 2009 10:31:24
> Subject: RE: [Bioc-sig-seq] Reducing Solexa's export.txt in preparation for a ChIP-seq analysis.
>
>
> Hi,
>
> Removing duplicates is a step that you can do in order to minimize the possible bias due to the amplification in sample preparation. 
>
> Best,
> João
>
> -----Original Message-----
> From: bioc-sig-sequencing-bounces at r-project.org [mailto:bioc-sig-sequencing-bounces at r-project.org] On Behalf Of ig2ar-saf2 at yahoo.co.uk
> Sent: Thursday, March 19, 2009 3:24 PM
> To: bioc-sig-sequencing at r-project.org
> Subject: [Bioc-sig-seq] Reducing Solexa's export.txt in preparation for a ChIP-seq analysis.
>
>
> Hello,
>
> In preparation to analyse my own ChIP-seq data, I am trying to follow the steps described in this sample workflow:
>
> http://www.bioconductor.org/workshops/2008/SeattleNov08/ChIP-seq/workflow.pdf
>
> The document starts by loading data that has been "reduced to a set of alignment start positions (including orientation)".
>
> Can somebody elaborate on that a little bit or, ideally, show it with one example?
>
> Also, as part of the reduction, the procedure "removed all duplicate reads and applied a quality score cutoff". The score cutoff is fine but how is removing duplicates justified?
>
> Thank you,
>
> Ivan
>
>
>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


