[Bioc-sig-seq] Adapter removal

Thu Jul 17 16:35:20 CEST 2008

On Thu, Jul 17, 2008 at 9:47 AM, Krys Kelly <kak28 at cam.ac.uk> wrote:
> I have inherited a pipeline for Solexa sequence data using Perl, BioPerl,
> SSAHA and MySQL.  As an R/Bioconductor user I am interested in ShortRead and
> BiostringsCinterfaceDemo.
>
> However, in the short term I need to use the current pipeline.  The imaging
> is done by the Sequencing Facility and we get fastq files with the 3'
> adapter still attached. The adapter removal is currently done by a Perl
> script that keeps only sequences consisting of any number of letters in
> [ACGT] followed by the first 8 letters of the adapter.  This seems pretty
> crude (e.g. only using 8 letters, not allowing for mismatches, not allowing
> for the diminishing quality along the length of the read).
>
> Google has not revealed any algorithms or code for this part of the
> pipeline.  Does anyone know what algorithms are being used or, even better,
> could anyone point me in the direction of some code?

I believe that MAQ will do this for you.  You can then use the
ShortRead package to read the MAQ output (VERY, VERY fast).
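
For reference, the exact-match trimming described in the question can be
sketched in a few lines. This is only a rough Python illustration, not the
original Perl script: the function name, the adapter sequence and the example
reads are made up, and cutting at the first occurrence of the adapter prefix
is an assumption, since the question does not say which occurrence the
script uses.

```python
import re


def trim_adapter(read, adapter, prefix_len=8):
    """Return the bases preceding the adapter prefix, or None if no match.

    Mirrors the rule in the question: any number of letters in [ACGT]
    followed by the first `prefix_len` letters of the adapter, matched
    exactly (no mismatches, no use of quality scores).
    """
    prefix = adapter[:prefix_len]
    # Non-greedy, so the read is cut at the first occurrence of the prefix
    # (an assumption; the original script may behave differently).
    match = re.match(r"([ACGT]*?)" + re.escape(prefix), read)
    return match.group(1) if match else None


# Hypothetical adapter and reads, for illustration only.
adapter = "TCGTATGCCGTCTTCTGCTTG"
reads = [
    "ACGTACGTACGTTCGTATGCCGTC",  # insert followed by the adapter
    "ACGTNACGTTCGTATGC",         # contains an N before the adapter
    "ACGTACGTACGT",              # no adapter at all
]
trimmed = [trim_adapter(r, adapter) for r in reads]
```

Reads for which the function returns None are the ones such a script would
discard, which shows how strict the rule is: a single N or a single
mismatch in the first 8 adapter letters loses the whole read.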

Sean