[Bioc-sig-seq] Trimming Solexa/Illumina primers/adaptors; ShortRead

Martin Morgan mtmorgan at fhcrc.org
Fri Feb 5 18:07:14 CET 2010


Hi Jo --

On 02/04/2010 10:20 PM, Johannes Rainer wrote:
> dear all,
> 
> this is probably a very simple question, but i'm quite new to
> high-throughput sequencing...
> I've got reads from a ChIP-seq experiment, and the service provider has
> finally sent me the list of adaptors and primers that I want to screen for
> and clip from the reads (using the trimLRPatterns function from the
> ShortRead package).
> 
> my question now is whether I also have to reverse complement the sequences,
> and whether I should use e.g. DNA_AD1 as the Lpattern parameter and DNA_AD2
> as the Rpattern, or how else this works. below I have pasted some of the
> adaptor/primer sequences.

Not a direct answer to your question, but ... (a) Solexa reads are
reported as read from the flow cell, 5' to 3'; I think this means that
no reverse complement is necessary. (b) I think it's relatively unusual
to trim or filter ChIP-seq Solexa reads. With shorter reads, at least,
a read that contains adapter is lost either way: if you trim it, too
little sequence is left to align, and if you don't, the read is too
dissimilar from the reference to align in the first place. Either way
the adapter-containing reads do not align, and hence are discarded from
any downstream analysis.
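
One way to check the orientation question empirically, rather than by
reasoning alone, is to count how often an adapter and its reverse
complement each occur in a sample of reads. The following is a minimal
Python sketch of that idea, not ShortRead code: the helper names
reverse_complement() and count_hits() are hypothetical, the three reads
are invented toy data, and only DNA_AD1 is taken from the list quoted
below. Matching is exact and uses just the first few bases of the
adapter as a seed, which is an assumption made to keep the example short.

```python
# Illustrative sketch only: helper names and reads are made up.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_hits(reads, adapter, min_len=12):
    """Count reads that contain the first min_len bases of adapter.

    Exact matching only; real data would usually need some tolerance
    for sequencing errors.
    """
    seed = adapter[:min_len]
    return sum(seed in read for read in reads)


# Adapter copied from the list in the original question.
DNA_AD1 = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"

# Invented toy reads, 5' to 3' as reported by the instrument.
reads = [
    "ACGTACGTACGTTTGACCAGATCGGAAGAGCTCGTATG",  # insert, then adapter
    "TTGACCATGGCATCGATCGATTACGGCTAGCTAGGCTA",  # no adapter
    "GGCATTAGATCGGAAGAGCTCGTATGCCGTCTTCTGCT",  # short insert, then adapter
]

forward = count_hits(reads, DNA_AD1)
reverse = count_hits(reads, reverse_complement(DNA_AD1))
print("as given:", forward, "reverse complement:", reverse)
```

If one orientation clearly dominates in real reads, that is the
orientation to pass to the trimming function.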

Trimming adapters might become important in other scenarios, e.g., miRNA
sequencing, where the target is shorter than the read and the read
therefore runs into the 3' adapter, or working with longer reads where
sample prep introduces artifacts to be removed (e.g., primers or
barcodes in metagenomic analysis of 454 reads).
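
To make the short-target case concrete, here is a minimal Python sketch
of 3' adapter clipping. It is not how trimLRPatterns is implemented: the
function trim_3prime_adapter() and its min_overlap parameter are
hypothetical, matching is exact (no mismatches or indels), and DNA_AD1
from the original question stands in for whatever 3' adapter a real
small-RNA library would use. The 22 nt insert is the let-7a sequence,
used purely as an example.

```python
# Illustrative sketch only: exact-match 3' adapter clipping.
def trim_3prime_adapter(read, adapter, min_overlap=5):
    """Remove adapter sequence from the 3' end of read.

    First looks for the full adapter anywhere in the read. Failing that,
    looks for the longest adapter prefix (at least min_overlap bases)
    that forms the 3' end of the read. Returns the read unchanged if
    neither is found.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # The read may end partway through the adapter: compare the read's
    # suffix with adapter prefixes, longest first.
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


adapter = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"  # DNA_AD1, as a stand-in
insert = "TGAGGTAGTAGGTTGTATAGTT"              # 22 nt target (let-7a)

# A 36-cycle read of a 22 nt target contains 14 bases of adapter.
read = insert + adapter[:14]
trimmed = trim_3prime_adapter(read, adapter)
print(len(read), len(trimmed), trimmed == insert)
```

The min_overlap threshold is a trade-off: lower values clip more true
adapter remnants but also remove more genuine sequence that happens to
look like the start of the adapter.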

Martin

> sorry for my newbie-question,
> 
> bests, jo
> 
> 
>> DNA_AD1
> GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
>> DNA_AD2
> ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>> DNA_PCR1
> AATGATACGGCGACCACCGACACTCTTTCCCTACACGACGCTCTTCCGATCT
>> DNA_PCR2
> CAAGCAGAAGACGGCATACGACGCTCTTCCGATCT
>> DNA_SEQ
> CACTCTTTCCCTACACGACGCTCTTCCGATCT
>> GEXD_AD11
> GATCGTCGGACTGTAGAACTCTGAAC
>> GEXD_AD12
> ACAGGTTCAGAGTTCTACAGTCCGAC
>> GEXD_AD21
> ...
> 
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


