[BioC] fast iterator over DNAString's?
pshannon at systemsbiology.org
Thu Mar 11 01:30:40 CET 2010
I wish to trim a variable length sequence from the end of many thousands of DNAStrings in a DNAStringSet.
The sequence to be trimmed is any recognizable chunk of a Solexa short-read adapter, which ends up on the end of, for example, 22nt miRNAs. The adapter chunk might start in the middle of a 35-base read, or it might be closer to the end. In every case, I want to delete every base from the start of the adapter chunk to the end of the read.
I imagine there might be a BString operation equivalent to sed. Sed could be used like this:
echo 'CGAAGCGGGATGATCTATCTCGTATGCCGTCTTCT' | sed 's/TCGTATGCCGTC.*$//'   -->   CGAAGCGGGATGATCTATC
(where TCGTATGCCGTC is only part of the 21-base adapter, but is probably a long enough portion to be representative)
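To make the intended behaviour concrete for more than one read, here is a minimal shell sketch of the same sed idea applied to several reads at once, one per line. The helper name trim_adapter is hypothetical, and the second and third reads are made-up inputs for illustration only (one with the chunk at the very end, one with no chunk at all). It is only meant to pin down the behaviour I am after, not to suggest how it should be done with BStrings.

```shell
# Hypothetical helper: delete everything from the first occurrence of the
# adapter chunk to the end of each line. Reads one sequence per line on
# stdin; lines that do not contain the chunk pass through unchanged.
trim_adapter() {
    chunk="$1"                  # adapter fragment, e.g. TCGTATGCCGTC
    sed "s/${chunk}.*\$//"      # same substitution as the one-liner above
}

# First read is the example from this post; the other two are illustrative.
printf '%s\n' \
    'CGAAGCGGGATGATCTATCTCGTATGCCGTCTTCT' \
    'ACGTACGTACGTACGTACGTACGTCGTATGCCGTC' \
    'ACGTACGTACGTACGTACGTACGTACGTACGTACG' |
    trim_adapter TCGTATGCCGTC
# prints:
# CGAAGCGGGATGATCTATC
# ACGTACGTACGTACGTACGTACG
# ACGTACGTACGTACGTACGTACGTACGTACGTACG
```

Note that this only handles reads containing the full 12-base chunk; a read that ends partway through the chunk would be left untrimmed.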
Any way to do this with BStrings and friends?