[Bioc-sig-seq] adapter removal

Patrick Aboyoun paboyoun at fhcrc.org
Thu Jan 15 01:17:17 CET 2009


I just checked in a trimLRPatterns function to the Bioconductor svn 
repository for BioC 2.4. Its signature is

  trimLRPatterns(Lpattern = NULL, Rpattern = NULL, subject,
                 max.Lmismatch = 0, max.Rmismatch = 0,
                 with.Lindels = FALSE, with.Rindels = FALSE,
                 Lfixed = TRUE, Rfixed = TRUE, ranges = FALSE)

As you can infer from the arguments, this function lets the user set
the maximum number of mismatches (if with.Lindels / with.Rindels =
FALSE) or the maximum edit distance (if with.Lindels / with.Rindels =
TRUE) allowed for the left and right flanking "patterns". It also
allows for IUPAC ambiguity letters in these flanking regions if
Lfixed / Rfixed = FALSE. When ranges = FALSE, trimLRPatterns returns
the trimmed strings. When ranges = TRUE, it returns the ranges that you
can use to trim the strings. Here are some examples:

 >   library(Biostrings)
 >   Lpattern <- "TTCTGCTTG"
 >   Rpattern <- "GATCGGAAG"
 >   subject <- DNAString("TTCTGCTTGACGTGATCGGA")
 >   subjectSet <- DNAStringSet(c("TGCTTGACGGCAGATCGG",
 +                                "TTCTGCTTGGATCGGAAG"))
 >   trimLRPatterns(Lpattern = Lpattern, subject = subject)
  11-letter "DNAString" instance
seq: ACGTGATCGGA
 >   trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern,
 +                  subject = subjectSet)
  A DNAStringSet instance of length 2
    width seq
[1]    18 TGCTTGACGGCAGATCGG
[2]     0
 >   trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern,
 +                  subject = subjectSet, ranges = TRUE)
IRanges object:
  start end width
1     1  18    18
2    10   9     0
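
To make the matching rule concrete, here is a rough base-R sketch of
whole-pattern trimming with a mismatch threshold. It is not the
Biostrings implementation: trim_lr_sketch and count_mismatches are
made-up names, the code works on plain character vectors, and it ignores
indels and IUPAC ambiguity letters. It also assumes, as the output above
suggests, that with a threshold of 0 a flank is trimmed only when the
full-length pattern matches that end of the read.

```r
# Illustrative sketch only; hypothetical helpers, not part of Biostrings.

# Number of positions at which two equal-length strings differ.
count_mismatches <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Trim a whole left and/or right pattern from each subject when it matches
# the corresponding end with at most max_l / max_r mismatches.
trim_lr_sketch <- function(subject, Lpattern = "", Rpattern = "",
                           max_l = 0, max_r = 0, ranges = FALSE) {
  n  <- nchar(subject)
  nL <- nchar(Lpattern)
  nR <- nchar(Rpattern)
  start <- rep(1L, length(subject))
  end   <- n
  for (i in seq_along(subject)) {
    s <- subject[i]
    # Left flank: compare Lpattern with the first nL letters of the read.
    if (nL > 0 && n[i] >= nL &&
        count_mismatches(Lpattern, substr(s, 1, nL)) <= max_l) {
      start[i] <- nL + 1L
    }
    # Right flank: compare Rpattern with the last nR letters of the read.
    if (nR > 0 && n[i] >= nR &&
        count_mismatches(Rpattern, substr(s, n[i] - nR + 1, n[i])) <= max_r) {
      end[i] <- n[i] - nR
    }
  }
  # If both flanks matched and overlapped, report an empty range.
  end <- pmax(end, start - 1L)
  if (ranges) {
    return(data.frame(start = start, end = end, width = end - start + 1L))
  }
  substr(subject, start, end)  # gives "" when end < start
}

Lpattern <- "TTCTGCTTG"
Rpattern <- "GATCGGAAG"
reads <- c("TGCTTGACGGCAGATCGG", "TTCTGCTTGGATCGGAAG", "TTCTGCTAGACGT")

# Exact matching: only the second read carries both full patterns.
exact <- trim_lr_sketch(reads, Lpattern, Rpattern)

# Allowing one mismatch on the left also trims the third read to "ACGT".
fuzzy <- trim_lr_sketch(reads, Lpattern, Rpattern, max_l = 1)

# Ranges instead of strings; the second read gets start 10, end 9, width 0.
rng <- trim_lr_sketch(reads, Lpattern, Rpattern, ranges = TRUE)
print(rng)
```

The ranges form is handy when you want to keep the original reads around
and apply the same trimming to something that travels with them, such as
the quality strings.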

This functionality will be available on bioconductor.org (and
installable via biocLite) in the next day or so, but you can also grab
Biostrings from svn directly if you need it sooner. It will also find
its way into the Biostrings documentation and training material before
the next release of Bioconductor in May.


Patrick



Patrick Aboyoun wrote:
> David,
> Following up on Martin's comments, I am putting the finishing touches 
> on a function called trimLRPatterns for the Biostrings package. Its 
> purpose is to trim left and/or right flanking patterns from sequences, 
> so it can strip 5' and/or 3' adapters from your reads. The signature 
> for this function is
>
>  trimLRPatterns(Lpattern=NULL, Rpattern=NULL, subject,
>                 max.Lnedit=0, max.Rnedit=0,
>                 with.Lindels=FALSE, with.Rindels=FALSE,
>                 Lfixed=TRUE, Rfixed=TRUE, rangesOnly = FALSE)
>
> I will be checking this function into the BioC 2.4 code line, which 
> requires using R-devel, sometime today or tomorrow. I will send out an 
> e-mail to this group when I check it in and show a simple example of 
> its usage. I talked with Martin and he will wrap this functionality in 
> the ShortRead layer so you don't have to leave the ShortRead class 
> system when removing adapters from your reads.
>
>
> Cheers,
> Patrick
>


