[Bioc-sig-seq] trimLRPatterns question

Kunbin Qu KQu at genomichealth.com
Sat Oct 30 06:46:47 CEST 2010


I tried that, it did not work (and I do not quite understand why you suggest Rpattern="TGGT" or "TGAC"):

> t1<-DNAString("AAAATGGTAC")
> t2<-DNAString("TGGT")
> t3<-trimLRPatterns(Rpattern=t2,subject=t1)
> t3
  10-letter "DNAString" instance
seq: AAAATGGTAC
> t3<-trimLRPatterns(Rpattern=t2,subject=t1, max.Rmismatch=rep(0,4))
> t3
  10-letter "DNAString" instance
seq: AAAATGGTAC
> t2<-DNAString("TGAC")
> trimLRPatterns(Rpattern=t2,subject=t1, max.Rmismatch=rep(0,4))
  10-letter "DNAString" instance
seq: AAAATGGTAC



If you mean to allow trimming a pattern like "TG.." in letters 7-10  
or less of your subject, even
if TG occurs in letters 5-6, as in your t1, then you might want  
Rpattern="TGGT", and max.Rmismatch
should have length 4 (or less, if you want to prevent trimming less  
than 4 letters).

If I didn't get the point, a longer answer:

Usually, I over-trimmed when I set the mismatch limits too high  
(especially with indels allowed),
probably when I used the "rate" feature for the mismatch parameter,  
and that resulted in certain
mismatch elements being larger than I wanted.  But you have zero  
mismatches allowed, and your whole
Rpattern occurs exactly, as the final 6 letters of your subject.  So  
your example is a good example
of what the function does (except that you don't need to allow  
indels, and you could get your t4 as
the value of trimLRPatterns by using ranges=F, the default).  You can  
put -1 in the mismatch vector,
for example

	M = some integer
	trimLRPatterns(Rpattern=t2, subject=t1, max.Rmismatch = c(rep(-1,5),  
M))

This allows the 6-letter suffix of your subject to be trimmed if it  
matches your Rpattern with M
or less errors, but it prevents trimming by anything shorter.   
Actually, your rep(0,4) is the same
as c(rep(-1,2), rep(0,4)).  That is, trimming by 6 or 5 letters is  
allowed, but not by less.  Thus,

 > trimLRPatterns(Rpattern="TGGTAC", subject="AAAATGGTAC",  
max.Rmismatch=0)
[1] "AAAA"

 > trimLRPatterns(Rpattern="TGGTAC", subject="AAAATGGTAC",  
max.Rmismatch=c(rep(-1,5), 0))
[1] "AAAA"

You can even put -1 as the 6th element of max.Rmismatch, but then you  
might as well start with
Rpattern="TGGTA"; or -1 in positions 5 and 6, so, Rpattern="TGGT".





______________________________________________________________________
The contents of this electronic message, including any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain confidential information. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this message or any attachment is strictly prohibited. If you have received this transmission in error, please send an e-mail to postmaster at genomichealth.com and delete this message, along with any attachments, from your computer.



More information about the Bioc-sig-sequencing mailing list