[Bioc-sig-seq] a question about trimLRPatterns

Harris A. Jaffee hj at jhu.edu
Wed Aug 31 00:46:41 CEST 2011


On Aug 30, 2011, at 4:42 PM, wang peter wrote:
> hi every one:
>     i am still confused by parameters in function trimLRPatterns
> if i set
> Lpattern <- "AAAAAAAATTCTGCT"
> Rpattern <- "GATCGGATTTTTTTT"
> subject <- DNAString("TTCTGCTTGACGTGATCGGA")
> trimLRPatterns(Lpattern = Lpattern, subject = subject)#
> the results are
>
> 13-letter "DNAString" instance
> seq: TGACGTGATCGGA
>
> trimLRPatterns(Rpattern = Rpattern, subject = subject)#
>
> the results are
> 13-letter "DNAString" instance
> seq: TTCTGCTTGACGT
>
> i think with.Rindels = F with.Lindels = F
>
> AAAAAAAA and TTTTTTTT are insertions

To reduce any possible confusion, let's just take this case:

	Lpattern <- "AAAAAAAATTCTGCT"
	subject <-          "TTCTGCT"

 > trimLRPatterns(Lpattern = Lpattern, subject = subject)
[1] ""

# and just to be clear about the defaults
 > trimLRPatterns(Lpattern = Lpattern, subject = subject,
	max.Lmismatch=0, with.Lindels=FALSE)
[1] ""

Since there are no A's in subject, and max.Lmismatch=0, I think you
are saying that the substring "AAAAAAAA" of Lpattern appears to match
freely, as if it were being treated as an indel, without any penalty.
That is not what is happening.

The function takes max.Lmismatch=0 as

	max.Lmismatch = rep(0, nchar(Lpattern))

So, *all* suffixes of the Lpattern are candidates for trimming at
the beginning of the subject, so long as they exact-match, and the
longest wins.  By the suffixes of the Lpattern I mean, in order,

	substr(Lpattern, i, nchar(Lpattern)), i = 1:nchar(Lpattern)

The first one to match, and therefore the longest, is "TTCTGCT"
(i = 9), which actually equals the subject.  This is why the function
returns "".  It has nothing to do with indels.
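
As a rough illustration of the suffix rule just described, here is a
base-R sketch.  It is not how Biostrings implements trimLRPatterns;
trim_left_exact is a made-up helper that only covers the exact-match
case (max.Lmismatch=0, no indels) on plain character strings.

```r
# Hypothetical helper, not part of Biostrings: exact-match left trimming
# written with base R only.
trim_left_exact <- function(Lpattern, subject) {
  nLp <- nchar(Lpattern)
  for (i in seq_len(nLp)) {               # i = 1 is the longest suffix
    suffix <- substr(Lpattern, i, nLp)
    if (startsWith(subject, suffix)) {
      # Longest matching suffix found: drop it from the front of subject.
      return(substr(subject, nchar(suffix) + 1L, nchar(subject)))
    }
  }
  subject                                  # no suffix matched, nothing trimmed
}

trim_left_exact("AAAAAAAATTCTGCT", "TTCTGCT")               # ""
trim_left_exact("AAAAAAAATTCTGCT", "TTCTGCTTGACGTGATCGGA")  # "TGACGTGATCGGA"
```

The loop stops at i = 9 for both calls, because the eight suffixes
before it all start with an A that the subject does not have.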

Maybe a better example for indels is:

 >  subject = "TTTACGT"
 > Lpattern = "TTTAACGT"		# pattern has an extra 'A'

 > trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=3)
[1] "TTTACGT"

# need to allow for 4 errors because of the extra A
 > trimLRPatterns(Lpattern = Lpattern, subject = subject, max.Lmismatch=4)
[1] ""

 > trimLRPatterns(Lpattern = Lpattern, subject = subject,
	max.Lmismatch=0, with.Lindels=TRUE)
[1] "TTACGT"

# need to allow for 1 "edit", to remove the extra A
 > trimLRPatterns(Lpattern = Lpattern, subject = subject,
	max.Lmismatch=1, with.Lindels=TRUE)
[1] ""
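
The two counts in the comments above (4 errors without indels, 1 edit
with indels) can be checked by hand in base R.  This is only a sketch
of the arithmetic, under my assumption that without indels the full
Lpattern is compared letter by letter from the left and the letter
that hangs past the end of the subject counts as a mismatch.  It does
not call Biostrings and is not a description of its internals.

```r
Lpattern <- "TTTAACGT"
subject  <- "TTTACGT"

# Substitution-only view: compare position by position from the left and
# count a pattern letter with no subject letter beneath it as a mismatch.
p <- strsplit(Lpattern, "")[[1]]
s <- strsplit(subject, "")[[1]]
length(s) <- length(p)                     # pad the shorter vector with NA
mismatches <- sum(is.na(s) | p != s)
mismatches                                 # 4

# Edit-distance view: insertions and deletions are allowed, so deleting
# the extra A is a single edit.
edits <- drop(utils::adist(Lpattern, subject))
edits                                      # 1
```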


Let me know if I didn't get your point.

> thank you