[BioC] A problem about trimLRPatterns

wang peter wng.peter at gmail.com
Fri Mar 2 15:48:28 CET 2012


Dear Harris,
Thank you for your excellent example, but I still have three small questions:

1. When j = 15, s = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
and p = "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTT".
By my calculation the edit distance between s and p is 16, not 8.

> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> pattern = "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTTCAGAAGGAATGCCGAG"
>
> sapply(15:nchar(subject), function(j) {
>        s = substr(subject, j, nchar(subject))
>        p = substr(pattern, 1, nchar(subject)-j+1)
>        neditEndingAt(ending.at=nchar(s), pattern = p, subject = s, with.indels=TRUE)
> })
>
>  [1]  8  7  6  5  4  3  2  1  0  2  4  6  8 10 11 11 10  9  8  8  9  8  7  7  6
> [26]  5  5  4  3  2  4  3  2  1
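
One way to cross-check the end-to-end (global) Levenshtein distance for j = 15 without Biostrings is base R's adist() from the utils package. This is only a sketch: whether neditEndingAt() with with.indels=TRUE is meant to report this same global quantity is exactly what question 1 is asking, so a different number here would not by itself show that either result is wrong.

```r
subject <- "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
pattern <- "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTTCAGAAGGAATGCCGAG"

j <- 15
# Same slicing as in the quoted sapply() call
s <- substr(subject, j, nchar(subject))
p <- substr(pattern, 1, nchar(subject) - j + 1)

# adist() returns a matrix of generalized Levenshtein distances;
# drop() reduces the 1 x 1 result to a single number.
global_dist <- drop(utils::adist(s, p))
print(global_dist)
```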

2. If trimLRPatterns tries to trim the longest substring that stays
within the allowed number of mismatches,
it will also remove some bases that are not noise, right?

3. So the trimLRPatterns algorithm is based on the edit distance, right? I think
it uses dynamic programming to calculate the Levenshtein distance,
but it seems much faster than my program, which also uses dynamic programming.
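
For reference, the kind of dynamic-programming computation meant in question 3 can be sketched as the textbook Levenshtein recurrence below. The function name levenshtein() is made up for illustration; this is neither the poster's actual program nor a description of how trimLRPatterns or neditEndingAt are implemented. A nested R-level loop like this fills the full table one cell at a time in interpreted code, which is typically much slower than a compiled implementation of the same recurrence.

```r
# Hypothetical helper: plain Levenshtein distance via dynamic programming.
levenshtein <- function(a, b) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- length(x)
  m <- length(y)

  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b.
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n   # delete all i characters of a
  d[1, ] <- 0:m   # insert all j characters of b

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,   # deletion
        d[i + 1, j] + 1L,   # insertion
        d[i, j] + cost      # match or substitution
      )
    }
  }
  d[n + 1, m + 1]
}

print(levenshtein("kitten", "sitting"))
```

The result can be compared against drop(utils::adist(a, b)) from base R as a sanity check.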

Thank you very much,
Shan


