[BioC] A problem about trimLRPatterns
wang peter
wng.peter at gmail.com
Fri Mar 2 15:48:28 CET 2012
Dear Harris,
Thank you for your perfect example, but I still have 3 small questions:
1. When j=15, s = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
and p = "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTT",
and by my calculation the edit distance between s and p is 16, not 8.
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> pattern = "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTTCAGAAGGAATGCCGAG"
>
> sapply(15:nchar(subject), function(j) {
> s = substr(subject, j, nchar(subject))
> p = substr(pattern, 1, nchar(subject)-j+1)
> neditEndingAt(ending.at=nchar(s), pattern = p, subject = s, with.indels=TRUE)
> })
>
> [1] 8 7 6 5 4 3 2 1 0 2 4 6 8 10 11 11 10 9 8 8 9 8 7 7 6
> [26] 5 5 4 3 2 4 3 2 1
2. If trimLRPatterns tries to trim the longest substring that stays
within the allowed number of mismatches,
it will also remove some bp that are not noise, right?
3. So the trimLRPatterns algorithm is based on the edit distance, right? I think
it uses dynamic programming to calculate the Levenshtein distance,
but it seems much faster than my program, which also uses dynamic programming.
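
For reference, here is a minimal plain-R sketch of the kind of dynamic programming meant in questions 1 and 3, applied to the s and p from question 1. The helper name lev_dist is hypothetical (it is not a Biostrings function), and the sketch computes the global Levenshtein distance, in which both strings must be aligned end to end. Whether neditEndingAt or trimLRPatterns use the same definition is not established in this message, so this is only a way to check the hand calculation. The result is cross-checked against adist from base R (package utils).

```r
# Classic dynamic-programming Levenshtein distance, kept to two rows of the
# DP table. lev_dist is a hypothetical helper, not part of Biostrings.
lev_dist <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # prev[j + 1]: distance between the first i - 1 letters of a
  # and the first j letters of b
  prev <- 0:length(y)
  for (i in seq_along(x)) {
    cur <- c(i, integer(length(y)))
    for (j in seq_along(y)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      cur[j + 1] <- min(prev[j + 1] + 1L,  # delete x[i]
                        cur[j] + 1L,       # insert y[j]
                        prev[j] + cost)    # match or substitute
    }
    prev <- cur
  }
  prev[length(y) + 1]
}

s <- "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
p <- "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTT"

d_dp   <- lev_dist(s, p)
d_base <- as.vector(adist(s, p))  # base R generalized Levenshtein distance
print(c(dp = d_dp, adist = d_base))
```

Deleting the 8 leading letters of s and appending the last 8 letters of p turns s into p, so the global distance can be at most 16. Interpreted double loops like the ones above are generally slow in R compared with compiled code, which may account for part of the speed difference mentioned in question 3.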
Thank you very much,
shan