[BioC] a problem of trimLRPatterns still confused me
Wang Peter
wng.peter at gmail.com
Fri Nov 30 21:36:27 CET 2012
Thank you very much, Harris, you helped me again.
Now I understand; see below.
library(Biostrings)  # provides neditEndingAt()
subject = "GGTAACTTTTCTGACACCTCCTGCTTAAAACCCCAAAGGTCAGAAGGATCGTGAGGCCCCGCTTTCACGGTCTGTATTCGTACTGAAAATCAAGATCAAG"
Rpattern = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGCCGTCTTCTGCTTG"
# allow 0.2 mismatches per base for each possible overlap length (1..nchar(Rpattern))
max.mismatchs <- 0.2*1:nchar(Rpattern)
# for each suffix s of subject, compare it with the prefix p of Rpattern of the same length
sapply((nchar(subject)-nchar(Rpattern)+1):nchar(subject), function(j) {
s = substr(subject, j, nchar(subject))
p = substr(Rpattern, 1, nchar(subject)-j+1)
neditEndingAt(ending.at=nchar(s), pattern = p, subject = s,
with.indels=TRUE)
})
All distances:
[1] 32 33 33 32 31 32 31 30 29 28 27 26 27 26 25 25 24 23 22 22 21 20 20 20
[25] 20 19 18 17 18 17 17 18 17 16 15 16 15 14 13 12 12 11 10 9 8 7 6 6
[49] 6 6 6 5 4 3 (2) 3 3 3 3 3 2 1 0 1
max.mismatchs
[1] 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 (2.0) 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8
[20] 4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.8 7.0 7.2 7.4 7.6
[39] 7.8 8.0 8.2 8.4 8.6 8.8 9.0 9.2 9.4 9.6 9.8 10.0 10.2 10.4 10.6 10.8 11.0 11.2 11.4
[58] 11.6 11.8 12.0 12.2 12.4 12.6 12.8
When the function finds a distance <= the corresponding max.mismatch value
(see (2) and (2.0) above), it stops.
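
The stopping rule described above can be sketched in a few lines of R. This is only an illustration of the rule as stated in this post, not the actual trimLRPatterns code: the distances are copied from the output above, and the names d, overlap, max.mm and first.hit are made up for the example. It assumes that the i-th distance (a suffix of length 64, 63, ..., 1) is compared with max.mismatchs at that same overlap length.

```r
# Edit distances copied from the "all distance" output in this post
d <- c(32, 33, 33, 32, 31, 32, 31, 30, 29, 28, 27, 26, 27, 26, 25, 25,
       24, 23, 22, 22, 21, 20, 20, 20, 20, 19, 18, 17, 18, 17, 17, 18,
       17, 16, 15, 16, 15, 14, 13, 12, 12, 11, 10, 9, 8, 7, 6, 6,
       6, 6, 6, 5, 4, 3, 2, 3, 3, 3, 3, 3, 2, 1, 0, 1)

# The i-th distance belongs to an overlap of length 64, 63, ..., 1
# (assumption: this is how distances and thresholds are paired)
overlap <- length(d):1

# Threshold for each overlap length: 0.2 mismatches per base
max.mm <- 0.2 * overlap

# First position (scanning from the longest overlap) where the
# distance is within the allowed number of mismatches
first.hit <- which(d <= max.mm)[1]

cat("first hit at index", first.hit,
    "overlap", overlap[first.hit],
    "distance", d[first.hit],
    "threshold", max.mm[first.hit], "\n")
```

With the numbers above, the first hit is the overlap of length 10, where the distance 2 meets the threshold 2.0.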
But I think the distance between those two 10 bp k-mers should be 4, not 2:
CAAGATC  AAG
  AGATCGGAAG
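
One possible way to see where 4 and 2 could both come from is to compare two different distances in base R. This is a hedged sketch, not a statement about how Biostrings works internally: it assumes that neditEndingAt with with.indels=TRUE fixes only the end of the match and lets the match start anywhere in the subject. The names s, p, global.dist and free.start.dist are made up for the example; adist() is the base R (utils) Levenshtein distance.

```r
s <- "CAAGATCAAG"   # last 10 bases of subject
p <- "AGATCGGAAG"   # first 10 bases of Rpattern

# Distance when both strings must be aligned end to end
global.dist <- adist(s, p)[1, 1]

# Distance when the match must end at the end of s but may start
# at any position of s (assumed behaviour, see the note above):
# take the best distance between p and every suffix of s
free.start.dist <- min(sapply(seq_len(nchar(s)), function(k) {
  adist(substr(s, k, nchar(s)), p)[1, 1]
}))

cat("end-to-end distance:", global.dist, "\n")
cat("free-start distance:", free.start.dist, "\n")
```

Under that assumption the end-to-end distance is 4 (drop CA, insert GG), while the free-start distance is 2, because the leading CA of the subject is simply not part of the match and only the GG insertion is counted.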