[BioC] a problem of trimLRPatterns still confused me
Harris A. Jaffee
hj at jhu.edu
Sat Dec 1 18:44:16 CET 2012
On Dec 1, 2012, at 12:00 PM, Wang Peter wrote:
> dear Harris
> thank you so much for your kindly explanation
> i am so ashamed to disturb u again.
>
> my understanding is
>
> when they use the low-level function to calculate the distance between S and P
In this situation, the C function _nedit_for_Proffset() is called, but
the purpose is much more than to calculate the edit distance between S
and P. As I quoted before from ?`lowlevel-matching`, it is to determine
the minimum edit distance between P and all the suffixes S' of S.
> S= CAAGATC AAG
> P= AGATCGGAAG
>
>
> it will try
> CAAGATCAAG
> AAGATCAAG
> AGATCAAG
> GATCAAG
> ...
> G
>
> and get all the edit distance
> but 2 is the smallest one
Yes, 2 is the minimum described above, occurring for the 8-letter suffix
S' = AGATCAAG of S = CAAGATCAAG.
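To make the suffix scan concrete, here is a minimal sketch in Python. It is
plain dynamic programming, not the Biostrings C code, and the function names
edit_distance and suffix_distances are made up for illustration. It computes
the edit distance between P and every non-empty suffix of S and reports the
smallest one.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions cost 1)."""
    prev = list(range(len(b) + 1))       # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]                        # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def suffix_distances(s, p):
    """Return (suffix, distance to p) for every non-empty suffix of s."""
    return [(s[i:], edit_distance(s[i:], p)) for i in range(len(s))]


S = "CAAGATCAAG"
P = "AGATCGGAAG"

table = suffix_distances(S, P)
best_suffix, best_distance = min(table, key=lambda row: row[1])

for suffix, d in table:
    print(f"{suffix:>10}  {d}")
print("minimum:", best_distance, "for", best_suffix)
```

The first row of the table is the distance between the whole of S and P; the
minimum is attained by a shorter suffix.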
> so it will take 2 as the distance between S and P
Not the distance between S and P, which you correctly observed in a previous
post was 4, but the distance between the entire pattern P and some suffix S'
of S. Which suffix S' that is remains unknown to trimLRPatterns.
> S'= AGATC AAG
> P= AGATCGGAAG
>
> and then trim the whole S, rather than S'
The whole S is taken by trimLRPatterns as its best guess at S'. In this
case, a little more than necessary is trimmed; in other cases, perhaps a
little less than necessary.
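The over-trimming in this example can be sketched as follows. This is only an
illustration of the consequence, not of how trimLRPatterns is implemented: the
read and its TTTT prefix are invented, and the helper names are hypothetical.
It compares what is kept when the whole of S is removed with what would be
kept if the best-matching suffix S' were known.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def closest_suffix(s, p):
    """Return the non-empty suffix of s with the smallest edit distance to p."""
    return min((s[i:] for i in range(len(s))),
               key=lambda suffix: edit_distance(suffix, p))


S = "CAAGATCAAG"
P = "AGATCGGAAG"
read = "TTTT" + S                     # hypothetical read ending in S

s_prime = closest_suffix(S, P)

# Removing the whole of S, the best guess available without knowing S'.
kept_when_trimming_S = read[:len(read) - len(S)]
# Removing only S', which would be possible if S' were known.
kept_when_trimming_best = read[:len(read) - len(s_prime)]

# Letters lost only because S, not S', was removed.
extra = kept_when_trimming_best[len(kept_when_trimming_S):]
print("S' =", s_prime, " extra letters trimmed:", extra)
```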
> --
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253