[BioC] a question about trimLRPatterns?
Harris A. Jaffee
hj at jhu.edu
Mon Nov 5 21:58:27 CET 2012
Just getting to my mail after the power outage in New Jersey.
I'm laying claim to the sapply quoted here, verbatim as far as
I can tell, sent off-list (my bad) about a year ago in order to
offer an exploratory approach to the setting of max.Rmismatch.
The conclusion would be, for this subject sequence and for the
first Rpattern here, that 0 is a good value, and in the second
case, as Hervé has said, that 2 is good when 1 was not enough.
trimLRPatterns does not actually use any nedit function or an
sapply. It does, however, "stop" (at the C level) at the first
position satisfying max.Rmismatch, if any, and that position can
of course vary over the subject space.
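For anyone who wants to try the exploratory idea without Biostrings, here is a rough base-R sketch. It is not what trimLRPatterns does internally: it counts plain mismatches only (no indels), so its numbers will differ from the neditEndingAt output quoted below, and the helper names overlap_mismatches and first_hit are made up for illustration. It also assumes the pattern is no longer than the subject.

```r
# Base-R sketch only; mismatches are counted position by position, no indels.
subject   <- "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern1 <- "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern2 <- "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"

# Hypothetical helper: for each suffix of `subject` (longest overlap first),
# count mismatches against the equally long prefix of `pattern`.
# Assumes nchar(pattern) <= nchar(subject).
overlap_mismatches <- function(subject, pattern) {
  n <- nchar(subject)
  starts <- (n - nchar(pattern) + 1):n
  vapply(starts, function(j) {
    s <- strsplit(substr(subject, j, n), "")[[1]]
    p <- strsplit(substr(pattern, 1, n - j + 1), "")[[1]]
    sum(s != p)
  }, integer(1))
}

# Hypothetical helper: index of the first overlap whose mismatch count is
# within the allowed maximum, or NA if no overlap qualifies.
first_hit <- function(profile, max_mismatch) {
  hits <- which(profile <= max_mismatch)
  if (length(hits) > 0) hits[1] else NA_integer_
}

prof1 <- overlap_mismatches(subject, Rpattern1)
prof2 <- overlap_mismatches(subject, Rpattern2)

first_hit(prof1, 0)  # first pattern: threshold 0
first_hit(prof2, 2)  # second pattern: threshold 2
```

Looking at the whole profile (prof1, prof2) before settling on a threshold is the point of the exercise: the first element corresponds to the full-length pattern, the last to a single-character overlap.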
On Oct 30, 2012, at 12:58 PM, wang peter wrote:
> i want to know how this function works?
>
> for example:
> trimLRPatterns(Rpattern = Rpattern, subject = subject,
> max.Rmismatch=1,with.Lindels=TRUE)
>
>
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
>
> the function will try to calculate the distance by such coding:
>
> sapply((nchar(subject)-nchar(Rpattern)+1):nchar(subject), function(j) {
>     s = substr(subject, j, nchar(subject))
>     p = substr(Rpattern, 1, nchar(subject)-j+1)
>     neditEndingAt(ending.at=nchar(s), pattern = p, subject = s,
>                   with.indels=TRUE)
> })
> [1] 0 2 4 6 8 10 12 14 15 14 13 12 11 10 9 9 8 7 8 7 6 5 6 6 5 4 4 4 3 2 1 0
> [33] 1 1
> when the function finds the first value that satisfies
> max.Rmismatch, it will stop;
> in this case, the function will stop at the first position.
>
> IF
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
> The results
> [1] 2 3 4 6 8 10 12 14 15 14 13 12 11 10 9 9 8 7 8 7 6 5 6 6 5 4 4 4 3 2 1 0
> [33] 1 1
> in this case, the function will stop at
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
>
>
> so the shortcoming is that trimLRPatterns cannot find the
> sequence shared between subject and Rpattern:
> "GAATAGTACTGTAGGCACCATCAATAGATCGG"
>
> --
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253
>