[BioC] a question about trimLRPatterns?

Harris A. Jaffee hj at jhu.edu
Mon Nov 5 21:58:27 CET 2012


Just getting to my mail after the power outage in New Jersey.

I'm laying claim to the sapply quoted here, verbatim as far as
I can tell, sent off-list (my bad) about a year ago in order to
offer an exploratory approach to the setting of max.Rmismatch.

The conclusion would be, for this subject sequence and for the
first Rpattern here, that max.Rmismatch = 0 is a good value, and
in the second case, as Hervé has said, that 2 is good when 1 was
not enough.

trimLRPatterns does not actually use any nedit function or an
sapply, although it does "stop" (at the C level) at the first
position satisfying max.Rmismatch, if any, which of course can
vary over the subject space.
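
To make the "first position" idea concrete, here is a minimal base-R
sketch that works directly on the distance vector quoted below for the
second Rpattern (the one ending in GGTT). The helper first_hit is
hypothetical and not part of Biostrings. It only scans precomputed
distances against a threshold, given either as one value applied to
every position or as one value per position. It does not model how
trimLRPatterns itself expands or interprets max.Rmismatch, so treat it
as an exploratory aid, not a description of the C code.

```r
# Distances quoted in the original post for the Rpattern ending in "GGTT",
# one value per candidate position (34 in total).
d <- c(2, 3, 4, 6, 8, 10, 12, 14, 15, 14, 13, 12, 11, 10, 9, 9, 8, 7,
       8, 7, 6, 5, 6, 6, 5, 4, 4, 4, 3, 2, 1, 0, 1, 1)

# Hypothetical helper: index of the first distance that is within the
# threshold, or NA if no position qualifies. A scalar threshold is
# recycled to every position (an assumption made for this sketch only).
first_hit <- function(d, threshold) {
  threshold <- rep_len(threshold, length(d))
  hits <- which(d <= threshold)
  if (length(hits) == 0L) NA_integer_ else hits[1L]
}

first_hit(d, 1)   # 31
first_hit(d, 2)   # 1
first_hit(d, -1)  # NA: nothing qualifies
```

Under these assumptions, a threshold of 1 is first met only at
position 31, while a threshold of 2 is already met at position 1,
which is consistent with 2 being good when 1 was not enough.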

On Oct 30, 2012, at 12:58 PM, wang peter wrote:

> I want to know how this function works.
> 
> for example:
> trimLRPatterns(Rpattern = Rpattern, subject = subject,
> max.Rmismatch=1,with.Lindels=TRUE)
> 
> 
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern =              "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> 
> The function will try to calculate the distance with code like this:
> 
> sapply((nchar(subject)-nchar(Rpattern)+1):nchar(subject), function(j) {
>        s = substr(subject, j, nchar(subject))
>        p = substr(Rpattern, 1, nchar(subject)-j+1)
>        neditEndingAt(ending.at=nchar(s), pattern = p, subject = s,
>                      with.indels=TRUE)
> })
>  [1]  0  2  4  6  8 10 12 14 15 14 13 12 11 10  9  9  8  7  8  7  6  5  6  6  5  4  4  4  3  2  1  0
> [33]  1  1
> When the function finds the first value that satisfies
> max.Rmismatch, it stops.
> In this case, the function will stop at the first position.
> 
> If instead
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern =              "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
> the results are
>  [1]  2  3  4  6  8 10 12 14 15 14 13 12 11 10  9  9  8  7  8  7  6  5  6  6  5  4  4  4  3  2  1  0
> [33]  1  1
> In this case, the function will stop at
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern =
> "GAATAGTACTGTAGGCACCATCAATAGATCGGTT"
> 
> 
> So the shortcoming is that trimLRPatterns cannot find the
> sequence shared by subject and Rpattern:
> "GAATAGTACTGTAGGCACCATCAATAGATCGG"
> 