[BioC] a question about the low level match function
Harris A. Jaffee
hj at jhu.edu
Tue Nov 6 22:02:06 CET 2012
On Nov 6, 2012, at 3:02 PM, wang peter wrote:
> Dear all, Harry and Steve:
> I am sorry to disturb you again. This time I read the manual
> and some of the source code carefully, but I am still confused about
> how trimLRPatterns works.
> I traced it back to the function
>
> Biostrings:::.computeTrimEnd
The relevant statement is:
    ii <- which.isMatchingEndingAt(pattern = Rpattern, subject = subject,
              ending.at = subject_width, max.mismatch = max.Rmismatch,
              with.indels = with.Rindels, fixed = Rfixed, auto.reduce.pattern = TRUE)
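'subject_width' is a single constant at this point because of this earlier
test, which handles subjects of unequal widths by reversing both the
pattern and the subject and delegating to .computeTrimStart:
    if (!isConstant(width(subject))) {
        tmp <- .computeTrimStart(reverse(Rpattern), reverse(subject),
                                 max.Rmismatch, with.Rindels, Rfixed)
        return(width(subject) - tmp + 1L)
    }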
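One way to see why the reversal is safe: a subject ends with the k-letter
prefix of Rpattern exactly when the reversed subject starts with the k-letter
suffix of the reversed Rpattern, and position j of a reversed string of width
n is position n - j + 1 of the original, which is the shape of
'width(subject) - tmp + 1L' (assuming 'tmp' is a position in the reversed
subject). A small base-R sketch of that identity, exact matching only;
str_rev(), ends_with_prefix() and starts_with_suffix() are hypothetical
helpers, not Biostrings functions:

```r
# Hypothetical helper: reverse a single character string.
str_rev <- function(x) paste(rev(strsplit(x, "")[[1]]), collapse = "")

# TRUE if 'subject' ends with the k-letter prefix of 'pattern' (exact match).
ends_with_prefix <- function(pattern, subject, k) {
  n <- nchar(subject)
  k <= n && substr(subject, n - k + 1L, n) == substr(pattern, 1L, k)
}

# TRUE if 'subject' starts with the k-letter suffix of 'pattern' (exact match).
starts_with_suffix <- function(pattern, subject, k) {
  m <- nchar(pattern)
  k <= nchar(subject) && substr(subject, 1L, k) == substr(pattern, m - k + 1L, m)
}

p <- "TCGGAA"
s <- "ACGTTCGG"  # ends with "TCGG", the 4-letter prefix of p
# The two questions give the same answer for every candidate length k.
for (k in 1:nchar(p)) {
  stopifnot(ends_with_prefix(p, s, k) ==
              starts_with_suffix(str_rev(p), str_rev(s), k))
}

# Position j in the reversed subject is position n - j + 1 in the original.
n <- nchar(s)
j <- 3L
stopifnot(substr(str_rev(s), j, j) == substr(s, n - j + 1L, n - j + 1L))
```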
auto.reduce.pattern=TRUE tells the *EndingAt function to test a vector of
patterns against each subject element, subject to the 'max.mismatch' vector
of edit-distance limits. These patterns are constructed behind the scenes
(in C) from your single 'pattern=Rpattern'. For example, if your Rpattern
were "TCGGAA", the test patterns would be, in order,
    "TCGGAA"
     "TCGGA"
      "TCGG"
       "TCG"
        "TC"
         "T"
They are all tested with 'ending.at=subject_width', i.e. each one is aligned
to end at the last letter of the subject, as I've hinted by the way I've
written them. The "which" in the function name reflects that the underlying
code (in this case, C code) stops at the first hit within your edit limits.
For example, if a subject element happens to end with "TCGGA" within your
limits, the test loop for that subject element stops there, and "TCGG",
"TCG", etc. are never tried.
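To make the loop concrete, here is a base-R sketch for a single subject
string. It is not the Biostrings C code: first_hit_ending_at() and hamming()
are hypothetical helpers, it counts substitutions only (the
with.indels = FALSE case), and returning the length of the winning candidate
is a choice made for illustration, not necessarily what
which.isMatchingEndingAt returns.

```r
# Mismatch count between two equal-length strings (substitutions only, i.e.
# the with.indels = FALSE case; indels would need a real edit distance).
hamming <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical sketch of the auto.reduce.pattern loop for ONE subject string.
# Candidates are the pattern, then the pattern minus its last letter, and so
# on, each aligned so that it ends at 'ending.at'. max.mismatch[k] is the
# limit for the k-letter candidate. Returns the length of the first (longest)
# candidate that matches, or 0L if none does.
first_hit_ending_at <- function(pattern, subject, ending.at, max.mismatch) {
  m <- nchar(pattern)
  max.mismatch <- rep_len(max.mismatch, m)
  for (k in m:1L) {
    start <- ending.at - k + 1L
    if (start < 1L) next               # candidate longer than the room available
    cand <- substr(pattern, 1L, k)     # k-letter prefix of the pattern
    window <- substr(subject, start, ending.at)
    if (hamming(cand, window) <= max.mismatch[k])
      return(k)                        # first hit: shorter candidates are skipped
  }
  0L
}

# "TCGGA" (5 letters) is the first candidate that fits at the end of this
# subject, so this returns 5.
first_hit_ending_at("TCGGAA", "ACGTTCGGA", ending.at = 9L, max.mismatch = 0L)
```

Because the loop returns at the first hit, the 4-letter and shorter
candidates are never compared for this subject.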
> showMethods(which.isMatchingEndingAt, includeDefs=TRUE)
> Biostrings:::.matchPatternAt
>
> if (is(subject, "XString"))
>     .Call2("XString_match_pattern_at", pattern, subject,
>            at, at.type, max.mismatch, min.mismatch, with.indels,
>            fixed, ans.type, auto.reduce.pattern, PACKAGE = "Biostrings")
> else .Call2("XStringSet_vmatch_pattern_at", pattern, subject,
>             at, at.type, max.mismatch, min.mismatch, with.indels,
>             fixed, ans.type, auto.reduce.pattern, PACKAGE = "Biostrings")
>
> I think it calls the low-level code.
Yes, these are calls to C. 'at.type' is set to 1L by all the *EndingAt
functions (and to 0L by all the *StartingAt functions). The statement
above in .computeTrimEnd supplies 'ending.at', namely the subject width,
which is sent as the 'at' argument of .matchPatternAt and forwarded to C.
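The 0L/1L codes only change how the position is read: as the first letter of
the window or as its last letter. A base-R sketch of that idea; window_at() is
a hypothetical helper, not part of Biostrings, and returning NA for an
out-of-range window is an assumption of this sketch:

```r
# Hypothetical helper (not the C code): the subject window that a k-letter
# pattern is compared with, given a position 'at' and an 'at.type' code:
#   at.type = 0L  -> the window STARTS at 'at' (the *StartingAt functions)
#   at.type = 1L  -> the window ENDS at 'at'   (the *EndingAt functions)
# Returns NA if the window would fall outside the subject.
window_at <- function(subject, k, at, at.type) {
  if (at.type == 0L) {
    start <- at
    end <- at + k - 1L
  } else {
    end <- at
    start <- at - k + 1L
  }
  if (start < 1L || end > nchar(subject)) return(NA_character_)
  substr(subject, start, end)
}

s <- "ACGTTCGGA"
window_at(s, k = 5L, at = nchar(s), at.type = 1L)  # last 5 letters: "TCGGA"
window_at(s, k = 4L, at = 1L, at.type = 0L)        # first 4 letters: "ACGT"
```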
> For example:
> trimLRPatterns(Rpattern = Rpattern, subject = subject,
> max.Rmismatch=0.1, with.Lindels=TRUE)
>
> subject = "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
> Rpattern = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
>
> Then the function converts max.Rmismatch to
> max.Rmismatch= as.integer(max.Rmismatch*1:nchar(Rpattern))
> [1] 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3
>
> As I understand it, the process computes the distance between p and s:
>
> p = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA" allowing 3 mismatches
> s = "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
>
> p = "AATAGTACTGTAGGCACCATCAATAGATCGGAA" allowing 3 mismatches
> s = "GAATAGTACTGTAGGCACCATCAATAGATCGGA"
> ...
> p = "A" allowing 0 mismatches
> s = "G"
>
> But what does the parameter 'at' mean?
See 'at' and 'ending.at' above. Does this help?
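Applied to the example in the question, a base-R sketch. Assumptions:
mismatches only (the call above sets with.Lindels, not with.Rindels, so I
assume the right side uses no indels), and hamming() is a hypothetical
helper. Each candidate is the k-letter prefix of Rpattern compared with the
last k letters of subject; the p/s labels in the question are the other way
round, which gives the same counts because a mismatch count is symmetric.

```r
subject  <- "TATAGTAGATATTGGAATAGTACTGTAGGCACCATCAATAGATCGGAA"
Rpattern <- "GAATAGTACTGTAGGCACCATCAATAGATCGGAA"
m <- nchar(Rpattern)   # 34
n <- nchar(subject)    # 48

# The 0.1 rate turned into one integer limit per candidate length,
# as in the question: max_mm[k] applies to the k-letter candidate.
max_mm <- as.integer(0.1 * 1:m)

# Hypothetical helper: substitutions only (no indels).
hamming <- function(a, b) sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

# Longest candidates first: k-letter prefix of Rpattern vs. last k letters of
# subject. The real loop stops at the first k whose count is within max_mm[k].
for (k in m:(m - 2L)) {
  cand   <- substr(Rpattern, 1L, k)
  window <- substr(subject, n - k + 1L, n)
  cat(sprintf("k=%2d limit=%d mismatches=%d\n",
              k, max_mm[k], hamming(cand, window)))
}
```

Here the subject ends with the whole Rpattern, so the k = 34 candidate
already matches with 0 mismatches and the search would stop there; the
k = 33 and k = 32 rows only illustrate the pairing.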
> --
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor