[BioC] a question about trimLRPatterns?
Harris A. Jaffee
hj at jhu.edu
Thu Jan 19 22:45:36 CET 2012
On Jan 19, 2012, at 4:20 PM, Harris A. Jaffee wrote:
> To quote from ?trimLRPatterns, for Lpattern here,
>
> Once the integer vector is constructed using the rules given
> above, when 'with.Lindels' is 'FALSE', 'max.Lmismatch[i]' is
> the number of acceptable mismatches (errors) between the
> suffix 'substring(Lpattern, nLp - i + 1, nLp)' of 'Lpattern'
> and the first 'i' letters of 'subject'. When 'with.Lindels'
> is 'TRUE', 'max.Lmismatch[i]' represents the allowed "edit
> distance" between that suffix of 'Lpattern' and 'subject',
> starting at position '1' of 'subject' (as in 'matchPattern'
> and 'isMatchingStartingAt').
>
> For a given element 's' of the 'subject', the initial segment
> (prefix) 'substring(s, 1, j)' of 's' is trimmed if 'j' is the
> largest 'i' for which there is an acceptable match, if any.
>
> If you are asking about implementation, the sub-patterns, i.e., suffixes of
> Lpattern or prefixes of Rpattern, are tested "longest first" using
> the relevant max.mismatch vector "from the top, down". (Intuitively, you
> should think of your max.mismatch vectors as being monotone increasing,
> perhaps not strictly.) The testing process at the relevant side of the
> subject stops if/when an acceptable match is seen. The See Also refers to
> ?`lowlevel-matching`, where you will find which.isMatchingStartingAt() and
> which.isMatchingEndingAt(). These functions are called with
> auto.reduce.pattern=TRUE, which allows a single "pattern" and single "at"
> value to be passed in the context of a *vector* "max.mismatch" value, the
> actual pattern being tested getting iteratively shorter by 1 character as
> necessary, for each element of the subject, automatically.
To clarify, in the C code there are two loops: an outer loop over the
elements of the subject and, for each subject element, an inner loop in
which the single specified pattern is iteratively "auto-reduced" as necessary.
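
As a rough illustration of those two loops, here is a minimal base-R sketch
of the left-side case without indels. It is not the Biostrings C code; the
names trim_left_sketch() and n_mismatch() are hypothetical helpers made up
for this example, and the sketch assumes plain character vectors and a
max.Lmismatch vector of length nLp.

```r
# Hypothetical helper: number of mismatching positions between two
# equal-length, non-empty strings.
n_mismatch <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Sketch only (assumption: no indels, plain character input).
trim_left_sketch <- function(Lpattern, subject, max.Lmismatch) {
  nLp <- nchar(Lpattern)
  stopifnot(length(max.Lmismatch) == nLp)
  # Outer loop: over the elements of the subject.
  vapply(subject, function(s) {
    # Inner loop: test the longest usable suffix of Lpattern first, then
    # "auto-reduce" it by one character at a time.
    for (i in rev(seq_len(min(nLp, nchar(s))))) {
      suffix <- substring(Lpattern, nLp - i + 1, nLp)
      prefix <- substring(s, 1, i)
      # A negative max.Lmismatch[i] can never be satisfied, so that
      # length is effectively skipped.
      if (n_mismatch(suffix, prefix) <= max.Lmismatch[i]) {
        return(substring(s, i + 1))  # stop at the first acceptable match
      }
    }
    s  # no acceptable match: leave the element untrimmed
  }, character(1), USE.NAMES = FALSE)
}

Lpattern <- "TTCTGCTTG"
nLp <- nchar(Lpattern)
max.Lmismatch <- as.integer(1:nLp * 0.25)  # 0 0 0 1 1 1 1 2 2
subject <- c("TTCTGCTTGACGT", "GCTTGACGT", "ACGTACGT")
result <- trim_left_sketch(Lpattern, subject, max.Lmismatch)
print(result)
# "ACGT" "ACGT" "ACGTACGT"
```

In the second element only the 5-letter suffix "GCTTG" of Lpattern matches,
so the loop has to step down from i = 9 to i = 5 before it trims; the third
element has no acceptable match at any length and is returned unchanged.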
> Let me know if I didn't get at your question.
>
> On Jan 19, 2012, at 3:15 PM, wang peter wrote:
>
>> hello all:
>>
>> I want to know how this function processes data.
>>
>> For the left match,
>> max.Lmismatch is taken as a "rate" and is converted to
>> max.Lmismatch=as.integer(1:nLp * rate)
>> Then it tries to match the suffix substring(Lpattern, nLp - i + 1, nLp)
>> of Lpattern against the first i letters of subject.
>> Does i start from 1 or nLp? And is the corresponding allowed mismatch
>> max.Lmismatch[i]?
>>
>> For the right match,
>> max.Rmismatch is taken as a "rate" and is converted to
>> max.Rmismatch=as.integer(1:nRp * rate)
>> Then it tries to match the suffix substring(Rpattern, nRp - i + 1, nRp)
>> of subject against the first i letters of Rpattern.
>> Does i start from 1 or nRp? And is the corresponding allowed mismatch
>> max.Rmismatch[i]?
>>
>> --
>> shan gao
>> Room 231(Dr.Fei lab)
>> Boyce Thompson Institute
>> Cornell University
>> Tower Road, Ithaca, NY 14853-1801
>> Office phone: 1-607-254-1267(day)
>> Official email:sg839 at cornell.edu
>> Facebook:http://www.facebook.com/profile.php?id=100001986532253
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor