[BioC] a question about trimLRPatterns?

Thu Jan 19 22:45:36 CET 2012

On Jan 19, 2012, at 4:20 PM, Harris A. Jaffee wrote:
> To quote from ?trimLRPatterns, for Lpattern here,
> 
>          Once the integer vector is constructed using the rules given
>          above, when 'with.Lindels' is 'FALSE', 'max.Lmismatch[i]' is
>          the number of acceptable mismatches (errors) between the
>          suffix 'substring(Lpattern, nLp - i + 1, nLp)' of 'Lpattern'
>          and the first 'i' letters of 'subject'.  When 'with.Lindels'
>          is 'TRUE', 'max.Lmismatch[i]' represents the allowed "edit
>          distance" between that suffix of 'Lpattern' and 'subject',
>          starting at position '1' of 'subject' (as in 'matchPattern'
>          and 'isMatchingStartingAt').
> 
>          For a given element 's' of the 'subject', the initial segment
>          (prefix) 'substring(s, 1, j)' of 's' is trimmed if 'j' is the
>          largest 'i' for which there is an acceptable match, if any.
> 
> If you are asking about implementation, the sub-patterns, i.e. suffixes of
> Lpattern or prefixes of Rpattern, are tested "longest first" using
> the relevant max.mismatch vector "from the top, down". (Intuitively, you
> should think of your max.mismatch vectors as being monotone increasing,
> perhaps not strictly.)  The testing process at the relevant side of the
> subject stops if/when an acceptable match is seen.  The See Also refers to
> ?`lowlevel-matching`, where you will find which.isMatchingStartingAt() and
> which.isMatchingEndingAt().  These functions are called with
> auto.reduce.pattern=TRUE, which allows a single "pattern" and single "at"
> value to be passed in the context of a *vector* "max.mismatch" value, the
> actual pattern being tested getting iteratively shorter by 1 character as
> necessary, for each element of the subject, automatically.

To clarify, in the C code, there are two loops.  There is an outer loop
over the subject, and then, for each subject element, the specified single
pattern is iteratively "auto-reduced" as necessary.
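A minimal Python sketch of those two loops, for the with.Lindels = FALSE case only. The outer loop walks over the subject strings. For each one, the inner loop tests sub-patterns longest first and stops at the first acceptable match. The names trim_lr and count_mismatches are hypothetical, and this is not the Biostrings C code. It ignores indels and IUPAC ambiguity codes (Lfixed/Rfixed), skips pieces longer than the subject, and returns an empty string when the left and right trims overlap. All of these are assumptions of the sketch.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def trim_lr(subjects, lpattern, rpattern, max_l, max_r):
    """Hypothetical sketch of left/right trimming with no indels.

    max_l[i - 1] plays the role of R's max.Lmismatch[i]: the mismatches allowed
    between the length-i suffix of lpattern and the first i letters of a
    subject. max_r[i - 1] likewise pairs the length-i prefix of rpattern with
    the last i letters of a subject.
    """
    nl, nr = len(lpattern), len(rpattern)
    trimmed = []
    for s in subjects:  # outer loop: one subject element at a time
        start = 0
        # Inner loop ("auto-reduce"): longest suffix of lpattern first.
        for i in range(min(nl, len(s)), 0, -1):
            if count_mismatches(lpattern[nl - i:], s[:i]) <= max_l[i - 1]:
                start = i  # the first acceptable match is the longest one
                break
        end = len(s)
        # Same idea on the right: longest prefix of rpattern first.
        for i in range(min(nr, len(s)), 0, -1):
            if count_mismatches(rpattern[:i], s[len(s) - i:]) <= max_r[i - 1]:
                end = len(s) - i
                break
        # Assumption: overlapping left/right trims leave an empty string.
        trimmed.append(s[start:max(start, end)])
    return trimmed


print(trim_lr(["GTAAAACC", "AAAAA"], "ACGT", "CCGG", [0, 0, 0, 0], [0, 0, 0, 0]))
# ['AAAA', 'AAAAA']
```

Because each inner loop runs from the longest piece down and breaks at the first hit, the trimmed prefix has length equal to the largest i with an acceptable match. This is the "largest i" rule in the quoted documentation.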

> Let me know if I didn't get at your question.
> 
> On Jan 19, 2012, at 3:15 PM, wang peter wrote:
> 
>> Hello all,
>> 
>> I want to know how this function processes data.
>> 
>> For the left match:
>> max.Lmismatch is taken as a "rate" and is converted to
>> max.Lmismatch = as.integer(1:nLp * rate)
>> Then it tries to match the suffix substring(Lpattern, nLp - i + 1, nLp)
>> of Lpattern against the first i letters of subject.
>> Does i start from 1 or from nLp? And is the corresponding number of allowed
>> mismatches max.Lmismatch[i]?
>> 
>> For the right match:
>> max.Rmismatch is taken as a "rate" and is converted to
>> max.Rmismatch = as.integer(1:nRp * rate)
>> Then it tries to match the prefix substring(Rpattern, 1, i) of Rpattern
>> against the last i letters of subject.
>> Does i start from 1 or from nRp? And is the corresponding number of allowed
>> mismatches max.Rmismatch[i]?
>> 
>> -- 
>> shan gao
>> Room 231(Dr.Fei lab)
>> Boyce Thompson Institute
>> Cornell University
>> Tower Road, Ithaca, NY 14853-1801
>> Office phone: 1-607-254-1267(day)
>> Official email:sg839 at cornell.edu
>> Facebook:http://www.facebook.com/profile.php?id=100001986532253
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
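The original post asks two questions: how the rate is converted, and whether i starts at 1 or at nLp. A small Python sketch covers both. The helper names rate_to_max_mismatch and trial_order are hypothetical. Following the answer above, i is the length of the sub-pattern (a suffix of Lpattern or a prefix of Rpattern), and max.Lmismatch[i] is its mismatch budget. Testing starts at i = nLp and works down toward 1. Python lists are 0-based, so R's max.Lmismatch[i] appears here as budget[i - 1].

```python
def rate_to_max_mismatch(n, rate):
    """Mimic R's as.integer(1:n * rate) for a non-negative rate.

    Entry i - 1 is the mismatch budget for a sub-pattern of length i.
    int() truncates toward zero, as as.integer() does for these values.
    """
    return [int(i * rate) for i in range(1, n + 1)]


def trial_order(n, budget):
    """(sub-pattern length, allowed mismatches) pairs in the order tried."""
    # Longest first: i = n, n - 1, ..., 1.
    return [(i, budget[i - 1]) for i in range(n, 0, -1)]


budget = rate_to_max_mismatch(8, 0.25)
print(budget)                      # [0, 0, 0, 1, 1, 1, 1, 2]
print(trial_order(8, budget)[:3])  # [(8, 2), (7, 1), (6, 1)]
```

The budget vector produced this way is monotone non-decreasing. This matches the advice above to think of the max.mismatch vectors as monotone increasing, perhaps not strictly.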