[BioC] a problem with trimLRPatterns

Harris A. Jaffee hj at jhu.edu
Fri Mar 9 21:09:30 CET 2012


The help page could probably use some annotation to guide the reader,
but the mismatch arguments are taken to fall into one of 3 cases:

	Either an integer vector of length 'nLp =
          nchar(Lpattern)' representing an absolute number of
          mismatches (or edit distance if 'with.Lindels' is 'TRUE')
	...
	or a single numeric value in the interval '[0, 1)'
	...
          Otherwise, 'max.Lmismatch' is treated as an integer vector
          where negative numbers are used to prevent trimming at the
          'i'-th location. When an input integer vector is shorter than
          'nLp', it is augmented with enough '-1's at the beginning to
          bring its length up to 'nLp'. Elements of 'max.Lmismatch'
          beyond the first 'nLp' are ignored.

You are using cases 2 and 3 in what you are trying here.

A single numeric value (e.g. 0.1) gets expanded to

		as.integer(mismatch * 1:nchar(pattern))
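
To make that expansion concrete, here is a small base-R sketch using the Rpattern from the original post. The variable names (nRp, rate, max_by_length) are only for illustration, and the code simply evaluates the expression quoted above; it does not call into Biostrings.

```r
# Rpattern from the original post (50 characters)
Rpattern <- "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTTCAGAAGGAATGCCGAG"
nRp <- nchar(Rpattern)

# A single rate in [0, 1), expanded with the expression quoted above.
# Element i is the number of mismatches allowed when the partial
# pattern being tested has length i.
rate <- 0.1
max_by_length <- as.integer(rate * 1:nRp)

# Partial patterns shorter than 10 get 0 mismatches, lengths 10-19 get 1,
# and so on up to the full-length pattern.
print(max_by_length)
```

So with a rate of 0.1 every partial pattern is still tested, and the allowance grows with the length of the partial pattern.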

A single integer (1, 2, ..., 9) [or any vector of length < nchar(pattern)] is
augmented to a vector of length nchar(pattern) by padding at the beginning
with -1, which prevents matching and trimming at all of those positions.
Therefore, you cannot get any trimming by setting the mismatch value
to an integer, say M, unless the whole pattern lies (at whichever end)
within an edit distance of M from the subject.  No partial patterns
are even tested.  Instead of a single integer M, you might try

	rep(M, nchar(pattern))

Say, start with M=9, which will give some trimming, and lower M until
you get no trimming.
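
The difference between a single integer M and rep(M, nchar(pattern)) can be sketched as follows. pad_mismatch() is a hypothetical helper written only to mimic the padding rule quoted from the help page; it is not the Biostrings implementation. The final call runs only if Biostrings is installed, and no particular result is claimed for it.

```r
Rpattern <- "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTTCAGAAGGAATGCCGAG"
subject  <- "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
nRp <- nchar(Rpattern)

M <- 3L

# Hypothetical helper (illustration only): pad a short vector with -1 at
# the beginning, as the help page describes, and drop anything beyond n.
pad_mismatch <- function(m, n) {
  c(rep(-1L, max(0L, n - length(m))), head(as.integer(m), n))
}

# A single integer: only the full-length pattern may match.
padded <- pad_mismatch(M, nRp)

# The suggested alternative: up to M mismatches at every partial length.
flat <- rep(M, nRp)

# Try the trimming itself, assuming Biostrings is available.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  print(Biostrings::trimLRPatterns(Rpattern = Rpattern, subject = subject,
                                   max.Rmismatch = flat))
}
```

Looping M downward from 9 with the same call is one way to find the point at which trimming stops.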

I think it's better to use a rate than an integer vector, unless you
want to refine what a rate expands to (see the expansion above).

On Mar 9, 2012, at 2:15 PM, wang peter wrote:

> i donot know why 0.1 mismatch can work to trim the correct adapter, but
> if i set the max.Rmismatch from 1 to 9, it cannot work
> thanks
> 
>> subject = "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
>> Rpattern = "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTTCAGAAGGAATGCCGAG"
> 
>> trimLRPatterns(Rpattern = Rpattern, subject = subject, max.Rmismatch=1,with.Lindels=TRUE)
> [1] "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
>> trimLRPatterns(Rpattern = Rpattern, subject = subject, max.Rmismatch=0.1,with.Lindels=TRUE)
> [1] "GGGAGTAAGAAAGGCA"
>> trimLRPatterns(Rpattern = Rpattern, subject = subject, max.Rmismatch=2,with.Lindels=TRUE)
> [1] "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
>> trimLRPatterns(Rpattern = Rpattern, subject = subject, max.Rmismatch=3,with.Lindels=TRUE)
> [1] "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
> 
> -- 
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253