[BioC] a problem with trimLRPatterns
Harris A. Jaffee
hj at jhu.edu
Fri Mar 9 21:09:30 CET 2012
The help page could probably use some annotation to guide the reader,
but the mismatch arguments are taken to fall into one of 3 cases:
   (1) Either an integer vector of length 'nLp = nchar(Lpattern)'
       representing an absolute number of mismatches (or edit distance
       if 'with.Lindels' is 'TRUE') ...
   (2) or a single numeric value in the interval '[0, 1)' ...
   (3) Otherwise, 'max.Lmismatch' is treated as an integer vector where
       negative numbers are used to prevent trimming at the 'i'-th
       location. When an input integer vector is shorter than 'nLp', it
       is augmented with enough '-1's at the beginning to bring its
       length up to 'nLp'. Elements of 'max.Lmismatch' beyond the first
       'nLp' are ignored.
Your examples use cases 2 and 3.
A single numeric value (e.g. 0.1) is taken as a rate and expanded to
as.integer(mismatch * 1:nchar(pattern)),
so the number of allowed mismatches grows with the length of the overlap.
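For the 50-letter Rpattern in the quoted message, that expansion can be checked directly in base R. This is just the formula above evaluated, not a call into Biostrings.

```r
# Rate expansion applied to the 50-letter Rpattern from the quoted message
nRp <- 50L
rate <- 0.1
allowed <- as.integer(rate * 1:nRp)   # as.integer() truncates toward zero
print(allowed)
# allowed[i] is the mismatch budget for an overlap of i letters:
# 0 for i = 1..9, 1 for i = 10..19, ..., 4 for i = 40..49, 5 for i = 50
```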
An integer (1, 2, ... 9) [or any vector shorter than nchar(pattern)] is
padded to length nchar(pattern) by filling the beginning of the vector
with -1, which prevents matching, and hence trimming, at each of those
overlap lengths.
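To make the padding rule concrete, here is a small base-R sketch. expandMismatch() is a hypothetical helper written only to mirror the help-page rules quoted above; it is not a Biostrings function and may differ from the package's internal code in corner cases.

```r
# Hypothetical helper (NOT part of Biostrings) mirroring the quoted help-page
# rules for turning a mismatch argument into one budget per overlap length.
expandMismatch <- function(mismatch, n) {
  if (length(mismatch) == 1L && mismatch >= 0 && mismatch < 1) {
    # Rate (including 0): budget grows with the overlap length i = 1..n
    return(as.integer(mismatch * seq_len(n)))
  }
  mismatch <- as.integer(mismatch)
  if (length(mismatch) < n) {
    # Too short: pad with -1 at the beginning, which blocks those overlaps
    mismatch <- c(rep(-1L, n - length(mismatch)), mismatch)
  }
  mismatch[seq_len(n)]  # elements beyond the first n are ignored
}

v <- expandMismatch(1L, 50L)
table(v)   # 49 entries of -1 and a single 1 (at position 50)
```

With a single integer, only the last position, i.e. the full-length overlap, gets a non-negative budget.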
Therefore, you cannot get any trimming by setting the mismatch value
to an integer, say M, unless the whole pattern lies (at whichever end)
within an edit distance of M from the subject. No partial patterns
are even tested. Instead of a single integer M, you might try
rep(M, nchar(pattern))
Say, start with M=9, which will give some trimming, and lower M until
you get no trimming.
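A sketch of that search, assuming Biostrings is installed (it provides trimLRPatterns). It passes rep(M, nchar(Rpattern)) for M from 9 down to 0 and records how many letters remain. Indels are left off here: as I read the help page, with.Lindels only affects Lpattern, and its right-hand counterpart is with.Rindels.

```r
library(Biostrings)

subject  <- "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
Rpattern <- "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTTCAGAAGGAATGCCGAG"

# Give every overlap length the same budget M instead of padding with -1
trimmed <- vapply(9:0, function(M) {
  trimLRPatterns(Rpattern = Rpattern, subject = subject,
                 max.Rmismatch = rep(M, nchar(Rpattern)))
}, character(1))
names(trimmed) <- 9:0
print(nchar(trimmed))   # remaining length of the subject for each M
```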
I think it's better to use a rate than an integer vector, unless you
want finer control than what a rate expands to (above).
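Applied to the examples quoted below: the rate run kept 16 letters, i.e. it removed a 34-letter overlap. The base-R sketch below counts plain mismatches (no indels, which is my assumption about what the right-hand side used in those calls) between the first i letters of Rpattern and the last i letters of subject. overlapMismatches() is a hypothetical helper for illustration only.

```r
subject  <- "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
Rpattern <- "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTTCAGAAGGAATGCCGAG"

# Hypothetical helper: mismatches between the first i letters of Rpattern
# and the last i letters of subject, compared position by position
overlapMismatches <- function(i) {
  n <- nchar(subject)
  a <- strsplit(substr(Rpattern, 1, i), "")[[1]]
  b <- strsplit(substr(subject, n - i + 1, n), "")[[1]]
  sum(a != b)
}

overlapMismatches(34)  # 2, within the rate budget as.integer(0.1 * 34) = 3
overlapMismatches(50)  # more than 9, so a single M of 1..9 never matches
```

This does not say how trimLRPatterns picks among several qualifying overlaps; it only shows that the 34-letter overlap qualifies under the rate while the whole-pattern comparison, the only one tested with a single integer, does not.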
On Mar 9, 2012, at 2:15 PM, wang peter wrote:
> I don't know why a 0.1 mismatch rate trims the correct adapter, but
> if I set max.Rmismatch to anything from 1 to 9, it doesn't work.
> Thanks
>
>> subject = "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
>> Rpattern = "CTGTAGGCACCATCAATAGATCGGAAGAGCGGTTCAGAAGGAATGCCGAG"
>
>> trimLRPatterns(Rpattern = Rpattern, subject = subject, max.Rmismatch=1,with.Lindels=TRUE)
> [1] "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
>> trimLRPatterns(Rpattern = Rpattern, subject = subject, max.Rmismatch=0.1,with.Lindels=TRUE)
> [1] "GGGAGTAAGAAAGGCA"
>> trimLRPatterns(Rpattern = Rpattern, subject = subject, max.Rmismatch=2,with.Lindels=TRUE)
> [1] "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
>> trimLRPatterns(Rpattern = Rpattern, subject = subject, max.Rmismatch=3,with.Lindels=TRUE)
> [1] "GGGAGTAAGAAAGGCACTGAAGGCACTATCAATAGATCGGAAGAGCGGTT"
>
> --
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253