[BioC] still the trimLRPatterns problem
Harris A. Jaffee
hj at jhu.edu
Tue Oct 11 20:38:35 CEST 2011
On Oct 11, 2011, at 10:59 AM, wang peter wrote:
> I want to remove PCR2rc from the subject, but it is not
> recognized
> when I set the mismatch to 0.2.
> How should I set the parameters so that trimLRPatterns works?
> GATCGGAAGAGCACACGTCTGAACTCCA
> TCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
> AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
It would really help if you could ask something specific about these
3 strings.
> subject <- DNAString("GATCGGAAGAGCACACGTCTGAACTCCATCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC")
> PCR2rc <- DNAString("AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG")
Ok, so these are your major players. See below.
> max.mismatchs <- 0.25*1:nchar(PCR2rc)
I'm ignoring this since it's irrelevant.
Now, to get everyone on the same page, let me state some pertinent
facts:
> PCR2rc.2 <- substr(PCR2rc, 2, nchar(PCR2rc))
> subject
89-letter "DNAString" instance
seq: GATCGGAAGAGCACACGTCTGAACTCCATCACATCA...TTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
> PCR2rc.2
63-letter "DNAString" instance
seq: GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
> neditAt(PCR2rc.2, subject)
[1] 32
> neditAt(PCR2rc.2, subject, with.indels=TRUE)
[1] 2
So your question is basically how to get a large prefix of the subject
trimmed when that prefix looks more like my somewhat artificial 'PCR2rc.2'
than like your real 'PCR2rc'.
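To make the gap between 32 and 2 above concrete, here is a minimal base-R sketch on toy strings. The strings are hypothetical (not your data), and utils::adist() is used only as a stand-in for an indel-aware edit distance; it is not what neditAt() calls. The point is just that a one-letter shift ruins every position when indels are not allowed.

```r
# Toy strings (hypothetical, not from this thread): y is x shifted left by one.
x <- "ACGTACGT"
y <- "CGTACGTA"

# Position-by-position comparison: no insertions or deletions allowed.
hamming <- sum(strsplit(x, "")[[1]] != strsplit(y, "")[[1]])

# Generalized Levenshtein distance from utils::adist(): indels allowed.
edits <- as.integer(adist(x, y))

cat("mismatches without indels:", hamming, "\n")
cat("edits with indels:        ", edits, "\n")
```

Every position mismatches without indels, while one deletion plus one insertion is enough with them. Your subject has the same character: it is missing a single letter relative to the pattern, so everything after that point is out of register.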
> trimLRPatterns( Lpattern = PCR2rc, subject = subject,
> max.Lmismatch=0.2,with.Rindels=T)
> 88-letter "DNAString" instance
> seq: ATCGGAAGAGCACACGTCTGAACTCCATCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
>
> trimLRPatterns( Lpattern = PCR2rc, subject = subject,
> max.Lmismatch=0.5,with.Rindels=T)
> 27-letter "DNAString" instance
> seq: AAAAAAAAAAAAAAACGACACAAGCCC
These calls do not make complete sense. You are trimming on the left, so
you want 'with.Lindels=TRUE' (L, not R). More about that later. But doing
so, you don't need an absurd max.Lmismatch setting; 0.2 is quite enough:
> trimLRPatterns(Lpattern = PCR2rc, subject = subject,
max.Lmismatch=0.2, with.Lindels=TRUE)
25-letter "DNAString" instance
seq: AAAAAAAAAAAAACGACACAAGCCC
> countPattern(PCR2rc, subject, max.mismatch= 0.2, min.mismatch=0,
> with.indels=TRUE)
> [1] 0
As I've said before, the matchPattern/countPattern family is insensitive
to non-integral mismatch values. They are silently truncated, via
as.integer(). In this case, your 0.2 becomes 0. Again, the pertinent
facts are:
> neditAt(PCR2rc, subject, with.indels=TRUE)
[1] 3
> countPattern(PCR2rc, subject, max.mismatch=3, with.indels=TRUE)
[1] 1
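Here is a small base-R sketch (no Biostrings needed) of why the same 0.2 behaves so differently in the two functions. One assumption, labeled in the code as well: my reading of the trimLRPatterns man page is that a single max.Lmismatch in [0, 1) is taken as a rate and expanded to as.integer(1:nLp * rate), one budget per suffix length of Lpattern. The variable names are mine.

```r
# PCR2rc from this thread, as a plain character string (base R only).
PCR2rc <- "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"
nLp <- nchar(PCR2rc)
rate <- 0.2

# matchPattern/countPattern: the value is truncated as a whole.
count_budget <- as.integer(rate)  # 0

# trimLRPatterns (ASSUMED expansion, see the note above): one integer
# budget per suffix length of the pattern.
trim_budget <- as.integer(seq_len(nLp) * rate)

# To hand countPattern a comparable budget, scale the rate yourself.
scaled_budget <- as.integer(rate * nLp)

cat("pattern length:            ", nLp, "\n")
cat("countPattern sees:         ", count_budget, "\n")
cat("trim budget, full pattern: ", trim_budget[nLp], "\n")
cat("rate scaled by length:     ", scaled_budget, "\n")
```

Since the match above needs only 3 edits, any integer max.mismatch of 3 or more is enough for countPattern here; the scaled value is just one way to keep the two functions on a similar footing.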
Ok, so now we can consider these somewhat philosophical questions:
1) Should trimLRPatterns save you from setting irrelevant parameters
(with.Rindels, when you're "trimming on the left")?
2) Should matchPattern/countPattern save you from a funny mismatch
setting?
I don't know about 1) per se, but in view of "trimLRPatterns 2.0", the
indels parameters will be on/TRUE by default (and possibly not exist at
all).
For 2), I think you should get a warning, at least, if not a hard error.