[BioC] still the trimLRPatterns problem

Harris A. Jaffee hj at jhu.edu
Tue Oct 11 20:38:35 CEST 2011


On Oct 11, 2011, at 10:59 AM, wang peter wrote:
> I want to remove PCR2rc from the subject, but it is not recognized
> if I set the mismatch to 0.2.
> How should I set the parameters so that trimLRPatterns works?

>    GATCGGAAGAGCACACGTCTGAACTCCA
> TCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
> AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG

It would really help if you could ask something specific about these  
3 strings.

> subject <- DNAString("GATCGGAAGAGCACACGTCTGAACTCCATCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC")
> PCR2rc <- DNAString("AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG")

Ok, so these are your major players.  See below.

> max.mismatchs <- 0.25*1:nchar(PCR2rc)

I'm ignoring this since it's irrelevant.

Now, to get everyone on the same page, let me state some pertinent  
facts:

 > PCR2rc.2 <- substr(PCR2rc, 2, nchar(PCR2rc))
 > subject
   89-letter "DNAString" instance
seq: GATCGGAAGAGCACACGTCTGAACTCCATCACATCA...TTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
 > PCR2rc.2
   63-letter "DNAString" instance
seq: GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
 > neditAt(PCR2rc.2, subject)
[1] 32
 > neditAt(PCR2rc.2, subject, with.indels=TRUE)
[1] 2

So your question is basically how to get a large prefix of the subject
trimmed when that prefix looks more like my somewhat artificial 'PCR2rc.2'
(your 'PCR2rc' minus its first letter) than like your real 'PCR2rc'.
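
To illustrate why the two neditAt() numbers above differ so much, here is a
small base-R sketch. The strings are toy examples of my own, not the
sequences from this thread. One dropped letter shifts everything after it,
so a position-by-position comparison reports many mismatches while the edit
distance stays small. It uses adist() from the utils package as a stand-in
for an indel-aware distance; I am not claiming this is what Biostrings does
internally.

```r
# Toy strings (hypothetical): b is a with its 2nd letter deleted
# and a "C" appended at the end.
a <- "ACGTTGCA"
b <- "AGTTGCAC"

# Position-by-position mismatches (no indels allowed).
n_mismatch <- sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])

# Generalized Levenshtein distance (substitutions, insertions, deletions).
n_edit <- drop(adist(a, b))

n_mismatch  # 6
n_edit      # 2
```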

> trimLRPatterns( Lpattern = PCR2rc, subject = subject,
> max.Lmismatch=0.2,with.Rindels=T)
> 88-letter "DNAString" instance
> seq: ATCGGAAGAGCACACGTCTGAACTCCATCACATCACGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACGACACAAGCCC
>
> trimLRPatterns( Lpattern = PCR2rc, subject = subject,
> max.Lmismatch=0.5,with.Rindels=T)
>   27-letter "DNAString" instance
> seq: AAAAAAAAAAAAAAACGACACAAGCCC

These calls do not make complete sense: with.Rindels only applies to an
Rpattern, and you are not supplying one.  You want 'with.Lindels=TRUE'
(with an L, not an R).  More about that later.  With that set, you don't
need an absurd max.Lmismatch setting; 0.2 is quite enough:

 > trimLRPatterns(Lpattern = PCR2rc, subject = subject,
                  max.Lmismatch=0.2, with.Lindels=TRUE)
   25-letter "DNAString" instance
seq: AAAAAAAAAAAAACGACACAAGCCC
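
A rough sketch of why 0.2 is enough, in base R. It assumes, from my reading
of the trimLRPatterns documentation, that a single rate in [0, 1) is turned
into one integer budget per overlap length by multiplying and truncating.
Treat that as an assumption, not a statement about the actual
implementation.

```r
# Assumption: a rate r becomes as.integer(r * 1:nLp),
# i.e. one mismatch/edit budget per overlap length.
nLp  <- 64L    # nchar(PCR2rc): one more than the 63-letter PCR2rc.2
rate <- 0.2

budget <- as.integer(rate * seq_len(nLp))

budget[nLp]    # 12 edits allowed for the full-length overlap
```

Under that assumption, any full-length alignment that needs no more than 12
edits is accepted once indels are allowed on the left.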

> countPattern(PCR2rc, subject, max.mismatch= 0.2, min.mismatch=0,
> with.indels=TRUE)
> [1] 0

As I've said before, the matchPattern/countPattern family does not honor
non-integer mismatch values.  They are silently truncated via as.integer().
In this case, your 0.2 becomes 0.  Again, the pertinent facts are:

 > neditAt(PCR2rc, subject, with.indels=TRUE)
[1] 3
 > countPattern(PCR2rc, subject, max.mismatch=3, with.indels=TRUE)
[1] 1
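
The truncation is easy to see in base R, and it suggests a workaround:
convert the rate to an integer count yourself before calling
countPattern().  A small sketch (the variable names are mine):

```r
as.integer(0.2)            # 0: what a fractional max.mismatch collapses to

# Turn a rate into an explicit integer budget for a 64-letter pattern.
pattern_len  <- 64L
max_mismatch <- as.integer(floor(0.2 * pattern_len))
max_mismatch               # 12
```

Passing an explicit integer like this as max.mismatch avoids the silent
truncation to 0.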

Ok, so now we can consider these somewhat philosophical questions:

  1) Should trimLRPatterns save you from setting irrelevant parameters
     (with.Rindels, when you're "trimming on the left")?

  2) Should matchPattern/countPattern save you from a funny mismatch
     setting?

I don't know about 1) per se, but in "trimLRPatterns 2.0" the indels
parameters will be on/TRUE by default (and possibly not exist at all).

For 2), I think you should get a warning, at least, if not a hard error.
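
Until something like that exists, a check of this kind can be done on the
caller's side.  The helper below is hypothetical (it is not part of
Biostrings) and only sketches the warning idea in base R:

```r
# Hypothetical helper, not part of Biostrings: warn before a
# non-integer mismatch value is truncated.
check_max_mismatch <- function(max.mismatch) {
  if (!is.numeric(max.mismatch) || length(max.mismatch) != 1L ||
      is.na(max.mismatch))
    stop("'max.mismatch' must be a single non-NA number")
  truncated <- as.integer(max.mismatch)
  if (truncated != max.mismatch)
    warning("non-integer 'max.mismatch' (", max.mismatch,
            ") truncated to ", truncated)
  truncated
}

check_max_mismatch(3)                      # 3, no warning
suppressWarnings(check_max_mismatch(0.2))  # 0; warns unless suppressed
```

The returned value can then be passed as max.mismatch, so a setting like
0.2 no longer slips through unnoticed.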
