[Bioc-sig-seq] a wired problem

Harris A. Jaffee hj at jhu.edu
Wed Sep 28 23:45:19 CEST 2011


I forgot to say that in this code,

> result <- vcountPattern(PCR2rc, seqs, max.mismatch=1, min.mismatch=0,
> with.indels=TRUE, algorithm="indels")
> reads <- reads[result]


you do NOT want to select from 'reads' using 'result' as is, since
its values are all 0 or 1:

 > str(result)
  int [1:25000] 0 0 0 0 0 0 0 0 0 0 ...

 > table(result)
result
     0     1
24959    41

I suppose you are getting 41 instances of your first read, namely

   40-letter "DNAString" instance
seq: GTTTTCTCATTNGAAATTTTGTTACCGCAAAANNNCCACC

when instead, you want something like

	reads[result == 1]

On Sep 28, 2011, at 5:27 PM, Harris A. Jaffee wrote:

> I don't understand where your 'subject1' sequence:
>
>> subject1 = "GTTGGTGCAAACATTAGTTCTTCTGTTGGTGCAACCTTTG"
>
> came from.
>
> Modulo that, everything looks fine.
>
> In my hands, your vcountPattern call finds 41 40-letter elements
> of 'seqs', all equal to GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA,
> that match your 41-letter pattern, subject to your
>
> 	max.mismatch=1, with.indels=TRUE'
>
> They are all equal to your pattern with its first letter deleted.
>
> I haven't looked closely, but I don't see a reason to doubt this
> result.
>
> The following calls would seem to be the sanity checks you want.
> They at least confirm the 41 fuzzy matches.
>
> > countPattern(PCR2rc, "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA",
> 	max.mismatch=1, with.indels=TRUE)
> [1] 1
>
> > matchPattern(PCR2rc, "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA",
> 	max.mismatch=1, with.indels=TRUE)
>
> Views on a 40-letter BString subject
> subject: GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA
> views:
>     start end width
> [1]     1  40    40 [GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA]
>
> On Sep 28, 2011, at 4:14 PM, wang peter wrote:
>
>> dear all:
>>      Using vcountPattern, i found some matched sequences.
>> but those are not similar to the pattern.
>>      see such coding
>>
>> rm(list=ls())
>> reads <- readFastq(fastqfile);#downloaded from
>> http://biocluster.ucr.edu/~tbackman/query.fastq
>> seqs <- sread(reads);
>> PCR2rc<-DNAString("AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA")
>> result <- vcountPattern(PCR2rc, seqs, max.mismatch=1, min.mismatch=0,
>> with.indels=TRUE, algorithm="indels")
>> reads <- reads[result]
>> seqs <- sread(reads)
>> sum(result)
>>      then using countPattern, i found they are really not match
>>
>> subject1 = "GTTGGTGCAAACATTAGTTCTTCTGTTGGTGCAACCTTTG"
>> result <- countPattern(PCR2rc, subject1, max.mismatch=1,  
>> min.mismatch=0,
>> with.indels=TRUE)
>> [1] 0
>>
>> shan gao
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



More information about the Bioc-sig-sequencing mailing list