[BioC] mismatch & replacement

Fri Nov 5 23:19:44 CET 2010

On 11/05/2010 01:54 PM, Daniel.Berner at unibas.ch wrote:
> Hi list
> 1. I have a large fastq file containing solexa reads that start with a
> barcode (identifier to separate individuals). I now want to filter that
> large data set according to the barcodes using ShortRead. I understand
> that this is easily done with grep() when one wants a perfect barcode
> match. However, I want to allow ONE single wrong nucleotide within the
> barcode, at any position. Is there an efficient way to filter by barcode
> while allowing a mismatch?
>
> 2. Is there a way to modify nucleotides in ShortRead objects? E.g., to
> replace a G by an A at position 3 for ALL sequences in the object?

Hi Daniel --

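One strategy is to narrow() the reads to the barcode region and then test
countPDict(<narrowed seqs>, DNAString(<barcode>), max.mismatch=1L) != 0,
which is TRUE for every read whose barcode matches with at most one
mismatch. vcountPDict() is the variant to look at if there are several
barcodes to test at once.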
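
As a rough sketch of that filter on toy data: the reads and the barcode
"ACGT" below are made up for illustration. Because the barcode sits at
the start of each read, this version checks the narrowed reads with
Biostrings' isMatchingStartingAt() instead of countPDict(). That is a
swapped-in call chosen for a small, self-contained example, not
necessarily the fastest route for a very large file.

```r
library(Biostrings)

# Toy reads (hypothetical); the first 4 bases are the barcode region
reads <- DNAStringSet(c(
    r1 = "ACGTTTGACCA",   # exact barcode
    r2 = "ACCTTTGACCA",   # one mismatch (G -> C at position 3)
    r3 = "TTTTGGGACCA",   # three mismatches
    r4 = "ACGAGGGACCA"    # one mismatch (T -> A at position 4)
))
barcode <- "ACGT"         # hypothetical barcode

# Trim every read down to the barcode region
prefix <- narrow(reads, start = 1L, end = nchar(barcode))

# TRUE where the barcode matches the prefix with at most one mismatch
keep <- isMatchingStartingAt(barcode, prefix, starting.at = 1L,
                             max.mismatch = 1L)

# Keep only the reads whose barcode passed the test
filtered <- reads[keep]
names(filtered)
```

With real data, reads would presumably be sread() of the object returned
by readFastq(), and the same logical vector could then be used to subset
that object.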

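I think part 2 is along the lines of the following, with dna being the
DNAStringSet of reads (e.g., from sread() on the ShortRead object):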

  idx = as.character(subseq(dna, 3, 3)) == "G"
  subseq(dna[idx], 3, 3) = "A"

though I suspect that character conversion isn't necessary.
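
The same replacement can also be sketched without subseq<-, by editing a
plain character copy with base R's substr<- and then rebuilding the
DNAStringSet. This is a swapped-in route that copies the data, so it is
less memory-friendly; the sequences below are made up, and dna stands in
for the reads of the real object.

```r
library(Biostrings)

# Toy sequences (hypothetical)
dna <- DNAStringSet(c(s1 = "ACGTAC", s2 = "ACTTAC", s3 = "TTGGGG"))

# Work on a plain character copy of the sequences
chr <- as.character(dna)

# Which sequences carry a G at position 3?
idx <- substr(chr, 3, 3) == "G"

# Overwrite position 3 with A in those sequences only
substr(chr[idx], 3, 3) <- "A"

# Convert back to a DNAStringSet; names are carried along
dna2 <- DNAStringSet(chr)
as.character(dna2)
```

Sequences without a G at position 3 (s2 here) pass through unchanged.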

Martin

> 
> Thanks!
> Daniel

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793