[BioC] mismatch & replacement
Harris A. Jaffee
hj at jhu.edu
Fri Nov 5 23:46:03 CET 2010
This example illustrates another approach to the first question. You'll need to
post-process the result, using the width of the returned value (a read whose
barcode was trimmed is shorter than the original), if you want to delete or
select the barcoded reads.
> trimLRPatterns(Lpattern="ACGT", subject=c("ACTTAA", "TTTTGG"),
+                max.Lmismatch=1)
[1] "AA" "TTTTGG"
> trimLRPatterns(Lpattern="ACGT", subject=c("ACTTAA", "TTTTGG"),
+                max.Lmismatch=1, ranges=TRUE)
IRanges of length 2
start end width
[1] 5 6 2
[2] 1 6 6
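A minimal sketch of that post-processing, assuming the Biostrings package (which provides trimLRPatterns()) is installed and reusing the two toy reads above. With ranges=TRUE, a read is counted as barcoded only if exactly nchar(barcode) letters were trimmed from its left end, which also skips any read where fewer letters were removed. The variable names (reads, barcode, kept, hasBarcode, trimmed) are illustrative only.

```r
## Sketch only: assumes the Bioconductor package Biostrings is installed.
suppressPackageStartupMessages(library(Biostrings))

reads   <- c("ACTTAA", "TTTTGG")  # toy reads from the example above
barcode <- "ACGT"

## With ranges = TRUE, trimLRPatterns() returns, as an IRanges object,
## the part of each read that survives trimming (one range per read).
kept <- trimLRPatterns(Lpattern = barcode, subject = reads,
                       max.Lmismatch = 1, ranges = TRUE)

## Treat a read as barcoded only if exactly nchar(barcode) letters
## were removed from its left end.
hasBarcode <- width(kept) == nchar(reads) - nchar(barcode)

barcoded <- reads[hasBarcode]     # select the barcoded reads
others   <- reads[!hasBarcode]    # or set them aside / delete them

## Barcoded reads with the barcode cut off, using the returned ranges.
trimmed <- substring(reads, start(kept), end(kept))[hasBarcode]
```

For reads held in a ShortRead object, the same width comparison could be applied to the DNAStringSet returned by sread(), using width() for the read lengths.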
You can also use agrep() with max.distance=1, but you will need to narrow each
read to its barcode region first (you can't use "^" as a metacharacter to anchor
the match at the start of the read).
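A base-R sketch of the agrep() route, using hypothetical toy reads and assuming every read is at least as long as the barcode. Each read is first narrowed to its first nchar(barcode) letters with substr(). Note that agrep() works with edit distance, so max.distance = 1 also accepts one insertion or deletion (the read "CGTAAA" below matches through its "CGT"). If only a single substituted nucleotide should be tolerated, a position-by-position mismatch count, shown at the end, is the stricter alternative.

```r
## Base R only; reads and barcode are hypothetical toy values.
reads   <- c("ACTTAA", "TTTTGG", "ACGTCC", "CGTAAA")
barcode <- "ACGT"
bcLen   <- nchar(barcode)

## agrep() can't anchor with "^", so narrow each read to its barcode region.
prefix <- substr(reads, 1, bcLen)

## Approximate matching: at most one edit (substitution, insertion or deletion).
agrepHits <- agrep(barcode, prefix, max.distance = 1)

## Substitution-only alternative: count mismatches position by position.
bcChars   <- strsplit(barcode, "")[[1]]
nMismatch <- vapply(strsplit(prefix, ""),
                    function(p) sum(p != bcChars), integer(1))
hammingHits <- which(nMismatch <= 1)

## Keep the reads whose barcode region has at most one mismatch.
selected <- reads[hammingHits]
```

Here agrepHits also includes the fourth read, whose barcode region "CGTA" is four substitutions away from "ACGT" but only one deletion away, while hammingHits keeps just the first and third reads.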
-Harris
On Nov 5, 2010, at 4:54 PM, Daniel.Berner at unibas.ch wrote:
> Hi list
> 1. I have a large FASTQ file containing Solexa reads that start
> with a barcode (identifier to separate individuals). I now want to
> filter that large data set according to the barcodes using
> ShortRead. I understand that this is easily done with grep() when
> one wants a perfect barcode match. However, I want to allow ONE
> single wrong nucleotide within the barcode, at any position. Is
> there an efficient way to filter by barcode while allowing a mismatch?
>
> 2. Is there a way to modify nucleotides in ShortRead objects? E.g.,
> to replace a G by an A at position 3 for ALL sequences in the object?
>
> Thanks!
> Daniel
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor