[BioC] a program to force fastq using pred+33 scoring system
Martin Morgan
mtmorgan at fhcrc.org
Mon May 19 14:03:21 CEST 2014
On 05/14/2014 02:56 PM, Wang Peter wrote:
> i worte a program to automatically dectect the scoring system in fastq file
> and if pred+64, it should be changed into pred+33
I would make the relevant alphabets
phred64 = rawToChar(as.raw(64:104))
phred33 = rawToChar(as.raw(33:73))
translate the character representation of the underling BStringSet
qual = chartr(phred64, phred33, quality(quality(reads)))
and tag the result as phred+33
ShortReadQ(sread(reads), FastqQuality(qual))
To get a numeric representation of the quality encoding, I would
as(quality(reads), "numeric")
or for a rectangular, NA-padded representation
as(quality(reads), "matrix")
Martin
>
> please help me make sure it is right
>
> rm(list=ls())
> fastqfile="1.fastq";
> working_dir="d:/working_dir"
> setwd(working_dir)
> library(ShortRead);
> reads <- readFastq(fastqfile);
> seqs <- sread(reads);
> score_sys = data.class(quality(reads));
> cat("the quality score system
> (SFastqQuality=Phred+64,FastqQuality=Phred+33) is",score_sys,"\n")
> raw_len <- max(width(reads));
>
> qual <- quality(quality(reads));
> myqual_16L <- charToRaw(as.character(unlist(qual)));
> if(score_sys =="FastqQuality")
> {
> myqual_10L <- strtoi(myqual_16L,16L)-33; # this is for other use
> }
>
> if(score_sys =="SFastqQuality")
> {
> myqual_10L <- strtoi(myqual_16L,16L)-64;
> qual_temp <- PhredQuality (as.integer(myqual_10L));
> qual <- BStringSet(unlist(qual_temp), start= seq(from = 1, to =
> raw_len*(length(reads)-1)+1, by = raw_len), width=raw_len);
>
> }
> reads <- ShortReadQ(sread=seqs, quality=qual ) );
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list