[BioC] a program to force fastq using pred+33 scoring system

Martin Morgan mtmorgan at fhcrc.org
Mon May 19 14:03:21 CEST 2014


On 05/14/2014 02:56 PM, Wang Peter wrote:
> i worte a program to automatically dectect the scoring system in fastq file
> and if pred+64, it should be changed into pred+33

I would make the relevant alphabets

   phred64 = rawToChar(as.raw(64:104))
   phred33 = rawToChar(as.raw(33:73))

translate the character representation of the underling BStringSet

   qual = chartr(phred64, phred33, quality(quality(reads)))

and tag the result as phred+33

   ShortReadQ(sread(reads), FastqQuality(qual))

To get a numeric representation of the quality encoding, I would

   as(quality(reads), "numeric")

or for a rectangular, NA-padded representation

   as(quality(reads), "matrix")

Martin

>
> please help me make sure it is right
>
> rm(list=ls())
> fastqfile="1.fastq";
> working_dir="d:/working_dir"
> setwd(working_dir)
> library(ShortRead);
> reads <- readFastq(fastqfile);
> seqs <- sread(reads);
> score_sys = data.class(quality(reads));
> cat("the quality score system
> (SFastqQuality=Phred+64,FastqQuality=Phred+33) is",score_sys,"\n")
> raw_len <- max(width(reads));
>
> qual <- quality(quality(reads));
> myqual_16L <- charToRaw(as.character(unlist(qual)));
> if(score_sys =="FastqQuality")
> {
>   myqual_10L <- strtoi(myqual_16L,16L)-33;   # this is for other use
> }
>
> if(score_sys =="SFastqQuality")
> {
>   myqual_10L <- strtoi(myqual_16L,16L)-64;
>   qual_temp <- PhredQuality (as.integer(myqual_10L));
>   qual <- BStringSet(unlist(qual_temp), start= seq(from = 1, to =
> raw_len*(length(reads)-1)+1, by = raw_len), width=raw_len);
>
> }
> reads <- ShortReadQ(sread=seqs, quality=qual ) );
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list