[BioC] a possible bug in the ShortRead package
Martin Morgan
mtmorgan at fhcrc.org
Wed May 14 21:56:05 CEST 2014
On 05/14/2014 11:17 AM, Wang Peter wrote:
> The code works well on many data sets,
> but when I ran it on this 12-line file I hit a problem.
>
> How does the function tell whether the scores use the Phred+33 or the Phred+64 system?
>
> library(ShortRead)
> reads <- readFastq(fastqfile)
> seqs <- sread(reads)
> score_sys <- data.class(quality(reads))
> cat("the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is", score_sys, "\n")
>
>
> The output is:
> the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is SFastqQuality
> but the file is really FastqQuality (Phred+33).
>
> @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG
> GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGAACTGCGACATCTGTGTGCTCGAGCAAGGAATGAGTGGAAAG
> +
> BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBFFFFBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB
> @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG
> CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATCGAGTCAACCTCTACCTGCATTAGGTGACTGCTGAGAGCTCC
> +
> BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFBFFFFFFFFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF
> @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG
> GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGAAGAGTCATGTTTGTGACGAGAGAACTGGCTACAGCTCCAGG
> +
> BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF
>
>
From the help page
?readFastq
the 'qualityType' argument is described as
    qualityType: Representation to be used for quality scores,
        must be one of 'Auto' (choose Phred-like if any character
        is ASCII-encoded as less than 59) 'FastqQuality'
        (Phred-like encoding), 'SFastqQuality' (Illumina
        encoding).
'Auto' is the default. None of the quality characters in your file is
ASCII-encoded as less than 59 (the lowest is 'B', ASCII 66), hence 'Auto'
chooses SFastqQuality.
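As a rough illustration of the 'Auto' rule quoted above, here is a base R sketch that applies the same test to quality characters like those in your file. The helper name guess_quality_type is hypothetical and not part of ShortRead, the strings are abbreviated stand-ins using the same characters (B, F, I) as your records, and the package's actual implementation may differ in detail.

```r
# Hypothetical helper (NOT part of ShortRead): mimic the documented 'Auto'
# rule, i.e. choose the Phred-like representation if any quality character
# is ASCII-encoded as less than 59, otherwise the Illumina (Phred+64) one.
guess_quality_type <- function(quals, cutoff = 59L) {
    codes <- utf8ToInt(paste(quals, collapse = ""))
    if (any(codes < cutoff)) "FastqQuality" else "SFastqQuality"
}

# Abbreviated stand-ins for the three quality strings in the post;
# they contain only the characters B, F and I, as the originals do.
quals <- c("BBBFFFIIIFB", "BBBFFIIIFFF", "BBBFFIIFFIB")

# Lowest ASCII code present: 'B' is 66, which is not below 59
min_code <- min(utf8ToInt(paste(quals, collapse = "")))
guessed <- guess_quality_type(quals)            # "SFastqQuality"

# A string containing a low character such as '#' (ASCII 35)
# would be classified as Phred-like instead.
guessed_low <- guess_quality_type("##BBFFII")   # "FastqQuality"
```

With only high-quality bases in a small file, no character falls below the cutoff, so a rule of this kind has no evidence that the file is Phred+33.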
If you know the encoding, pass it explicitly when reading the file:
    readFastq(fastqfile, qualityType="FastqQuality")
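To show why the choice of representation matters, here is a small base R sketch, independent of ShortRead, that decodes the quality characters from your file under both offsets. The function name decode_quality is made up for illustration; the offsets 33 and 64 are the ones implied by Phred+33 and Phred+64.

```r
# Hypothetical helper (NOT part of ShortRead): convert quality characters
# to integer scores by subtracting the encoding offset from the ASCII code.
decode_quality <- function(qual, offset) {
    utf8ToInt(qual) - offset
}

# The three distinct quality characters that occur in the post
chars <- "BFI"

phred33 <- decode_quality(chars, 33L)   # 33 37 40
phred64 <- decode_quality(chars, 64L)   #  2  6  9
```

The same characters decode to scores of 33 to 40 under Phred+33 but only 2 to 9 under Phred+64, so any downstream quality filtering would behave very differently if the wrong representation were used.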
See this previous post
https://stat.ethz.ch/pipermail/bioconductor/2012-September/048172.html
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793