[BioC] a possible bug in the ShortRead package
Martin Morgan
mtmorgan at fhcrc.org
Wed May 14 22:03:47 CEST 2014
On 05/14/2014 01:01 PM, Wang Peter wrote:
> Thank you very much.
> I think this method is not reliable.
> If the data are high quality, no nucleotide has a low score (an ASCII code below 59),
> so the reads will be treated as SFastqQuality.
>
> I would like it to check whether some score is higher than 80, and only then choose SFastqQuality.
Why 80? It may be better to force an explicit choice. Is there a better standard than
http://en.wikipedia.org/wiki/FASTQ_format ?
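
To illustrate why a cutoff can only ever be a heuristic, here is a minimal base-R sketch (no ShortRead needed) applied to the three quality strings posted in this thread. The helper `guess_quality_type` and its `cutoff` argument are hypothetical names used only for illustration; they are not part of ShortRead and merely mirror the 'Auto' rule as worded in ?readFastq.

```r
# Quality strings copied from the three FASTQ records quoted in this thread.
quals <- c(
  "BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBFFFFBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB",
  "BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFBFFFFFFFFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF",
  "BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF"
)

# ASCII code of every quality character, and the observed range.
codes <- unlist(lapply(quals, utf8ToInt))
rng <- range(codes)

# Hypothetical helper (not a ShortRead function) mirroring the documented
# 'Auto' rule: Phred-like if any character is below the cutoff, else Illumina.
guess_quality_type <- function(codes, cutoff = 59L) {
  if (any(codes < cutoff)) "FastqQuality" else "SFastqQuality"
}

guess <- guess_quality_type(codes)

# Decode the observed range under both offsets.
phred33 <- rng - 33L
phred64 <- rng - 64L

cat("ASCII range:", rng, "\n")
cat("guess under the 'Auto' rule:", guess, "\n")
cat("scores if Phred+33:", phred33, "\n")
cat("scores if Phred+64:", phred64, "\n")
```

The characters here span ASCII 66 ('B') to 73 ('I'), which decode to scores 33 to 40 under Phred+33 and 2 to 9 under Phred+64. Both ranges are legal, so the file alone cannot settle the encoding, whatever cutoff is used.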
>
>
> On Thu, May 15, 2014 at 3:56 AM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
> On 05/14/2014 11:17 AM, Wang Peter wrote:
>
> The code works well on many data sets,
> but when I ran it on a 12-line file I hit a problem.
>
> How can the function tell whether the scores use the Phred+33 or the Phred+64 system?
>
> library(ShortRead)
> reads <- readFastq(fastqfile)
> seqs <- sread(reads)
> score_sys <- data.class(quality(reads))
> cat("the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is", score_sys, "\n")
>
>
> the output is:
> the quality score system (SFastqQuality=Phred+64,FastqQuality=Phred+33) is SFastqQuality
> but the file is really FastqQuality (Phred+33):
>
> @HISEQ04:126:C343UACXX:8:1103:15851:74641 1:N:0:ACAGTG
> GGCCTCTCAATGTCAAGGGATCGACGGCAGATATCATAGATGGCCTCATTGTCCAAGAGAACTGCGACATCTGTGTGCTCGAGCAAGGAATGAGTGGAAAG
> +
> BBBFFFFFFFFFFFFIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBFFFFBFFFFFFFFFFFFFFFFFFFFFFBFFFBFB
> @HISEQ04:126:C343UACXX:8:1103:16187:74529 1:N:0:ACAGTG
> CAATTCTAGCTACTGGAGCTGTCCATTTGCCGCGCAGGCACTGAAGATAGAACATCGATCGAGTCAACCTCTACCTGCATTAGGTGACTGCTGAGAGCTCC
> +
> BBBFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFFBFFFFFFFFFFFFFBFFFFFFBFFFFFFFFFFFFFFFF
> @HISEQ04:126:C343UACXX:8:1103:16244:74553 1:N:0:ACAGTG
> GCCGAAGCATTTTTGGCTTCTGTAAGGTTGTACATATGAAGCAGATTGCTCCAGCTTGGAAGAGTCATGTTTGTGACGAGAGAACTGGCTACAGCTCCAGG
> +
> BBBFFFFFFFFFFIIIIIIIIIIIIIIFFIFIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIFFFIIFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFF
>
>
>
> From the help page
>
> ?readFastq
>
> the 'qualityType' argument is described as
>
> qualityType: Representation to be used for quality scores,
> must be one of 'Auto' (choose Phred-like if any character
> is ASCII-encoded as less than 59) 'FastqQuality'
> (Phred-like encoding), 'SFastqQuality' (Illumina
> encoding).
>
> 'Auto' is the default. None of the ASCII-encoded quality characters in your file
> is less than 59 (the lowest is 'B', ASCII 66), hence SFastqQuality is chosen.
>
> Invoke the command with the information about encoding if known,
>
> readFastq(fastqfile, qualityType="FastqQuality")
>
> See this previous post
>
> https://stat.ethz.ch/pipermail/bioconductor/2012-September/048172.html
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
>
>
>
> --
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute for Plant Research
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email: sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253