[BioC] how does ShortRead function identify score system

Martin Morgan mtmorgan at fhcrc.org
Thu Sep 20 05:27:07 CEST 2012


On 09/19/2012 07:57 PM, wang peter wrote:
> dear ALL:
>         i have many RNA-seq data.
> some use Phred+33 system to score the quality
> some use Phred+63
>
> i found no parameter to tell the functions which one should be.
> but it still works
> can you tell me which function can identify them.

On the help page ?readFastq it says

      ...: Additional arguments. In particular, 'qualityType' and
           'filter':

           qualityType: Representation to be used for quality scores,
               must be one of 'Auto' (choose Phred-like if any character
               is ASCII-encoded as less than 59) 'FastqQuality'
               (Phred-like encoding), 'SFastqQuality' (Illumina
               encoding).

so readFastq(fastqfile, qualityType="FastqQuality") over-rides the 
'Auto' selection.

In more detail this is implemented in the 'ShortReadQ' method for 
"DNAStringSet", "BStringSet", "BStringSet", below, where the heuristic 
is just a scan of the first 1000 reads for quality scores below 59

                 alf <- alphabetFrequency(head(quality, 1000),
                   collapse = TRUE)
                 if (any(alf) && min(which(alf != 0)) < 59) {
                   FastqQuality
                 } else SFastqQuality

The choice is only between FastqQuality and SFastqQuality, to create 
PhredQuality one has to do what you do below.

Martin

 > selectMethod("ShortReadQ",
+     c(sread="DNAStringSet", quality="BStringSet", id="BStringSet"))
Method Definition:

function (sread, quality, id, ...)
{
     .local <- function (sread, quality, id, ..., qualityType = c("Auto",
         "FastqQuality", "SFastqQuality"), filter = srFilter(),
         withIds = TRUE)
     {
         if (!missing(filter))
             .check_type_and_length(filter, "SRFilter", NA)
         tryCatch({
             qualityType <- match.arg(qualityType)
         }, error = function(err) {
             .throw(SRError("UserArgumentMismatch", conditionMessage(err)))
         })
         tryCatch({
             qualityFunc <- switch(qualityType, Auto = {
                 alf <- alphabetFrequency(head(quality, 1000),
                   collapse = TRUE)
                 if (any(alf) && min(which(alf != 0)) < 59) {
                   FastqQuality
                 } else SFastqQuality
             }, SFastqQuality = SFastqQuality, FastqQuality = FastqQuality)
             quality <- qualityFunc(quality)
             srq <- if (withIds)
                 ShortReadQ(sread, quality, id)
             else ShortReadQ(sread, quality)
             if (!missing(filter))
                 srq <- srq[filter(srq)]
             srq
         }, error = function(err) {
             .throw(SRError("IncompatibleTypes", "message: %s",
                 conditionMessage(err)))
         })
     }
     .local(sread, quality, id, ...)
}
<environment: namespace:ShortRead>

Signatures:
         sread          quality      id
target  "DNAStringSet" "BStringSet" "BStringSet"
defined "DNAStringSet" "BStringSet" "BStringSet"

>
> such is the coding:
> reads <- readFastq(fastqfile);
>
> qual <- FastqQuality(quality(quality(reads)));
> #qual <- PhredQuality(quality(quality(reads)));#
>
> readM <- as(qual, "matrix");#注意R's maximum vector length is 2^31 - 1
> pdf(file="boxplot.pdf"); # Save box plot as boxplot.pdf in current folder
> boxplot(as.data.frame((readM)), outline = FALSE, main="Per Cycle Read
> Quality", xlab="Cycle", ylab="Phred Quality");
> # build per cycle boxplot
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list