[BioC] how does ShortRead function identify score system
Martin Morgan
mtmorgan at fhcrc.org
Thu Sep 20 05:27:07 CEST 2012
On 09/19/2012 07:57 PM, wang peter wrote:
> dear ALL:
> i have many RNA-seq data.
> some use Phred+33 system to score the quality
> some use Phred+63
>
> i found no parameter to tell the functions which one should be.
> but it still works
> can you tell me which function can identify them.
On the help page ?readFastq it says
...: Additional arguments. In particular, 'qualityType' and
'filter':
qualityType: Representation to be used for quality scores,
must be one of 'Auto' (choose Phred-like if any character
is ASCII-encoded as less than 59) 'FastqQuality'
(Phred-like encoding), 'SFastqQuality' (Illumina
encoding).
so readFastq(fastqfile, qualityType="FastqQuality") over-rides the
'Auto' selection.
In more detail this is implemented in the 'ShortReadQ' method for
"DNAStringSet", "BStringSet", "BStringSet", below, where the heuristic
is just a scan of the first 1000 reads for quality scores below 59
alf <- alphabetFrequency(head(quality, 1000),
collapse = TRUE)
if (any(alf) && min(which(alf != 0)) < 59) {
FastqQuality
} else SFastqQuality
The choice is only between FastqQuality and SFastqQuality, to create
PhredQuality one has to do what you do below.
Martin
> selectMethod("ShortReadQ",
+ c(sread="DNAStringSet", quality="BStringSet", id="BStringSet"))
Method Definition:
function (sread, quality, id, ...)
{
.local <- function (sread, quality, id, ..., qualityType = c("Auto",
"FastqQuality", "SFastqQuality"), filter = srFilter(),
withIds = TRUE)
{
if (!missing(filter))
.check_type_and_length(filter, "SRFilter", NA)
tryCatch({
qualityType <- match.arg(qualityType)
}, error = function(err) {
.throw(SRError("UserArgumentMismatch", conditionMessage(err)))
})
tryCatch({
qualityFunc <- switch(qualityType, Auto = {
alf <- alphabetFrequency(head(quality, 1000),
collapse = TRUE)
if (any(alf) && min(which(alf != 0)) < 59) {
FastqQuality
} else SFastqQuality
}, SFastqQuality = SFastqQuality, FastqQuality = FastqQuality)
quality <- qualityFunc(quality)
srq <- if (withIds)
ShortReadQ(sread, quality, id)
else ShortReadQ(sread, quality)
if (!missing(filter))
srq <- srq[filter(srq)]
srq
}, error = function(err) {
.throw(SRError("IncompatibleTypes", "message: %s",
conditionMessage(err)))
})
}
.local(sread, quality, id, ...)
}
<environment: namespace:ShortRead>
Signatures:
sread quality id
target "DNAStringSet" "BStringSet" "BStringSet"
defined "DNAStringSet" "BStringSet" "BStringSet"
>
> such is the coding:
> reads <- readFastq(fastqfile);
>
> qual <- FastqQuality(quality(quality(reads)));
> #qual <- PhredQuality(quality(quality(reads)));#
>
> readM <- as(qual, "matrix");#注意R's maximum vector length is 2^31 - 1
> pdf(file="boxplot.pdf"); # Save box plot as boxplot.pdf in current folder
> boxplot(as.data.frame((readM)), outline = FALSE, main="Per Cycle Read
> Quality", xlab="Cycle", ylab="Phred Quality");
> # build per cycle boxplot
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor
mailing list