[Bioc-sig-seq] apply/sapply for DNAStringSet class or FastqQuality class?

Martin Morgan mtmorgan at fhcrc.org
Mon Feb 22 15:35:40 CET 2010


On 02/22/2010 05:57 AM, Martin Morgan wrote:
> On 02/22/2010 02:36 AM, Johannes Rainer wrote:
>> hi,
>>
>> I'm just wondering if there is an "apply" function for the classes defined in
>> the ShortRead package.
>> Or is there a nice and fast way to e.g. calculate the mean Phred quality for
>> each aligned sequence? Note that I cannot use as( quality( AlignedReads ),
>> "matrix" ), because the lengths of the alignments (i.e. aligned sequences)
>> differ in my data set.
> 
> An approximate way is ?alphabetScore, which calculates the sum of the
> _encodings_ (not quite quality scores) of each read. Another way is
> 
>   char2phred <- function(p) as.integer(charToRaw(p)) - 33L  # assumes Phred+33 encoding

To be a little more explicit, 'char2phred' needs to be defined so that it
correctly accounts for the encoding (which integer value each character
represents) and for what the quality score actually is (Phred above, but
'Solexa' quality is also a possibility); see
http://en.wikipedia.org/wiki/FASTQ_format.
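
As a minimal sketch of that point (an illustration, not code from the
original exchange), the decoder can take the encoding offset as an argument.
The 'offset' parameter, the 'solexa2phred' helper and the example strings in
'quals' are assumptions introduced here; with real data the strings would
come from as.character(quality(quality(aln))), and the right offset depends
on how the file was produced.

```r
# Decode one quality string into integer scores.
# 'offset' is an assumption about the encoding of the file:
# 33 for Sanger-style (Phred+33), 64 for the older Illumina/Solexa +64 files.
char2phred <- function(p, offset = 33L) {
    as.integer(charToRaw(p)) - offset
}

# Convert Solexa-scaled scores to Phred-scaled scores.
# Only needed when the decoded values are Solexa qualities.
solexa2phred <- function(q) {
    10 * log10(10^(q / 10) + 1)
}

# Hypothetical quality strings of unequal length, standing in for
# as.character(quality(quality(aln))).
quals <- c("IIII", "I5+", "!!")

# Mean score per read; works although the reads differ in length.
meanPhred <- vapply(quals, function(q) mean(char2phred(q)), numeric(1),
                    USE.NAMES = FALSE)
```

Because each string is decoded on its own, reads of different lengths are
not a problem, which avoids the as(..., "matrix") limitation mentioned in
the question.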

Martin

>   phred = lapply(as.character(quality(quality(aln))), char2phred)
> 
> One might also try
> 
>   lapply(quality(quality(aln)), char2phred)
> 
> which is more memory efficient but slow.
> 
> Martin
> 
>>
>> thanks!
>>
>> cheers, jo
>>
> 
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


