[Bioc-devel] PhredQuality from Biostrings

Martin Morgan mtmorgan at fhcrc.org
Wed Jun 15 19:51:10 CEST 2011


On 06/15/2011 10:28 AM, Christian Ruckert wrote:
> On 10.06.2011 19:54, Martin Morgan wrote:
>> On 06/10/2011 08:01 AM, Christian Ruckert wrote:
>>> Hi,
>>>
>>> I have written a function to read Roche SFF (Standard Flowgram Format)
>>> files into R. Now I want to store the contents in standard Bioconductor
>>> structures (e.g. sequences as a DNAStringSet object). I have the quality
>>> scores as a list of integer vectors. One list entry for each sequence.
>>> The vector lengths correspond to the sequence lengths. The vectors
>>> contain entries between 0 and 40 corresponding to the base quality at
>>> this position.
>>>
>>
>> Hi Christian
>>
>> Maybe along the lines of
>>
>> PhredQuality(sapply(qualitylist, function(x) rawToChar(as.raw(x + 33))))
>
> This really speeds things up, thanks.
>
>> or via ShortRead::readQual / readFastaQual (can use a character vector
>> for the path; no need to create a RochePath). Probably you'll find
>> ShortReadQ useful for coordinating the sequences and qualities
>>
>> Martin
>
> I have successfully created a ShortReadQ object out of my sequences
>
>  > srq
> class: ShortReadQ
> length: 95551 reads; width: 77..1201 cycles
>
> Is it reasonable to use ShortReadQ for sequences from Roche with their
> differing lengths? It seems to work but the manual always states
> "uniform-length short reads".

Yes, it is fine. Some operations (e.g., as(quality(srq), "matrix")) fail
because the qualities are not rectangular. I have sometimes found it
convenient to look at trailing quality; something like
narrow(srq, start=width(srq)-20) can be used for that.
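
For anyone who wants to check the arithmetic without Bioconductor installed, here is a rough base-R sketch of the two ideas in this thread: the Phred+33 encoding of integer scores, and a trailing window on variable-length reads. The toy data and the names encode_phred33, decode_phred33 and tail_len are made up for illustration; none of them come from Biostrings or ShortRead. Note that start = width - 20 selects the last 21 positions, and that the start has to be clamped at 1 for reads shorter than the window.

```r
# Hypothetical toy data: one integer vector of Phred scores (0..40) per read,
# with differing lengths as in Roche reads.
qualitylist <- list(
  read1 = c(40L, 38L, 35L, 30L, 22L, 10L, 5L, 2L),
  read2 = c(39L, 39L, 37L, 20L, 3L)
)

# Phred+33 encoding: score q becomes the ASCII character with code q + 33.
# (Illustrative helper, not a Biostrings function.)
encode_phred33 <- function(q) rawToChar(as.raw(q + 33L))
qual_strings <- vapply(qualitylist, encode_phred33, character(1))

# Trailing window: positions (width - tail_len) .. width of each read,
# i.e. tail_len + 1 positions, with the start clamped at 1 for short reads.
tail_len <- 3L
widths <- nchar(qual_strings)
starts <- pmax(1L, widths - tail_len)
trailing <- substring(qual_strings, starts, widths)

# Decode the trailing characters back to integer scores and average them.
decode_phred33 <- function(s) as.integer(charToRaw(s)) - 33L
trailing_mean <- vapply(trailing,
                        function(s) mean(decode_phred33(s)),
                        numeric(1), USE.NAMES = FALSE)

print(qual_strings)
print(trailing_mean)
```

With real data, qual_strings would be what gets passed to PhredQuality(), and the clamped starts correspond to the start argument of narrow().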

Martin

>
> Christian


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


