[Bioc-devel] PhredQuality from Biostrings
Christian Ruckert
cruckert at uni-muenster.de
Fri Jun 10 17:01:59 CEST 2011
Hi,
I have written a function to read-in Roche SFF(Standard Flowgram Format)
files into R. Now I want to store the contents in standard Bioconductor
structures (e.q. sequences as DNAStringSet object). I have the quality
scores as a list of integer vectors. One list entry for each sequence.
The vector lengths correspond to the sequence lengths. The vectors
contain entries between 0 and 40 corresponding to the base quality at
this position.
Here is an example for one list entry, a sequence of length 82:
> qualitylist[[1]]
[1] 40 40 40 40 40 40 40 40 40 40 40 40 36 24 16 16 16 27 27 36 20 20
27 27 31
[26] 27 36 38 39 40 40 40 40 40 40 40 40 40 40 40 40 40 39 34 34 38 39
40 40 40
[51] 40 40 40 40 40 40 40 40 40 40 40 40 30 20 20 20 36 40 40 40 40 30
30 30 30
[76] 39 40 40 40 40 40 40
Now I'm looking for an elegant way to convert my list of integer vectors
to an PhredQuality object, but the solution I found is very slow for a
list with 90000 sequences and a mean sequence length of around 400.
> pq = PhredQuality(sapply(qualitylist, function(x)
toString(PhredQuality(x))))
Is there a faster way creating a PhredQuality object out of a list like
mine.
Regards,
Christian
> sessionInfo()
R version 2.14.0 Under development (unstable) (2011-05-17 r55946)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] R453Plus1Toolbox_1.3.1
[2] BSgenome.Scerevisiae.UCSC.sacCer2_1.3.17
[3] BSgenome_1.21.0
[4] GenomicRanges_1.5.7
[5] Biostrings_2.21.3
[6] IRanges_1.11.5
[7] Biobase_2.13.2
loaded via a namespace (and not attached):
[1] biomaRt_2.9.1 hwriter_1.3 R2HTML_2.2 RCurl_1.6-1
[5] Rsamtools_1.5.17 ShortRead_1.11.6 tools_2.14.0 XML_3.4-0
More information about the Bioc-devel
mailing list