# [Bioc-devel] phred qualities

Thomas Girke thomas.girke at ucr.edu
Wed Jun 27 22:14:17 CEST 2012

```I am not sure if this will help achieving what you are trying to do, but when I
am dealing with reads that are not of constant length and I want to perform
efficient matrix-based computations on their phred scores or base calls then I
inject them into a matrix pre-populated with N/NA values and of dimensions
such a matrix is relatively efficient since one only needs to iterate
over the number distinct length values observed in the sequence set.

then one can obtain the quality (q) and base (s) matrices like this:

mymin <- min(width(fq))
mymax <- max(width(fq))
s <- matrix("N", length(fq), mymax)
q <- matrix(NA, length(fq), mymax) # Or use 0 instead of NAs
for(i in mymin:mymax) {
index <- width(fq)==i
if(any(index)) {
s[index, 1:i] <- as.matrix(DNAStringSet(sread(fq)[index], start=1, end=i))
q[index, 1:i] <- as.matrix(PhredQuality(quality(fq))[index])
}
}

When it comes to turning q and s back to XStringSet objects then things are
not so efficient anymore. Perhaps there is an elegant/fast option to coerce a
matrix of Phred scores back to a BStringSet object that I am not aware of?

Thomas

On Wed, Jun 27, 2012 at 06:26:25PM +0000, Martin Morgan wrote:
> On 06/27/2012 11:22 AM, Martin Morgan wrote:
> > On 06/27/2012 08:02 AM, Kasper Daniel Hansen wrote:
> >> Phred qualities are usually presented as ascii-encode numbers with an
> >> offset of either 32 or 64. Some packages returns this as a
> >> BStringSet. I can convert a character vector "charvec" to a list of
> >> integers using code like
> >> sapply(charvec, function(xx) charToRaw(xx) - 33L)
> >>
> >> Do we have fast(er) ways of doing this, when charvec is really long
> >> and not necessarily with the same number of chars in each string? I
> >> am thinking of implementing the sapply() above in C (directly
> >> vectorizing it), but surely someone has done something like that
> >> somewhere.
> >
> > I think you get this with XStringSet, e.g., PhredQuality, with
> >
> > x = PhredQuality(c("HH", "III"))
> > y = as.numeric(unlist(x)) - 33L
>
>    as.integer
>
> > z = relist(y, x)
>
> or for a simple list
>
>    split(y, rep(seq_along(x), elementLengths(x))
>
> I have a recollection that there is something built-in...
>
> Martin
>
> >
> > Martin
> >
> >>
> >> Kasper
> >>
> >> _______________________________________________
> >> Bioc-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> >
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

```