[Bioc-devel] phred qualities

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Wed Jun 27 20:46:46 CEST 2012


On Wed, Jun 27, 2012 at 2:26 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 06/27/2012 11:22 AM, Martin Morgan wrote:
>>
>> On 06/27/2012 08:02 AM, Kasper Daniel Hansen wrote:
>>>
>>> Phred qualities are usually presented as ASCII-encoded numbers with an
>>> offset of either 33 or 64. Some packages return these as a
>>> BStringSet. I can convert a character vector "charvec" to a list of
>>> integers using code like
>>> sapply(charvec, function(xx) as.integer(charToRaw(xx)) - 33L)
>>>
>>> Do we have fast(er) ways of doing this, when charvec is really long
>>> and not necessarily with the same number of chars in each string? I
>>> am thinking of implementing the sapply() above in C (directly
>>> vectorizing it), but surely someone has done something like that
>>> somewhere.
>>
>>
>> I think you get this with XStringSet, e.g., PhredQuality, with
>>
>> x = PhredQuality(c("HH", "III"))
>> y = as.numeric(unlist(x)) - 33L
>
>
>  as.integer
>
>> z = relist(y, x)
>
>
> or for a simple list
>
>  split(y, rep(seq_along(x), elementLengths(x)))
>
> I have a recollection that there is something built-in...

It would also be nice if as.integer(unlist(x)) knew that x is a
PhredQuality and therefore subtracted 33 itself. From the
PhredQuality docs it seems that the offset has already been removed in
the underlying raw vector, and that unlist(x) converts it back
into a BString.
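
For plain character vectors, the offset subtraction can be done in one vectorized pass without Biostrings. This is only a base-R sketch of the idea; phred_to_int and its offset argument are made-up names for illustration, not anything from Biostrings or ShortRead, and it assumes single-byte ASCII quality strings.

```r
# Hypothetical helper (not part of Biostrings): decode ASCII-encoded Phred
# strings of possibly different widths into a list of integer vectors.
phred_to_int <- function(charvec, offset = 33L) {
  # Number of bytes per string; quality strings are assumed to be ASCII.
  widths <- nchar(charvec, type = "bytes")
  # Convert all characters at once instead of looping over the strings.
  codes <- as.integer(charToRaw(paste(charvec, collapse = ""))) - offset
  # Explicit factor levels keep empty strings as integer(0) elements.
  grp <- factor(rep(seq_along(charvec), widths), levels = seq_along(charvec))
  unname(split(codes, grp))
}

scores <- phred_to_int(c("HH", "III"))
```

With the default offset of 33, "H" decodes to 39 and "I" to 40. Passing offset = 64L would cover the other common encoding.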

....

Looking in Biostrings there is .XStringQualityToIntegerMatrix which is used in
  as(x, "matrix")
which does what I want, but assumes that all strings have equal width.

So I guess I should write something like an as(x, "list") method,
which I can do using x@ranges.  But would that conflict with the
unlist(x) command above?  Or should it have another name?
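
To make the ranges idea concrete, here is a rough base-R sketch of slicing one concatenated integer vector by start and width, so no equal-width assumption is needed. It does not touch any Biostrings internals: slice_by_ranges, start and width are hypothetical stand-ins for what a method built on x@ranges might use.

```r
# Hypothetical helper: cut one long integer vector into pieces described by
# 1-based start positions and widths (a stand-in for the ranges slot).
slice_by_ranges <- function(values, start, width) {
  Map(function(s, w) values[seq.int(s, length.out = w)], start, width)
}

quals <- c("HH", "III", "")
width <- nchar(quals, type = "bytes")
# Start of each string inside the concatenated vector.
start <- cumsum(c(1L, width))[seq_along(width)]
# All scores in one vector, assuming the offset-33 encoding.
values <- as.integer(charToRaw(paste(quals, collapse = ""))) - 33L

res <- slice_by_ranges(values, start, width)
```

Zero-width entries come back as integer(0), so the result always has one element per input string.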

Kasper



>
> Martin
>
>
>>
>> Martin
>>
>>>
>>> Kasper
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793


