[BioC] Phred encoding
Martin Morgan
mtmorgan at fhcrc.org
Thu Aug 15 19:05:17 CEST 2013
On 08/15/2013 09:49 AM, Taylor, Sean D wrote:
> Thanks for the response Vincent. I'm afraid I still don't understand what as.raw() is doing differently. Did part of your reply get cut off? After consulting ?as.raw() I still would have expected the answer that as.numeric() is generating (based on my limited understanding).
>
I'd be interested in knowing what your objective is -- drop reads with some low
quality calls? drop reads with overall low quality? trim reads of low quality
heads / tails?
Anyway, 'unlist' strips the 'Quality' class
> unlist(qual)
10-letter "BString" instance
seq: BBBBBFFB4!
so any operations are based on 'BString' without reference to encoding.
as.integer / as.numeric then return the ASCII code
(http://www.asciitable.com/) of the corresponding letter, not the Phred score
> as.integer(unlist(qual))
[1] 66 66 66 66 66 70 70 66 52 33
> as.numeric(unlist(qual))
[1] 66 66 66 66 66 70 70 66 52 33
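To get from these ASCII codes to quality scores, the offset has to be subtracted by hand. A minimal base-R sketch, assuming the offset of 33 mentioned in the original question, and using a plain character string in place of the BString (utf8ToInt stands in for as.integer on the BString; the variable names are only for illustration):

```r
# Plain character string standing in for unlist(qual)
qual_string <- "BBBBBFFB4!"

# ASCII codes of each letter, same values as as.integer(unlist(qual)) above
ascii_codes <- utf8ToInt(qual_string)

# Assumed offset: 33, as stated in the original question
phred_offset <- 33L

# Subtracting the offset gives the Phred scores
phred_scores <- ascii_codes - phred_offset
phred_scores
# [1] 33 33 33 33 33 37 37 33 19  0
```

So the last two bases (scores 19 and 0) are the ones that fall below a threshold of 23.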
as.raw on a BString returns the raw (hexadecimal) representation of the ASCII
encoding; the values are the same as above, only printed in base 16
> selectMethod(as.raw, "BString")
Method Definition:
function (x)
as.raw(as.integer(x))
<environment: namespace:XVector>
Signatures:
x
target "BString"
defined "XRaw"
> as.raw(1:100)
[1] 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19
[26] 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32
[51] 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41 42 43 44 45 46 47 48 49 4a 4b
[76] 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f 60 61 62 63 64
> as.raw(66)
[1] 42
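That the raw and integer results hold the same values can be checked by round-tripping. A small base-R sketch, again using a plain character string rather than the BString (the variable names are only for illustration):

```r
# ASCII codes of the quality string from the question
codes <- utf8ToInt("BBBBBFFB4!")

# Raw vector: the same bytes, printed as pairs of hex digits
raw_bytes <- as.raw(codes)
raw_bytes
# [1] 42 42 42 42 42 46 46 42 34 21

# Converting back to integer recovers the decimal ASCII codes
as.integer(raw_bytes)
# [1] 66 66 66 66 66 70 70 66 52 33

# Decimal 66 written in hexadecimal is "42"
format(as.hexmode(66L))

# And the bytes decode back to the original letters
rawToChar(raw_bytes)
```

Either representation needs the offset subtracted before it can be compared with a quality threshold.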
Based on ?PhredQuality, for rectangular data (all reads the same length) perhaps you'd like
rowSums(as(qual, "matrix") < 23) == 0
or for irregular data (reads of differing lengths)
rowSums(as(FastqQuality(qual), "matrix") < 23, na.rm=TRUE) == 0
Also, ?trimTails in ShortRead might be relevant.
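For illustration, the same "no base below 23" filter can be sketched in base R without the matrix coercion. This assumes an offset of 33 and that the coercion above yields offset-corrected scores; keep_read and the example strings are hypothetical, not part of Biostrings or ShortRead:

```r
# Hypothetical helper: TRUE if every base in the read meets the threshold
keep_read <- function(qual_string, threshold = 23L, offset = 33L) {
  scores <- utf8ToInt(qual_string) - offset  # ASCII codes -> Phred scores
  all(scores >= threshold)
}

# Made-up quality strings of differing lengths (irregular data)
quals <- c("BBBBBFFB4!", "BBBBBFFBBB", "FFFF")

# One logical per read; FALSE marks reads with at least one low-quality base
keep <- vapply(quals, keep_read, logical(1), USE.NAMES = FALSE)
keep
# [1] FALSE  TRUE  TRUE
```

Working per read avoids the NA padding that the matrix form needs for reads of unequal length, at the cost of being slower on large inputs.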
Martin
> From: Vincent Carey [mailto:stvjc at channing.harvard.edu]
> Sent: Wednesday, August 14, 2013 6:32 PM
> To: Taylor, Sean D
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Phred encoding
>
> from ?as.raw
>
> A raw vector is printed with each byte separately represented as a
> pair of hex digits. If you want to see a character representation
>
>
> On Wed, Aug 14, 2013 at 6:25 PM, Taylor, Sean D <sdtaylor at fhcrc.org> wrote:
> Hi,
>
> I'm trying to quality filter my NGS reads and want to filter out reads that have bases below a quality threshold (say 23 for instance, using Illumina MiSeq with an offset of 33). Can anyone tell me why the results of the two functions as.raw() and as.numeric() give different results?
>
> qual<-PhredQuality(c("BBBBBFFB4!"))
> as.raw(unlist(qual))
> as.numeric(unlist(qual))
>
> Thanks,
> Sean
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793