[Bioc-sig-seq] Problem with ShortRead reading quality scores from bowtie
Martin Morgan
mtmorgan at fhcrc.org
Tue Aug 4 18:07:51 CEST 2009
Martin Morgan wrote:
> Fuad Gwadry wrote:
>> Hi All
>>
>>
>>
>> I am getting negative values when reading quality scores when I read
>> data generated in bowtie. Has anyone run into the same issue when
>> using data generated by bowtie ? My session info is below.
>
> Hi Fuad -- ShortRead is reading the quality scores on the wrong scale
> (solexa, rather than phred; this will be fixed before the next release).
> Try
>
> qual <- FastqQuality(quality(quality(aln)))
> aln <- initialize(aln, quality=qual)
>
> to update aln, or
>
> m <- as(FastqQuality(quality(quality(aln))), "matrix")
>
> for a one-off solution.
I wanted to clarify, too, both for this post and one yesterday, that
as() simply converts the character encoding to the integer value that
each letter encodes. That integer (m below) is the phred or log-odds
score; the secondary mapping from the score to an error probability is
not being performed. This step is, I think,

  10^(-m/10) for phred scores
  1 - 1 / (1 + 10^(-m/10)) for Solexa scores

Solexa has changed its encoding scheme very recently; I think it is now
standard phred but am not sure.
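
To make the two steps concrete, here is a rough sketch in base R (no
ShortRead needed). The helper names decodeQuality, phredToProb and
solexaToProb are made up for illustration and are not ShortRead
functions. The ASCII offsets (33 for phred-scaled FASTQ, 64 for
Solexa-scaled) are my assumption about the usual conventions, and the
quality string is invented.

```r
# Sketch only: hypothetical helpers, not part of ShortRead.

# Step 1: character encoding -> integer score.
# Assumed offsets: 33 for phred-scaled FASTQ, 64 for Solexa-scaled.
decodeQuality <- function(qualString, offset) {
    utf8ToInt(qualString) - offset
}

# Step 2: integer score -> error probability.
phredToProb <- function(q) 10^(-q / 10)
solexaToProb <- function(q) {
    odds <- 10^(-q / 10)   # Solexa scores are log-odds: q = -10 * log10(p / (1 - p))
    odds / (1 + odds)      # algebraically equal to 1 - 1 / (1 + 10^(-q/10))
}

qualString <- "II5+"                     # made-up quality string
asPhred  <- decodeQuality(qualString, 33)  # read on the phred scale
asSolexa <- decodeQuality(qualString, 64)  # same string read on the Solexa scale

print(asPhred)
print(asSolexa)                 # every score is 31 lower; some go negative
print(phredToProb(asPhred))     # error probabilities on the intended scale
```

If the offsets assumed here are right, reading a phred-encoded string
with the Solexa offset lowers every score by 31, which would be
consistent with the negative column means in the original post.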
Martin
>
> Martin
>
>>
>>
>>
>> Thanks in advance
>>
>>
>>
>> Fuad
>>
>>
>>
>>> aln
>> class: AlignedRead
>> length: 4591807 reads; width: 32 cycles
>> chromosome: chr13 chr7 ... chr6 chr4
>> position: 93437004 13223395 ... 23636747 23353864
>> strand: - - ... + +
>> alignQuality: NumericQuality
>> alignData varLabels: similar mismatch
>>
>>> m <- as(quality(aln), "matrix")
>>> colMeans(m)
>> [1] -7.186638 -7.205858 -7.203382 -7.197175 -7.203629 -7.217016
>> [7] -7.240661 -7.249238 -7.268499 -7.286551 -7.306615 -7.324003
>> [13] -7.523238 -7.581242 -7.697591 -7.695861 -7.735321 -7.743323
>> [19] -7.752996 -7.849403 -7.862658 -7.931969 -7.979778 -8.029288
>> [25] -8.120469 -8.215818 -8.335176 -8.411609 -8.587005 -8.820979
>> [31] -11.447326 -11.644198
>>
>>
>>> sessionInfo()
>> R version 2.9.1 (2009-06-26) x86_64-unknown-linux-gnu
>> locale:
>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] ShortRead_1.3.22 lattice_0.17-25 BSgenome_1.13.10
>> Biostrings_2.13.29
>> [5] IRanges_1.3.44
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> Bioc-sig-sequencing at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793