[Bioc-sig-seq] Reading quality scores from Bowtie (using Piple
Martin Morgan
mtmorgan at fhcrc.org
Wed Aug 5 18:25:02 CEST 2009
Fuad Gwadry wrote:
> Hi Martin
>
> I followed the same steps using the data from a newer version of the
> illumina pipeline (ver 1.3). Still run into the same issues.
>
> Fuad
>
> ##########################
> original data illumina 1.3
> #########################
> > alignQuality(aln)
> class: NumericQuality
> quality: NA NA ... NA NA (4585850 total)
> > head(quality(aln))
> class: SFastqQuality
> quality:
> A BStringSet instance of length 6
> width seq
> [1] 35 ;;;;;;;;;;;;;;;;;;;;;;;;;;:;;;57547
> [2] 35 ;:7;;;;;;7;;;;4;;;;;:;:7;77;;;5+444
> [3] 35 ;;:;;;:::;;;:;:;;9:6;;;9:;;:;972479
> [4] 35 ;;6;;;;;;;;;;:;;;;;;9&9;;;;9;;99587
> [5] 35 ;;;;:9;;;;:;;79;;;/::;66;)7;4/78487
> [6] 35 ;-;;;;;;;;;;;;;;;;;8;;;;;;;;;055-75
> > m <- as(quality(aln), "matrix")
> > colMeans(m)
> [1] -6.044744 -5.936636 -5.924813 -5.958905 -5.956505 -5.921455
> [7] -5.929669 -5.967875 -5.929279 -5.970600 -5.975925 -6.011533
> [13] -6.462944 -6.520501 -6.601669 -6.651007 -6.805869 -6.832879
> [19] -6.874413 -7.030044 -7.153979 -7.210835 -7.278909 -7.363969
> [25] -7.475453 -7.662229 -7.781669 -7.915972 -8.113778 -8.222790
> [31] -12.165080 -12.329542 -12.456710 -12.590711 -12.737429
>
> ##########################
> incorporated your suggestions
> ##########################
>
> > qual <- FastqQuality(quality(quality(aln)))
> > qual
> class: FastqQuality
> quality:
> A BStringSet instance of length 4585850
> width seq
> [1] 35 ;;;;;;;;;;;;;;;;;;;;;;;;;;:;;;57547
> [2] 35 ;:7;;;;;;7;;;;4;;;;;:;:7;77;;;5+444
> [3] 35 ;;:;;;:::;;;:;:;;9:6;;;9:;;:;972479
> [4] 35 ;;6;;;;;;;;;;:;;;;;;9&9;;;;9;;99587
> [5] 35 ;;;;:9;;;;:;;79;;;/::;66;)7;4/78487
> [6] 35 ;-;;;;;;;;;;;;;;;;;8;;;;;;;;;055-75
> [7] 35 ;:9;;;;;;;;;;;;;:;:;;;;:;;;;+:54455
> [8] 35 ;;;;;;:;;;:;;;;;:;;3:9;;;;:;;;45544
> [9] 35 ;;:;;;;;;;4;;;4;:7;:;;;74;;;;644015
> ... ... ...
> [4585842] 35 0:;7;;-;;;;;:;;7)-.25;;&;;9;7;.4444
> [4585843] 35 7477:.-;9*;3;0979489.-9::&;49:5-344
> [4585844] 35 6:;::;;;;;:;:;;6::98:3962:949:24424
> [4585845] 35 0:;;:::6;::.:::626:277:8)9::7:)5--0
> [4585846] 35 4;7+9-;;;;6:;);9;+2037:&0-270&4.-1.
> [4585847] 35 :;;::;;;;;;;0;:96.:9:9;2;);99;54544
> [4585848] 35 ;;:9;:;;;;:;:7,0)9;);))9.;6:4;4245-
> [4585849] 35 8:;;;;;:668:;2;:,836:;69:993::454,0
> [4585850] 35 889;:;;;;;;-;-.89);;;);;4044*7*4.41
> > initialize(aln, quality=qual)
> class: AlignedRead
> length: 4585850 reads; width: 35 cycles
> chromosome: shank3genomic shank3genomic ... shank3genomic shank3genomic
> position: 37625 19915 ... 13278 39832
> strand: + + ... + +
> alignQuality: NumericQuality
> alignData varLabels: similar mismatch
>
> > m <- as(quality(aln), "matrix")
> > colMeans(m)
> [1] -6.044744 -5.936636 -5.924813 -5.958905 -5.956505 -5.921455
> [7] -5.929669 -5.967875 -5.929279 -5.970600 -5.975925 -6.011533
> [13] -6.462944 -6.520501 -6.601669 -6.651007 -6.805869 -6.832879
> [19] -6.874413 -7.030044 -7.153979 -7.210835 -7.278909 -7.363969
> [25] -7.475453 -7.662229 -7.781669 -7.915972 -8.113778 -8.222790
> [31] -12.165080 -12.329542 -12.456710 -12.590711 -12.737429
>
> > alignQuality(aln)
> class: NumericQuality
> quality: NA NA ... NA NA (4585850 total)
>
> ###############################
> Also tried your second suggestion and ended up with teh following error
> ##############################
>
> > m <- as(FastqQuality(quality(quality(aln)), "matrix"))
Hi Faud --
here the problem is a mis-placed parenthesis -- check out "matrix" in
as(FastqQuality(quality(quality(aln))), "matrix")
I see that I was missing a ) in my original reply, so sorry about that.
Martin
> Error in .identC(Class, "double") :
> element 2 is empty;
> the part of the args list of '.Call' being evaluated was:
> ("R_identC", c1, c2, PACKAGE = "methods")
>
>
>
> > sessionInfo()
> R version 2.9.1 (2009-06-26)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.3.22 lattice_0.17-25 BSgenome_1.13.10
> Biostrings_2.13.29
> [5] IRanges_1.3.44
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1
>
>
>
> ------------------------------------------------------------------------
> Attention all humans. We are your photos. Free us.
> <http://go.microsoft.com/?linkid=9666046>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioc-sig-sequencing
mailing list