[Bioc-sig-seq] perTile QA element in the FastqQA class

Martin Morgan mtmorgan at fhcrc.org
Tue Apr 13 00:26:28 CEST 2010


On 04/12/2010 02:49 PM, Sirisha Sunkara wrote:
> Hi Martin,
> 
> The qa function that reads fastq files doesn't seem to
> populate the perTile QA element with row information.
> The row counts are zero for both the readCounts and
> medianReadQualityScore list elements of perTile.
> 
> Is this feature still a work in progress? Essentially, I am trying to
> get the TileQC plots for lanes where there was no reference genome to
> align against (no export.txt files).

Hi Sirisha -- fastq files aren't guaranteed to carry tile information,
so ShortRead doesn't try to guess it, even though some software adopts
conventions for embedding it in the read ids.

The tile images are generated by

  ShortRead:::.plotTileCounts

and

  ShortRead:::.plotTileQualityScore

Both take a regular data.frame. For .plotTileCounts, the columns are
'type' (safe to ignore, I think), 'tile' (integer tile index), 'lane'
(integer lane index), and 'count' (number of reads in this particular
lane & tile). As an untested work-around, you could create a data frame
like this by parsing your read IDs using standard R commands; provide an
example of what the read IDs look like and I'll help you. For
.plotTileQualityScore, the columns are 'type', 'tile', 'lane', and
'score', where 'score' is the median, over all reads in the tile, of the
per-read 'qualityScore', i.e.,
alphabetScore(quality(srq)) / width(quality(srq)) for some ShortReadQ
object srq obtained by readFastq.
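
To make the work-around concrete, here is a rough base-R sketch of
building both data frames. Everything specific in it is an assumption:
the read ID layout (instrument:lane:tile:x:y#index/pair, one common
older Illumina convention), the made-up ids and scores, and the
placeholder value used for 'type'. With real data, ids would come from
as.character(id(srq)) and the per-read score from the expression above.
It has not been tried against the two plotting functions.

```r
## Hypothetical read IDs; the layout instrument:lane:tile:x:y#index/pair
## is an assumption, not something taken from the files in this thread.
ids <- c("HWUSI-EAS100R:8:1:941:1973#0/1",
         "HWUSI-EAS100R:8:1:1020:88#0/1",
         "HWUSI-EAS100R:8:2:17:405#0/1",
         "HWUSI-EAS100R:8:2:300:1200#0/1",
         "HWUSI-EAS100R:8:2:512:64#0/1")

## Hypothetical per-read average quality scores, one per read. With
## ShortRead these would be
## alphabetScore(quality(srq)) / width(quality(srq)).
readScore <- c(30, 34, 20, 25, 40)

## Split each ID on ':' and pick out the lane (2nd) and tile (3rd) fields.
fields <- strsplit(ids, ":", fixed = TRUE)
reads <- data.frame(
    lane  = as.integer(vapply(fields, `[`, character(1), 2)),
    tile  = as.integer(vapply(fields, `[`, character(1), 3)),
    score = readScore)

## Number of reads in each lane and tile.
tileCounts <- aggregate(score ~ tile + lane, data = reads, FUN = length)
names(tileCounts)[names(tileCounts) == "score"] <- "count"
tileCounts$type <- "read"   # placeholder value; an assumption
tileCounts <- tileCounts[, c("type", "tile", "lane", "count")]

## Median per-read score in each lane and tile.
tileScores <- aggregate(score ~ tile + lane, data = reads, FUN = median)
tileScores$type <- "read"   # placeholder value; an assumption
tileScores <- tileScores[, c("type", "tile", "lane", "score")]

print(tileCounts)
print(tileScores)
```

If your read IDs use a different separator or field order, only the
strsplit() line and the two field positions need to change.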

Martin

> 
>> qafq <- qa("./Contam_Screening/Run703/", "s_8_1_sequence.txt", type="fastq")
>> qafq
> class: FastqQA(9)
> QA elements (access with qa[["elt"]]):
>  readCounts: data.frame(1 3)
>  baseCalls: data.frame(1 5)
>  readQualityScore: data.frame(512 4)
>  baseQuality: data.frame(94 3)
>  alignQuality: data.frame(1 3)
>  frequentSequences: data.frame(50 4)
>  sequenceDistribution: data.frame(1663 4)
>  perCycle: list(2)
>    baseCall: data.frame(150 4)
>    quality: data.frame(1081 5)
>  perTile: list(2)
>    readCounts: data.frame(0 4)
>    medianReadQualityScore: data.frame(0 4)
> 
> Thank You,
> Sirisha
> 
>> sessionInfo()
> R version 2.11.0 Under development (unstable) (2010-03-07 r51225)
> x86_64-unknown-linux-gnu
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] ShortRead_1.5.21    lattice_0.18-3      Biostrings_2.15.22
> [4] GenomicRanges_0.1.0 IRanges_1.5.74      Rmpi_0.5-8
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.7.5 grid_2.11.0   hwriter_1.2   tools_2.11.0
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


