[Bioc-sig-seq] perTile QA element in the FastqQA class

Tue Apr 13 20:20:00 CEST 2010

Hi Martin,

I am using the sequence.txt files generated by the Illumina pipeline 
(OLB1.6/RTA1.6) as is; their read IDs seem to carry the lane and tile 
coordinates.

Just so I can focus on the ReadIDs part for now (and I am sure this is 
not exactly what you asked for), I parsed out the readIDs from the 
fastq, and am working with those.

This is what my fastqs look like:

@ILLUMINA06:8:1:6:849#0/1
GCTCTTTTTGATTCTCAAATCCGGCGTCAACCATA
+ILLUMINA06:8:1:6:849#0/1
a`abaa_aaa]_a`_a_[]`]a_`aa_`_aa`aaa
@ILLUMINA06:8:1:6:1169#0/1
TAATGCCACTCCTCTCCCGACTGTTAACACTGCTG
+ILLUMINA06:8:1:6:1169#0/1
ab`_Z_aXa`bbababbbaabaaaaababaaa`V`

My very basic attempt at this:

 > fqhead <- read.table("./Contam_Screening/Run703/sequence_8_1_hdrs.txt",
 +                      sep=":")

For instance, to extract all entries in lane 8, tile 120 (after 
splitting on ":", V2 holds the lane and V3 the tile):
 > fqhead[fqhead$V2 == 8 & fqhead$V3 == 120, ]
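
Going one step further, here is a minimal, untested-against-ShortRead sketch of how the parsed headers might be turned into the per-tile count table described in the quoted reply below (columns 'type', 'tile', 'lane', 'count'). The tile-120 read IDs are made up for illustration, and "read" is only a placeholder value for 'type'. Whether ShortRead:::.plotTileCounts accepts the result as is remains an assumption.

```r
# Header lines in the format shown above; the three tile-120 IDs are
# made up for illustration.
hdrs <- c("@ILLUMINA06:8:1:6:849#0/1",
          "@ILLUMINA06:8:1:6:1169#0/1",
          "@ILLUMINA06:8:120:10:500#0/1",
          "@ILLUMINA06:8:120:11:733#0/1",
          "@ILLUMINA06:8:120:12:90#0/1")

# With sep=":" the fields are V1 = instrument, V2 = lane, V3 = tile,
# V4 = x, V5 = y. The default comment.char="#" drops the "#0/1" suffix.
fqhead <- read.table(text = hdrs, sep = ":")

# Count reads per lane and tile by summing a column of ones.
n <- aggregate(count ~ lane + tile,
               data = data.frame(lane = fqhead$V2,
                                 tile = fqhead$V3,
                                 count = 1L),
               FUN = sum)

# Column layout described for .plotTileCounts; "read" is a placeholder
# for 'type', which is reportedly safe to ignore.
tileCounts <- data.frame(type = "read",
                         tile = n$tile,
                         lane = n$lane,
                         count = n$count)
tileCounts
```

Reading the headers with read.table keeps this close to the attempt above; for a full lane, the same aggregate call would apply unchanged to the complete header file.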

I hope I am somewhat closer to what you asked for...
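
For the quality-score table mentioned in the quoted reply below ('type', 'tile', 'lane', 'score'), a rough base-R sketch might look like the following. It assumes the quality strings use an ASCII offset of 64, which is a guess about this pipeline version and not something confirmed in this thread. `mean_quality` is a hypothetical helper standing in for alphabetScore(quality(srq)) / width(quality(srq)), and "read" is again a placeholder for 'type'.

```r
# Lane, tile and quality strings of the two reads shown above.
reads <- data.frame(
    lane = c(8L, 8L),
    tile = c(1L, 1L),
    qual = c("a`abaa_aaa]_a`_a_[]`]a_`aa_`_aa`aaa",
             "ab`_Z_aXa`bbababbbaabaaaaababaaa`V`"),
    stringsAsFactors = FALSE)

# Hypothetical helper: mean per-base score of one read, assuming an
# ASCII offset of 64 (an assumption, see the note above).
mean_quality <- function(q, offset = 64L) {
    mean(utf8ToInt(q) - offset)
}

reads$qualityScore <- vapply(reads$qual, mean_quality, numeric(1),
                             USE.NAMES = FALSE)

# Median of the per-read scores within each lane and tile.
med <- aggregate(score ~ lane + tile,
                 data = data.frame(lane = reads$lane,
                                   tile = reads$tile,
                                   score = reads$qualityScore),
                 FUN = median)

# Column layout described for .plotTileQualityScore.
tileScores <- data.frame(type = "read",
                         tile = med$tile,
                         lane = med$lane,
                         score = med$score)
tileScores
```

With real data, the quality strings would come from the fastq file itself (every fourth line), paired with the lane and tile parsed from the matching read ID.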

Thanks a lot!
Sirisha

Martin Morgan wrote:
> On 04/12/2010 02:49 PM, Sirisha Sunkara wrote:
>   
>> Hi Martin,
>>
>> The qa function, when reading fastq-format files, doesn't seem to
>> populate the perTile QA element with row information.
>> The row counts are zero for both the readCounts and
>> medianReadQualityScore list elements of perTile.
>>
>> Is this feature still a work in progress? Essentially, I am trying to
>> get the TileQC plots for lanes where there was no reference genome to
>> align against (no export.txt files).
>>     
>
> Hi Sirisha -- fastq files can't be guaranteed to have tile info so
> ShortRead doesn't try to guess these, even if some software adopts
> conventions for embedding the information in the read ids.
>
> The tile images are generated by
>
>   ShortRead:::.plotTileCounts
>
> and
>
>   ShortRead:::.plotTileQualityScore
>
> both take a regular data.frame. For .plotTileCounts, the columns are
> 'type' (safe to ignore, I think), 'tile' (integer tile index), 'lane'
> (integer lane index), and 'count' (number of reads in this particular
> lane & tile). As an untested work-around, you could create a data frame
> like this by parsing your read IDs using standard R commands; provide an
> example of what the read IDs look like and I'll help you. For the
> .plotTileQualityScore, the columns are 'type', 'tile', 'lane', and
> 'score', where 'score' is the median 'qualityScore'
> (alphabetScore(quality(srq)) / width(quality(srq)) for some ShortReadQ
> object srq obtained by readFastq) over all reads in the tile.
>
> Martin
>
>   
>>> qafq <- qa("./Contam_Screening/Run703/", "s_8_1_sequence.txt",
>>>            type="fastq")
>>> qafq
>> class: FastqQA(9)
>> QA elements (access with qa[["elt"]]):
>>  readCounts: data.frame(1 3)
>>  baseCalls: data.frame(1 5)
>>  readQualityScore: data.frame(512 4)
>>  baseQuality: data.frame(94 3)
>>  alignQuality: data.frame(1 3)
>>  frequentSequences: data.frame(50 4)
>>  sequenceDistribution: data.frame(1663 4)
>>  perCycle: list(2)
>>    baseCall: data.frame(150 4)
>>    quality: data.frame(1081 5)
>>  perTile: list(2)
>>    readCounts: data.frame(0 4)
>>    medianReadQualityScore: data.frame(0 4)
>>
>> Thank You,
>> Sirisha
>>
>>
>>> sessionInfo()
>> R version 2.11.0 Under development (unstable) (2010-03-07 r51225)
>> x86_64-unknown-linux-gnu
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] ShortRead_1.5.21    lattice_0.18-3      Biostrings_2.15.22
>> [4] GenomicRanges_0.1.0 IRanges_1.5.74      Rmpi_0.5-8
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.7.5 grid_2.11.0   hwriter_1.2   tools_2.11.0